Open DavisVaughan opened 1 week ago
I'd be happy to contribute to efforts in this vein if there is endorsement from R-core.
On Thu, Jul 4, 2024 at 12:22 PM Davis Vaughan @.***> wrote:
I've been working on an ALTREP "view" class, and have discovered a number of small places where I think base R itself could be a little more ALTREP friendly. I'd be interested in working on a few of these next week if there is support for it, I think it could be a really tractable win for r-dev-day with small and well scoped issues to tackle!
In particular, there are a few places I've found so far where R uses, say, LOGICAL() where it could instead use LOGICAL_RO(). These are cases where it seems like R only needs a read only pointer into the data. This would be more friendly to ALTREP classes that can provide a readonly pointer easily, but would have to materialize to provide a writable pointer (like an ALTREP "view").
- duplicate1(), used by Rf_duplicate()
- copyVector()
- R_compute_identical(), used by the R level identical()
All 3 of these currently force a materialization of my ALTREP class due to requesting a writable pointer, when I think they only need a readonly one.
In addition to these, I think that ExtractSubset() (used by the R level [, and many other places) could be modified to have really nice efficiency gains for ALTREP types. It currently calls the corresponding _ELT() method on each index*. I was thinking that alternatively it could try to see if a call to Dataptr_or_null() returns a cheap read only dataptr that it could use to extract from instead. I think that would make subset extraction way more efficient for ALTREP classes that can provide a readonly pointer easily (again, like a view).
I do realize there is an ALTREP Extract_subset method that I could implement for this last case, but subsetting is tricky to get exactly right for all cases, and I'd much rather just hand R a read only dataptr and let it handle the finer details of the subsetting!
Link: https://github.com/wch/r-source/blob/b9c83de5f7c9bb0b6dbe96b6d61a04d195b0cf2e/src/main/subset.c#L137
— Reply to this email directly, view it on GitHub https://github.com/r-devel/r-dev-day/issues/23, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG53MJYNPCX6VEIGIABATLZKWOHBAVCNFSM6AAAAABKMAU536VHI2DSMVQWIX3LMV43ASLTON2WKOZSGM4TCMZXGMYTANA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Seems worth a shot.
It does seem like we will need to defer to the extractsubset method if it's there and I dont recall if that's one of thr methods that can return null if it wants to do nothing and defer to core code (I'm on my phone at a train station atm)
On Sat, Jul 6, 2024, 4:59 PM ltierney @.***> wrote:
Seems worth a shot.
— Reply to this email directly, view it on GitHub https://github.com/r-devel/r-dev-day/issues/23#issuecomment-2211790155, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG53MNPDFUAML5QINNPYU3ZLAA3NAVCNFSM6AAAAABKMAU536VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJRG44TAMJVGU . You are receiving this because you commented.Message ID: @.***>
rlang side of this, which will be used as a test to make sure it is working https://github.com/r-lib/rlang/pull/1725
Work with @Lime91, @krlmlr, and @gmbecker
I've been working on an ALTREP "view" class, and have discovered a number of small places where I think base R itself could be a little more ALTREP friendly. I'd be interested in working on a few of these next week if there is support for it, I think it could be a really tractable win for r-dev-day with small and well scoped issues to tackle!
In particular, there are a few places I've found so far where R uses, say,
LOGICAL()
where it could instead useLOGICAL_RO()
. These are cases where it seems like R only needs a read only pointer into the data. This would be more friendly to ALTREP classes that can provide a readonly pointer easily, but would have to materialize to provide a writable pointer (like an ALTREP "view").duplicate1()
, used byRf_duplicate()
copyVector()
R_compute_identical()
, used by the R levelidentical()
All 3 of these currently force a materialization of my ALTREP class due to requesting a writable pointer, when I think they only need a readonly one.
In addition to these, I think that
ExtractSubset()
(used by the R level[
, and many other places) could be modified to have really nice efficiency gains for ALTREP types. It currently calls the corresponding*_ELT()
method on each index. I was thinking that alternatively it could try to see if a call toDataptr_or_null()
returns a cheap read only dataptr that it could use to extract from instead. I think that would make subset extraction way more efficient for ALTREP classes that can provide a readonly pointer easily (again, like a view).I do realize there is an ALTREP
Extract_subset
method that I could implement for this last case, but subsetting is tricky to get exactly right for all cases, and I'd much rather just hand R a read only dataptr and let it handle the finer details of the subsetting!Link: https://github.com/wch/r-source/blob/b9c83de5f7c9bb0b6dbe96b6d61a04d195b0cf2e/src/main/subset.c#L137