r-devel / r-dev-day

Repo to organize tasks for R Dev Days
4 stars 0 forks source link

Making R internals more ALTREP friendly #23

Open DavisVaughan opened 1 week ago

DavisVaughan commented 1 week ago

I've been working on an ALTREP "view" class, and have discovered a number of small places where I think base R itself could be a little more ALTREP friendly. I'd be interested in working on a few of these next week if there is support for it, I think it could be a really tractable win for r-dev-day with small and well scoped issues to tackle!


In particular, there are a few places I've found so far where R uses, say, LOGICAL() where it could instead use LOGICAL_RO(). These are cases where it seems like R only needs a read only pointer into the data. This would be more friendly to ALTREP classes that can provide a readonly pointer easily, but would have to materialize to provide a writable pointer (like an ALTREP "view").

All 3 of these currently force a materialization of my ALTREP class due to requesting a writable pointer, when I think they only need a readonly one.


In addition to these, I think that ExtractSubset() (used by the R level [, and many other places) could be modified to have really nice efficiency gains for ALTREP types. It currently calls the corresponding *_ELT() method on each index. I was thinking that alternatively it could try to see if a call to Dataptr_or_null() returns a cheap read only dataptr that it could use to extract from instead. I think that would make subset extraction way more efficient for ALTREP classes that can provide a readonly pointer easily (again, like a view).

I do realize there is an ALTREP Extract_subset method that I could implement for this last case, but subsetting is tricky to get exactly right for all cases, and I'd much rather just hand R a read only dataptr and let it handle the finer details of the subsetting!

Link: https://github.com/wch/r-source/blob/b9c83de5f7c9bb0b6dbe96b6d61a04d195b0cf2e/src/main/subset.c#L137

gmbecker commented 1 week ago

I'd be happy to contribute to efforts in this vein if there is endorsement from R-core.

On Thu, Jul 4, 2024 at 12:22 PM Davis Vaughan @.***> wrote:

I've been working on an ALTREP "view" class, and have discovered a number of small places where I think base R itself could be a little more ALTREP friendly. I'd be interested in working on a few of these next week if there is support for it, I think it could be a really tractable win for r-dev-day with small and well scoped issues to tackle!

In particular, there are a few places I've found so far where R uses, say, LOGICAL() where it could instead use LOGICAL_RO(). These are cases where it seems like R only needs a read only pointer into the data. This would be more friendly to ALTREP classes that can provide a readonly pointer easily, but would have to materialize to provide a writable pointer (like an ALTREP "view").

All 3 of these currently force a materialization of my ALTREP class due to requesting a writable pointer, when I think they only need a readonly one.

In addition to these, I think that ExtractSubset() (used by the R level [, and many other places) could be modified to have really nice efficiency gains for ALTREP types. It currently calls the corresponding _ELT() method on each index*. I was thinking that alternatively it could try to see if a call to Dataptr_or_null() returns a cheap read only dataptr that it could use to extract from instead. I think that would make subset extraction way more efficient for ALTREP classes that can provide a readonly pointer easily (again, like a view).

I do realize there is an ALTREP Extract_subset method that I could implement for this last case, but subsetting is tricky to get exactly right for all cases, and I'd much rather just hand R a read only dataptr and let it handle the finer details of the subsetting!

Link: https://github.com/wch/r-source/blob/b9c83de5f7c9bb0b6dbe96b6d61a04d195b0cf2e/src/main/subset.c#L137

— Reply to this email directly, view it on GitHub https://github.com/r-devel/r-dev-day/issues/23, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG53MJYNPCX6VEIGIABATLZKWOHBAVCNFSM6AAAAABKMAU536VHI2DSMVQWIX3LMV43ASLTON2WKOZSGM4TCMZXGMYTANA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

ltierney commented 1 week ago

Seems worth a shot.

gmbecker commented 1 week ago

It does seem like we will need to defer to the extractsubset method if it's there and I dont recall if that's one of thr methods that can return null if it wants to do nothing and defer to core code (I'm on my phone at a train station atm)

On Sat, Jul 6, 2024, 4:59 PM ltierney @.***> wrote:

Seems worth a shot.

— Reply to this email directly, view it on GitHub https://github.com/r-devel/r-dev-day/issues/23#issuecomment-2211790155, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAG53MNPDFUAML5QINNPYU3ZLAA3NAVCNFSM6AAAAABKMAU536VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJRG44TAMJVGU . You are receiving this because you commented.Message ID: @.***>

DavisVaughan commented 2 days ago

rlang side of this, which will be used as a test to make sure it is working https://github.com/r-lib/rlang/pull/1725

DavisVaughan commented 2 days ago

Work with @Lime91, @krlmlr, and @gmbecker