keller-mark / pizzarr

Slice into Zarr arrays in R 🍕
https://keller-mark.github.io/pizzarr/
MIT License
25 stars 2 forks source link

Read/write in parallel #91

Closed keller-mark closed 1 week ago

keller-mark commented 2 weeks ago

Unfortunately, the switch from crul::HttpClient to crul::HttpRequest + crul::AsyncVaried breaks the VCR request mocking due to https://github.com/ropensci/vcr/issues/246

Fixes #83 Related to #31

To get tests passing again, depends on https://github.com/ropensci/crul/pull/180

dblodgett-usgs commented 1 week ago

Looks like a great step forward.

Big question is whether foreach is the right parallelization option? I've gotten a lot of mileage out of pbapply, which offers an out of the box progress bar for *apply functions. Looks like you are just doubling down on doParallel? Looks like they haven't had any development in two years and he last change was a package maintainer update.

The other thing is what this ends up looking like from a user-facing API and UX point of view. It would be nice to havea vignette showing what the expected use pattern is. Once this is merged, maybe I can work that up as a test of what you did here?

keller-mark commented 1 week ago

Big question is whether foreach is the right parallelization option? I've gotten a lot of mileage out of pbapply

I am not tied to foreach and doParallel - i will try pbapply instead, thank you for the pointer

The other thing is what this ends up looking like from a user-facing API and UX point of view. It would be nice to havea vignette showing what the expected use pattern is. Once this is merged, maybe I can work that up as a test of what you did here?

Yes definitely we will need documentation of the usage. The tests are currently the only place that shows this https://github.com/keller-mark/pizzarr/blob/b978a5c7ad8d6192b7740a807a96797857cdf65d/tests/testthat/test-parallel.R#L50

dblodgett-usgs commented 1 week ago

Cool -- I'm not attached to any one parallelization scheme but it does seem that future is the one with the most community support. http://www.futureverse.org/ It looks like that community prefers https://progressr.futureverse.org/ which takes a different approach from pbapply, it requires a more tightly coupled implementation than pbapply. I just kind of fell into pbapply and have liked it because I can just use my typical pattern where I map to a list of jobs then apply to a result list that I then recombine and I don't have to do anything else to get a progress bar. If I had to switch to a different apply function it would be a totally trivial change.

e.g. https://github.com/DOI-USGS/hydroloom/blob/38ebe341ed9e25071adba2af0d6c4c3999caafd0/R/make_attribute_topology.R#L67