Closed meztez closed 8 months ago
Before you can submit to CRAN you will have to request them to add the prerequisites on their Debian servers, otherwise you can't pass the incoming checks.
I think the best way is to send an email to CRAN@R-project.org
requesting to add the libgrpc++-dev
and protobuf-compiler-grpc
packages to their server. The other deps are already there, see this list.
Sent
I suspect the we'll need to fix this as well: https://github.com/meztez/bigrquerystorage/actions/runs/7168946042/job/19518216883#step:8:125
Files which contain pragma(s) suppressing diagnostics:
'src/google/api/annotations.pb.h' 'src/google/api/client.pb.h'
'src/google/api/field_behavior.pb.h' 'src/google/api/http.pb.h'
'src/google/api/launch_stage.pb.h' 'src/google/api/resource.pb.h'
'src/google/cloud/bigquery/storage/v1/arrow.pb.h'
'src/google/cloud/bigquery/storage/v1/avro.pb.h'
'src/google/cloud/bigquery/storage/v1/protobuf.pb.h'
'src/google/cloud/bigquery/storage/v1/storage.pb.h'
'src/google/cloud/bigquery/storage/v1/stream.pb.h'
'src/google/cloud/bigquery/storage/v1/table.pb.h'
'src/google/rpc/status.pb.h'
E.g. by porting the fix from the old configure.R
before Jeroen's PR.
I didn't see these warnings on winbuilder so they may not be checked on Windows: https://win-builder.r-project.org/0GJ2L4z4Zm9u/
CRAN has added libgrpc++-dev and protobuf-compiler-grpc: https://github.com/r-devel/rcheckserver/commit/380349aa02c76a0c3565aaf4f696e74413eeabea
I think your package is ready to be submitted.
Note that it may still take a while, the first submission will be scheduled for manual review by a cran assistent which takes a while and sometimes requires a few rounds of resubmitting to some requests.
@jeroen @gaborcsardi
Got this answer from CRAN. I'll look into it after the holidays.
Prof Brian Ripley ripley@stats.ox.ac.uk 06 h 10 (il y a 2 heures) à Bruno, CRAN, Tomas
This downloads pre-compiled software on macOS, contrary to the CRAN policy (and Dr Ligges confirms permission was not given).
The README says
'Windows
The package will automatically download a static build of the system requirements during installation.'
although that is not clear from the check log. That too would be a policy violation.
The package has been removed from CRAN.
Ugh, I guess you have to ask them for permission to download libs (grpc is not available on their mac/win servers)
@jeroen Oh, well I guess I'll ask for permission then. Thanks for the input, it gives me confidence to follow up with CRAN. I don't know if I would have taken that step by myself.
Well it is not certain they will agree. Uwe Ligges is usually helpful and but Brian Ripley can be very unreasonable. So the best path is probably to ask Uwe and explain him the grpc library is not available on their Mac or Windows servers.
the grpc library is available on their Mac or Windows servers.
not available, you mean?
Btw. I just opened https://bugs.r-project.org/show_bug.cgi?id=18641
Yes sorry typo 🤦
By the way, in the mean while users can install binary packages from: https://meztez.r-universe.dev/bigrquerystorage
I sent this:
Hi Dr Ligges,
In your opinion, what would be the best way forward to respect CRAN policy and get the package published?
Happy holidays to you and the people in your life.
Bruno
Context :
Some members of the R community have communicated to me their wish to see package
bigrquerystorage
(https://github.com/meztez/bigrquerystorage) published to CRAN. Github users -jeroen, -gaborcsardi and -hadley have helped me with compatibility on all platforms. (https://github.com/r-dbi/bigrquery/pull/425, https://github.com/r-dbi/bigrquery/issues/381)It is a companion package to
bigrquery
that uses theBigQuery Storage
API based on their gRPC protocol instead of their regular REST API. It provides better performance for larger datasets.Unfortunately, it has a dependency (gRPC) that is not yet available on Windows and Mac CRAN servers, as Jeroen tells me. To remedy that, I understand the configure script was modified to install precompiled binaries for Mac and Windows. I was advised to ask for explicit permission after receiving Dr Ripley's communication.
On Windows https://github.com/meztez/bigrquerystorage/blob/main/tools/winlibs.R
url <- if(grepl("aarch", R.version$platform)){ "https://github.com/r-windows/bundles/releases/download/grpc-1.51.1/grpc-1.51.1-clang-aarch64.tar.xz" } else if(grepl("clang", Sys.getenv('R_COMPILED_BY'))){ "https://github.com/r-windows/bundles/releases/download/grpc-1.51.1/grpc-1.51.1-clang-x86_64.tar.xz" } else if(getRversion() >= "4.3") { "https://github.com/r-windows/bundles/releases/download/grpc-1.51.1/grpc-1.51.1-ucrt-x86_64.tar.xz" } else { "https://github.com/rwinlib/grpc/archive/refs/tags/1.33.2-2.tar.gz" } download.file(url, basename(url), quiet = TRUE)
@meztez https://bugs.r-project.org/show_bug.cgi?id=18641 has been fixed and Rtools43 has grpc now. So we could try to give this another go, if you can gather the strength for another submission.
We would need to update the build process on Windows slightly, and then re-submit. I can submit a PR for the changes.
Also, if you don't have strength to do it, I completely understand. In this case I can do the first submission, and then once it is safely on CRAN, pass the package to you in the next submission.
Note that arrow
may be removed from CRAN soon (apache/arrow#39806).
Perhaps you would prefer to make your submission after the arrow
issue is resolved.......
It might also be an interesting improvement to remove arrow
from the required dependencies and replace it with nanoarrow
.
@eitsupi
Note that
arrow
may be removed from CRAN soon (apache/arrow#39806). Perhaps you would prefer to make your submission after thearrow
issue is resolved.......It might also be an interesting improvement to remove
arrow
from the required dependencies and replace it withnanoarrow
.
The arrow dependencies that would have to be replaced are: https://github.com/meztez/bigrquerystorage/blob/d2f5c0dd8067560a9e80dd110cc9af2d0b5aec16/R/bqs_download.R#L112, https://github.com/meztez/bigrquerystorage/blob/d2f5c0dd8067560a9e80dd110cc9af2d0b5aec16/R/bqs_download.R#L123, https://github.com/meztez/bigrquerystorage/blob/d2f5c0dd8067560a9e80dd110cc9af2d0b5aec16/R/bqs_download.R#L116
I've taken a look at nanoarrow and could not find drop in replacement at first glance. Would you have further suggestion on how this could be done?
What would be the benefits of using nanoarrow in this instance?
Thank you
What would be the benefits of using nanoarrow in this instance?
The biggest advantage is that there is no need to have the huge and time-consuming to build arrow
package as a dependency. The R ADBC-related packages and the Arrow-related methods of the DBI package are all based on nanoarrow.
https://arrow.apache.org/adbc/current/r/index.html
https://dbi.r-dbi.org/articles/DBI-arrow.html
I've taken a look at nanoarrow and could not find drop in replacement at first glance. Would you have further suggestion on how this could be done?
The correspondence between the objects of the arrow
package and those of the nanoarrow
package may be understood from this file.
https://github.com/apache/arrow-nanoarrow/blob/a41fb2be59f44b6bd6bf5a014aef253041fdc10f/r/R/pkg-arrow.R
Since the Table
equivalent object exists only in Apache Arrow's C++ implementation, I believe nanoarrow uses the struct type nanoarrow_array_stream
to exchange data in Table format.
@eitsupi I've tried to work out how to use a two elements raws list (schema, data) and turn into a data.frame using nanoarrow. I cannot figure it out at the moment, I'm sorry.
> dt <- as_nanoarrow_array(raws[[2]], schema = infer_nanoarrow_schema(raws[[1]])) |> as.data.frame()
Error in as.data.frame.nanoarrow_array(as_nanoarrow_array(raws[[2]], schema = infer_nanoarrow_schema(raws[[1]]))) :
Can't convert array with type uint8 to data.frame()
>
@paleolimbot may be able to help here
I'm guessing here that raws[[1]]
is the serialized record batch and raws[[2]]
is the serialized schema? nanoarrow has an IPC reader but it hasn't been wired up in the R package yet. Until it has, I'm not sure it can help you here!
@paleolimbot What exactly do you mean by "IPC reader"? Does that basically mean reading the data from a socket?
Maybe we could do something simple, like writing all the data out to disk, and then reading it from there?
Could we push directly to a nanoarrow object instead of doing this C++ vector push to R raws to arrow table?
I'm thinking pushing directly into a nanoarrow object could save some time?
Do we need arrow to convert back into a R data.frame?
Until arrow is back on CRAN, I think we will have to hold. https://win-builder.r-project.org/incoming_pretest/bigrquerystorage_1.0.0_20240217_202902/Debian/00check.log https://github.com/apache/arrow/issues/39806
Sorry I missed the ping!
What exactly do you mean by "IPC reader"?
I believe that "IPC" (which can be used for inter-process communication, like via a socket, but works anywhere you might want to stream a table from a server to a client) is the format that you are currently using the arrow package to read (because when I browsed the code I saw a RecordBatchStreamReader
). In order to use nanoarrow to replace that bit of code, I (or somebody) would need to add a binding for nanoarrow (C)'s "ArrowIpcArrayStreamReader". The Python version of that is here: https://github.com/apache/arrow-nanoarrow/pull/388 .
Could we push directly to a nanoarrow object
It seems like your input here is a fully downloaded raw vector of the entire stream. When the IPC reader has R bindings, that would give you a nanoarrow_array_stream
, on which you could call as.data.frame()
. Technically you wouldn't have to materialize the bytes in advance (you could stream the input).
Until arrow is back on CRAN, I think we will have to hold.
I also had trouble submitting a small nanoarrow patch because it had arrow
in Suggests 🙁
I gave the implementation in R a go, which should make it into the next release in the next two months or so ( https://github.com/apache/arrow-nanoarrow/pull/390 ). You should be able to nanoarrow::read_nanoarrow(bqs_ipc_stream(...)) |> as.data.frame()
(no pressure).
In the meanwhile, it seems like arrow is back on CRAN [1], so we can also submit the current version, and then later switch to nanoarrow.
[1] https://fosstodon.org/deck/@nic_crane@mastodon.social/111981942640349340
In the meanwhile, it seems like arrow is back on CRAN [1], so we can also submit the current version, and then later switch to nanoarrow.
[1] https://fosstodon.org/deck/@nic_crane@mastodon.social/111981942640349340
Let's hope third time is the charm.
Thanks, on its way to CRAN.
Best, Benjamin Altmann
At some point in the future, not now, now is not the time.