Closed paleolimbot closed 1 year ago
So the problem is the iterator borrows the DataFrame; instead you'll want to create an iterator that owns the DataFrame. (The Rust equivalent of shared_ptr is Arc, but it looks like Polars DataFrames are mutable so instead of using Arcs they are cheaply copy-able.)
So might need to define a struct like:
struct OwnedDataFrameIterator {
    df: polars::frame::DataFrame,
    iter: polars::frame::RecordBatchIter
}
impl OwnedDataFrameIterator {
    fn new(df: polars::frame::DataFrame) -> Self {
        Self { df, iter: df.iter_chunks() }
    }
}
impl Iterator for OwnedDataFrameIterator {
    type Item = Result<Box<dyn Array>, arrow::error::Error>;
    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next()
    }
}
I must admit I'm a complete rookie when it comes to the Arrow interface.
I think, similar to @wjones127's suggestion, you could collect the Arrow arrays into a vector, which is owned, and then pass it on. I have added a collect() and a rechunk(), added a clone(), and removed a 'move'. The method below will compile.
What would happen to the swapped stream pointer if the DataFrame memory is dropped on the Rust side? Let's find out :) Otherwise we can also export a DataFrame clone to protect the memory allocation.
pub fn export_stream(&mut self, stream_ptr: &str) {
    let schema = self.0.schema().to_arrow();
    let data_type = DataType::Struct(schema.fields);
    let field = ArrowField::new("", data_type.clone(), false);
    self.0.rechunk(); //avoids panic if series' are chunked, see iter_chunks() doc
    let df = &self.0;
    let chunk_vec: Vec<_> = df
        .iter_chunks()
        .map(
            |item| -> Result<Box<dyn arrow::array::Array>, arrow::error::Error> {
                let array = arrow::array::StructArray::new(
                    data_type.clone(),
                    item.into_arrays(),
                    std::option::Option::None,
                );
                Ok(Box::new(array))
            },
        )
        .collect();
    let chunk_vec_boxed = Box::new(chunk_vec.into_iter());
    let mut stream = arrow::ffi::export_iterator(chunk_vec_boxed, field);
    let stream_out_ptr_addr: usize = stream_ptr.parse().unwrap();
    let stream_out_ptr = stream_out_ptr_addr as *mut arrow::ffi::ArrowArrayStream;
    unsafe {
        std::ptr::swap_nonoverlapping(
            stream_out_ptr,
            &mut stream as *mut arrow::ffi::ArrowArrayStream,
            1,
        );
    }
}
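To make the pointer hand-off at the end of export_stream easier to reason about, here is a minimal standalone sketch of the same pattern using only the standard library. FakeStream and export_to_addr are hypothetical stand-ins, not arrow or rpolars APIs. The sketch assumes the caller has already allocated a valid struct and passes its address as a decimal string, which is what nanoarrow_pointer_addr_chr() appears to provide.

```rust
use std::num::ParseIntError;
use std::ptr;

/// Hypothetical, simplified stand-in for a C-ABI struct such as an Arrow C stream.
#[repr(C)]
#[derive(Debug, PartialEq)]
struct FakeStream {
    id: u32,
    released: bool,
}

/// Swap `stream` into the caller-allocated slot whose address arrives as a string.
/// Returns whatever was in the slot before (typically an empty placeholder).
///
/// # Safety
/// `addr` must be the address of a valid, aligned, writable `FakeStream`
/// that nothing else is reading or writing during the call.
unsafe fn export_to_addr(addr: &str, mut stream: FakeStream) -> Result<FakeStream, ParseIntError> {
    // Parse first, so a malformed address is reported before any pointer is touched.
    let out_addr: usize = addr.parse()?;
    let out_ptr = out_addr as *mut FakeStream;
    unsafe {
        ptr::swap_nonoverlapping(out_ptr, &mut stream as *mut FakeStream, 1);
    }
    Ok(stream)
}

fn main() {
    // The "consumer" allocates an empty placeholder and shares its address as text.
    let mut slot = Box::new(FakeStream { id: 0, released: true });
    let addr = (&mut *slot as *mut FakeStream as usize).to_string();

    // The "producer" builds a stream and swaps it into the consumer's slot.
    let produced = FakeStream { id: 42, released: false };
    let old = unsafe { export_to_addr(&addr, produced) }.unwrap();

    assert_eq!(*slot, FakeStream { id: 42, released: false });
    assert_eq!(old, FakeStream { id: 0, released: true });
    println!("slot now holds {:?}", *slot);
}
```

After the swap, the local variable holds the old placeholder and is dropped as usual, while the exported value now lives in memory the consumer owns. In the real method, that is why anything the stream needs must be owned by the stream itself rather than borrowed from the DataFrame.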
Thanks to you both!
I'm giving Will's a shot first because I'd like to know for my own benefit how to do this kind of thing (and because I imagine it will translate more directly to exporting a reader from a lazy frame which is what I'm really excited about). I have
struct OwnedDataFrameIterator<'a> {
    df: polars::frame::DataFrame,
    iter: polars::frame::RecordBatchIter<'a>,
    data_type: arrow::datatypes::DataType
}
impl OwnedDataFrameIterator<'_> {
    fn new(df: polars::frame::DataFrame ) -> Self {
        let schema = df.schema().to_arrow();
        let data_type = DataType::Struct(schema.fields);
        let iter = polars::frame::RecordBatchIter {
            columns: df.get_columns(),
            idx: 0,
            n_chunks: df.n_chunks().unwrap(),
        };
        Self { df, iter, data_type }
    }
}
impl Iterator for OwnedDataFrameIterator<'_> {
    type Item = Result<Box<dyn arrow::array::Array>, arrow::error::Error>;
    fn next(&mut self) -> Option<Self::Item> {
        let item = self.iter.next();
        match item {
            std::option::Option::Some(i) => {
                let array = arrow::array::StructArray::new(self.data_type.clone(), i.into_arrays(), std::option::Option::None);
                Some(std::result::Result::Ok(Box::new(array)))
            }
            _ => None
        }
    }
}
...which almost compiles except for:
error[E0515]: cannot return value referencing function parameter `df`
--> src/rdataframe/mod.rs:40:9
|
35 | columns: df.get_columns(),
| ---------------- `df` is borrowed here
...
40 | Self { df, iter, data_type }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a value referencing data owned by the current function
|
= help: use `.collect()` to allocate the iterator
error[E0505]: cannot move out of `df` because it is borrowed
--> src/rdataframe/mod.rs:40:16
|
31 | fn new(df: polars::frame::DataFrame ) -> Self {
| ---- return type is OwnedDataFrameIterator<'1>
...
35 | columns: df.get_columns(),
| ---------------- borrow of `df` occurs here
...
40 | Self { df, iter, data_type }
| -------^^-------------------
| | |
| | move out of `df` occurs here
| returning this value requires that `df` is borrowed for `'1`
Some errors have detailed explanations: E0505, E0515.
For more information about an error, try `rustc --explain E0505`.
error: could not compile `rpolars` due to 2 previous errors
I have a feeling I'm missing a lifetime specifier somewhere but I don't know where to put it!
Okay I think I led you astray just a little. I forgot that you can't have self-referential structs in Rust. Basically references are just pointers, and Rust doesn't guarantee it won't move around your struct, invalidating the pointer.
So this means that you can't use polars::frame::RecordBatchIter, but instead need to create a modified version of its implementation (haven't tested, but roughly correct):
pub struct OwnedDataFrameIterator {
    columns: Vec<Series>,
    data_type: arrow::datatypes::DataType,
    idx: usize,
    n_chunks: usize,
}
impl OwnedDataFrameIterator {
    fn new(df: polars::frame::DataFrame) -> Self {
        let schema = df.schema().to_arrow();
        let data_type = DataType::Struct(schema.fields);
        Self {
            // cloning the columns gives the iterator its own handle on the data
            columns: df.get_columns().clone(),
            data_type,
            idx: 0,
            n_chunks: df.n_chunks().unwrap()
        }
    }
}
// the struct owns its columns, so no lifetime parameter is needed here
impl Iterator for OwnedDataFrameIterator {
    type Item = Result<Box<dyn arrow::array::Array>, arrow::error::Error>;
    fn next(&mut self) -> Option<Self::Item> {
        if self.idx >= self.n_chunks {
            None
        } else {
            // create a batch of the columns with the same chunk no.
            let batch_cols = self.columns.iter().map(|s| s.to_arrow(self.idx)).collect();
            self.idx += 1;
            let chunk = ArrowChunk::new(batch_cols);
            let array = arrow::array::StructArray::new(self.data_type.clone(), chunk.into_arrays(), std::option::Option::None);
            Some(std::result::Result::Ok(Box::new(array)))
        }
    }
}
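For readers who want to try the idea without the Polars and Arrow types, the following is a minimal sketch of the same pattern using only the standard library. ToyColumn, OwnedChunkIter and make_iter are hypothetical names invented for illustration, and the sketch assumes every column has the same number of chunks. The point is only that the iterator stores owned columns plus an index instead of a borrowing iterator, so it carries no lifetime and can be boxed and returned from a function.

```rust
/// Hypothetical stand-in for a chunked column: each inner Vec is one chunk.
struct ToyColumn {
    chunks: Vec<Vec<i32>>,
}

/// Iterator that owns its columns, so it needs no lifetime parameter.
struct OwnedChunkIter {
    columns: Vec<ToyColumn>,
    idx: usize,
    n_chunks: usize,
}

impl OwnedChunkIter {
    fn new(columns: Vec<ToyColumn>) -> Self {
        // Assumption: all columns are chunked identically (as after a rechunk).
        let n_chunks = columns.first().map_or(0, |c| c.chunks.len());
        Self { columns, idx: 0, n_chunks }
    }
}

impl Iterator for OwnedChunkIter {
    /// One batch holds the chunk at the current index from every column.
    type Item = Vec<Vec<i32>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.idx >= self.n_chunks {
            return None;
        }
        let batch = self
            .columns
            .iter()
            .map(|c| c.chunks[self.idx].clone())
            .collect();
        self.idx += 1;
        Some(batch)
    }
}

/// Returning a boxed iterator only works because nothing inside borrows a local.
fn make_iter() -> Box<dyn Iterator<Item = Vec<Vec<i32>>>> {
    let columns = vec![
        ToyColumn { chunks: vec![vec![1, 2], vec![3]] },
        ToyColumn { chunks: vec![vec![10, 20], vec![30]] },
    ];
    // `columns` is moved into the iterator, so it outlives this function.
    Box::new(OwnedChunkIter::new(columns))
}

fn main() {
    let batches: Vec<_> = make_iter().collect();
    assert_eq!(
        batches,
        vec![
            vec![vec![1, 2], vec![10, 20]],
            vec![vec![3], vec![30]],
        ]
    );
    println!("{} batches", batches.len());
}
```

Storing a borrowing iterator next to the data it borrows from would make the struct self-referential, which is what the E0515/E0505 errors above are rejecting. Keeping an index avoids the borrow altogether.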
Thanks Will! It compiles!!
Now I have:
> df = pl$DataFrame(iris)
> stream = nanoarrow::nanoarrow_allocate_array_stream()
> df$export_stream(nanoarrow::nanoarrow_pointer_addr_chr(stream))
Error: syntax error: export_stream is not a method/attribute of the class DataFrame
when calling:
df$export_stream
(I'm sure it's a really dumb error!)
Not at all :) Some context
In py-polars the DataFrame has four class levels (same for Series, Expr, LazyFrame, ...):
outer python public api class named DataFrame
r-polars has only three class levels:
rpolars used to have a fourth R6 class, but it led to more boilerplate code and heavier objects, not just an external pointer.
Instead, the external pointer has a private set of methods, which are the extendr-wrappers, and a public set of methods derived from the private methods and pure R functions.
You can access the private functions via .pr (pr for private), which is the root namespace of all private methods. Notice that the private functions are made into pure functions which take a DataFrame as argument.
df = pl$DataFrame(iris)
# print with private external method
.pr$DataFrame$print(df) # here I use the private print method
# print with public internal method
df$print()
# implementation of public method
> df$print
function() {
.pr$DataFrame$print(self) #self is S3/extendr-magic and refers to the lhs of $ thus the DataFrame externalpointer the method is called from.
invisible(self)
}
# print with S3 method
print(df)
I tried to doc the classes and API here: https://rpolars.github.io/reference/DataFrame_class.html https://rpolars.github.io/reference/index.html#rpolars-api-and-namespace https://rpolars.github.io/reference/pl.html https://rpolars.github.io/reference/dot-pr.html
You can immediately use .pr$DataFrame$export_stream(df) for testing.
You need to implement the function called exactly
DataFrame_export_stream = function() {
#some impl details
.pr$DataFrame$export_stream(self,...)
}
likely you would place it in R/dataframe__frame.R
Then rebuild and the public function is available
All these shenanigans are to get syntax that is as close as possible to py-polars. The implementation code should also look the same, e.g. expr__expr.R is named like that because its mirror file is found in py-polars/polars/internals/expr/expr.py. Most method implementations are in the same order and look very similar. The docs look similar, if I did not straight copy-paste them :)
Beauty!
library(rpolars)
df <- pl$DataFrame(nycflights13::flights)
bench::mark(
as.data.frame(df),
nanoarrow = {
stream <- df$export_stream()
nanoarrow::convert_array_stream(stream, size = df$shape[1])
}
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 as.data.frame(df) 46.1ms 47.6ms 21.0 38.7MB 84.0
#> 2 nanoarrow 22.5ms 24.6ms 40.8 38.7MB 18.6
Created on 2023-01-06 with reprex v2.0.2
A few more comparisons:
library(arrow, warn.conflicts = FALSE)
library(rpolars)
df <- pl$DataFrame(nycflights13::flights)
n <- df$shape[1]
bench::mark(
as.data.frame(df),
nanoarrow = {
stream <- df$export_stream()
nanoarrow::convert_array_stream(stream, size = n)
},
# much faster because strings are never materialized to R
arrow_table = {
stream <- df$export_stream()
reader <- arrow::as_record_batch_reader(stream)
arrow::as_arrow_table(reader)
},
# much faster because of ALTREP chunked arrays for strings
arrow_df = {
stream <- df$export_stream()
reader <- arrow::as_record_batch_reader(stream)
as.data.frame(arrow::as_arrow_table(reader))
},
# with materializing strings
arrow_df = {
stream <- df$export_stream()
reader <- arrow::as_record_batch_reader(stream)
as.data.frame(arrow::as_arrow_table(reader))[n:1, ]
},
check = FALSE
)
#> # A tibble: 5 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 as.data.frame(df) 46.8ms 47.1ms 21.1 38.7MB 31.6
#> 2 nanoarrow 23.9ms 24.8ms 39.6 38.81MB 29.7
#> 3 arrow_table 238.5µs 249.9µs 3903. 1.61MB 24.8
#> 4 arrow_df 480.8µs 494.1µs 1995. 635.78KB 24.4
#> 5 arrow_df 41.3ms 41.5ms 24.1 65.64MB 80.4
Created on 2023-01-06 with reprex v2.0.2
Nice! Looking forward to try it out on Monday.
I tried to check for mem safety by dropping df after stream, but I fail to cause any errors. I guess that even though df is dropped, the lower level Arrow-arrays are not.
library(nycflights13)
library(rpolars)
library(nanoarrow)
df <- pl$DataFrame(nycflights13::flights)
n = df$shape[1]
#make stream
stream = df$export_stream()
# dropping df and GC
rm(df)
gc()
# all good, does not break anything
df_na = nanoarrow::convert_array_stream(stream, size = n)
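One plausible explanation for the result above is reference counting: if the exported stream holds its own counted handle on the underlying buffers, dropping the frame only decrements the count. The following is a minimal sketch of that mechanism with std::sync::Arc. ToyFrame and ToyStream are hypothetical types, and the sketch is an assumption about the general mechanism rather than a description of the actual Polars or Arrow internals.

```rust
use std::sync::Arc;

/// Hypothetical frame that shares its buffer through a reference count.
struct ToyFrame {
    buffer: Arc<Vec<i32>>,
}

/// Hypothetical exported stream that keeps its own handle on the buffer.
struct ToyStream {
    buffer: Arc<Vec<i32>>,
    pos: usize,
}

impl ToyFrame {
    fn export_stream(&self) -> ToyStream {
        // Cloning an Arc copies the pointer and bumps the count; the data is not copied.
        ToyStream { buffer: Arc::clone(&self.buffer), pos: 0 }
    }
}

impl Iterator for ToyStream {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        let value = self.buffer.get(self.pos).copied();
        if value.is_some() {
            self.pos += 1;
        }
        value
    }
}

fn main() {
    let frame = ToyFrame { buffer: Arc::new(vec![1, 2, 3]) };
    let stream = frame.export_stream();
    assert_eq!(Arc::strong_count(&stream.buffer), 2);

    // Dropping the frame releases one handle; the stream still keeps the buffer alive.
    drop(frame);
    assert_eq!(Arc::strong_count(&stream.buffer), 1);

    let values: Vec<i32> = stream.collect();
    assert_eq!(values, vec![1, 2, 3]);
    println!("read {} values after the frame was dropped", values.len());
}
```

Under this assumption, the buffer is freed only when the last handle goes away, which on the R side would correspond to the stream being released.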
@paleolimbot do you have any idea why this happens? The reverse order, arrow first then rpolars is fine
Restarting R session...
* Project '~/Documents/projs/r-polars' loaded. [renv 0.16.0]
> library(rpolars)
> library(arrow)
Error: package or namespace load failed for ‘arrow’:
.onLoad failed in loadNamespace() for 'arrow', details:
call: NULL
error: syntax error: set_pointer is not a method/attribute of the class DataType
when calling:
library(arrow)
In addition: Warning message:
In arrow__UnregisterRExtensionType(extension_name) :
restarting interrupted promise evaluation
>
I see that too! My guess is that both arrow and rpolars implement [[ or $ for DataType. I think this wouldn't be a problem if they were both R6 (which I think is where the offending method lives for arrow)...I imagine you will have to rename any intersecting classes to avoid that problem. (My first hunch was a symbol collision between the two .so files, but I ran nm -g on both and couldn't find any in common).
(It is rather rude of us to define like 10 million R6 class names in Arrow...we should have prefixed them somehow but we didn't know and that ship sailed a long time ago...)
I redid this to use S3 methods (registered at runtime to avoid a hard dependency). I did some for arrow too, which means that you could pass these things directly into a bunch of Arrow functions and have it "just work" 🙂
library(arrow, warn.conflicts = FALSE)
library(nanoarrow)
library(rpolars)
df <- pl$DataFrame(nycflights13::flights)
as_nanoarrow_array_stream(df)
#> <nanoarrow_array_stream struct<year: int32, month: int32, day: int32, dep_time: int32, sched_dep_time: int32, dep_delay: double, arr_time: int32, sched_arr_time: int32, arr_delay: double, carrier: large_string, flight: int32, tailnum: large_string, origin: large_string, dest: large_string, air_time: double, distance: double, hour: double, minute: double, time_hour: double>>
#> $ get_schema:function ()
#> $ get_next :function (schema = x$get_schema(), validate = TRUE)
#> $ release :function ()
format(infer_nanoarrow_schema(df))
#> [1] "<nanoarrow_schema struct<year: int32, month: int32, day: int32, dep_time: int32, sched_dep_time: int32, dep_delay: double, arr_time: int32, sched_arr_time: int32, arr_delay: double, carrier: large_string, flight: int32, tailnum: large_string, origin: large_string, dest: large_string, air_time: double, distance: double, hour: double, minute: double, time_hour: double>>"
as_record_batch_reader(df)
#> RecordBatchReader
#> year: int32
#> month: int32
#> day: int32
#> dep_time: int32
#> sched_dep_time: int32
#> dep_delay: double
#> arr_time: int32
#> sched_arr_time: int32
#> arr_delay: double
#> carrier: large_string
#> flight: int32
#> tailnum: large_string
#> origin: large_string
#> dest: large_string
#> air_time: double
#> distance: double
#> hour: double
#> minute: double
#> time_hour: double
as_arrow_table(df)
#> Table
#> 336776 rows x 19 columns
#> $year <int32>
#> $month <int32>
#> $day <int32>
#> $dep_time <int32>
#> $sched_dep_time <int32>
#> $dep_delay <double>
#> $arr_time <int32>
#> $sched_arr_time <int32>
#> $arr_delay <double>
#> $carrier <large_string>
#> $flight <int32>
#> $tailnum <large_string>
#> $origin <large_string>
#> $dest <large_string>
#> $air_time <double>
#> $distance <double>
#> $hour <double>
#> $minute <double>
#> $time_hour <double>
Created on 2023-01-10 with reprex v2.0.2
I will try to build the PR on a Windows machine and see what the failure is about.
I think nanoarrow is failing to build on windows at the moment. It is the same error via github runner or if I try to install nanoarrow on my old gamer pc.
> remotes::install_github("apache/arrow-nanoarrow/r", build = FALSE)
Downloading GitHub repo apache/arrow-nanoarrow@HEAD
Installing package into 'C:/Users/soren/AppData/Local/R/cache/R/renv/library/r-polars-276af647/R-4.2/x86_64-w64-mingw32'
(as 'lib' is unspecified)
* installing *source* package 'nanoarrow' ...
** using staged installation
**********************************************
WARNING: this package has a configure script
It probably needs manual configuration
**********************************************
** libs
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c altrep.c -o altrep.o
In file included from altrep.c:26:
array.h:24:10: fatal error: nanoarrow.h: No such file or directory
24 | #include "nanoarrow.h"
| ^~~~~~~~~~~~~
compilation terminated.
make: *** [C:/R/R-42~1.2/etc/x64/Makeconf:253: altrep.o] Error 1
ERROR: compilation failed for package 'nanoarrow'
* removing 'C:/Users/soren/AppData/Local/R/cache/R/renv/library/r-polars-276af647/R-4.2/x86_64-w64-mingw32/nanoarrow'
Warning messages:
1: In untar2(tarfile, files, list, exdir, restore_times) :
skipping pax global extended headers
2: In untar2(tarfile, files, list, exdir, restore_times) :
skipping pax global extended headers
3: In i.p(...) :
installation of package 'C:/Users/soren/AppData/Local/Temp/Rtmp0uNYJK/remotes3088258725c6/apache-arrow-nanoarrow-848ffc5/r' had non-zero exit status
>
Oh yeah...I almost certainly need configure.win since ./configure doesn't run on windows 🤦
Would that be something like rewriting configure and placing it in a Makevars.win file?
Traditionally that kind of thing is baked into src/Makevars.win instead of configure.win...in that repo the R package pulls nanoarrow.h from the parent directory to make sure everything is in sync; however, I've never tested install via remotes on Windows.
Ok...this should be fixed from nanoarrow's side (I tested remotes::install_github("apache/arrow-nanoarrow") in a VM and it worked).
Very close. I think remotes can reset permissions for configure but renv cannot. Maybe if you set the permissions for configure and configure.win with chmod 777 or something like that, it would work.
PS C:\Users\soren\Documents\projs\r-polars> C:\R\R-4.2.2\bin\R.exe
R version 4.2.2 (2022-10-31 ucrt) -- "Innocent and Trusting"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
* Project '~/projs/r-polars' loaded. [renv 0.16.0]
* The project may be out of sync -- use `renv::status()` for more details.
[Previously saved workspace restored]
> renv::restore()
The following package(s) will be updated:
# CRAN ===============================
- arrow [* -> 10.0.1]
- assertthat [* -> 0.2.1]
- bit [* -> 4.0.5]
- bit64 [* -> 4.0.5]
# GitHub =============================
- nanoarrow [* -> apache/arrow-nanoarrow:r@HEAD]
Do you want to proceed? [y/N]: y
Retrieving 'https://api.github.com/repos/apache/arrow-nanoarrow/tarball/848ffc5d3f99dabbc3fdb225d42be6d47e7c5402' ...
OK [downloaded 179.6 Kb in 0.7 secs]
Installing assertthat [0.2.1] ...
OK [linked cache]
Installing bit [4.0.5] ...
OK [linked cache]
Installing bit64 [4.0.5] ...
OK [linked cache]
Installing arrow [10.0.1] ...
OK [linked cache]
Installing nanoarrow [0.0.0.9000] ...
FAILED
Error installing package 'nanoarrow':
=====================================
* installing *source* package 'nanoarrow' ...
** using staged installation
**********************************************
WARNING: this package has a configure script
It probably needs manual configuration
**********************************************
** libs
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c altrep.c -o altrep.o
In file included from altrep.c:26:
array.h:24:10: fatal error: nanoarrow.h: No such file or directory
24 | #include "nanoarrow.h"
| ^~~~~~~~~~~~~
compilation terminated.
make: *** [C:/R/R-42~1.2/etc/x64/Makeconf:253: altrep.o] Error 1
ERROR: compilation failed for package 'nanoarrow'
* removing 'C:/Users/soren/Documents/projs/r-polars/renv/staging/1/nanoarrow'
Error: install of package 'nanoarrow' failed [error code 1]
Traceback (most recent calls last):
12: renv::restore()
11: renv_restore_run_actions(project, diff, current, lockfile, rebuild)
10: renv_install_impl(records)
9: renv_install_staged(records)
8: renv_install_default(records)
7: handler(package, renv_install_package(record))
6: renv_install_package(record)
5: withCallingHandlers(renv_install_package_impl(record), error = function(e) {
vwritef("\tFAILED")
writef(e$output)
})
4: renv_install_package_impl(record)
3: r_cmd_install(package, path)
2: r_exec_error(package, output, "install", status)
1: stop(error)
> remotes::install_github("apache/arrow-nanoarrow/r")
Downloading GitHub repo apache/arrow-nanoarrow@HEAD
✔ checking for file 'C:\Users\soren\AppData\Local\Temp\RtmpQNEHz7\remotesffc5a62625d\apache-arrow-nanoarrow-da7b5ec\r/DESCRIPTION'
─ preparing 'nanoarrow':
✔ checking DESCRIPTION meta-information
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building 'nanoarrow_0.0.0.9000.tar.gz'
Warning: file 'nanoarrow/configure' did not have execute permissions: corrected
Installing package into 'C:/Users/soren/AppData/Local/R/cache/R/renv/library/r-polars-276af647/R-4.2/x86_64-w64-mingw32'
(as 'lib' is unspecified)
* installing *source* package 'nanoarrow' ...
** using staged installation
Fetched bundled nanoarrow from https://github.com/apache/arrow-nanoarrow/tree/main/dist
** libs
Warning: this package has a non-empty 'configure.win' file,
so building only the main architecture
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c altrep.c -o altrep.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c array.c -o array.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c array_stream.c -o array_stream.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c array_view.c -o array_view.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c buffer.c -o buffer.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c convert.c -o convert.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c convert_array.c -o convert_array.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c convert_array_stream.c -o convert_array_stream.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c infer_ptype.c -o infer_ptype.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c init.c -o init.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c materialize.c -o materialize.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c nanoarrow.c -o nanoarrow.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c pointers.c -o pointers.o
g++ -std=gnu++11 -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c pointers_cpp.cc -o pointers_cpp.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c schema.c -o schema.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c util.c -o util.o
gcc -I"C:/R/R-42~1.2/include" -DNDEBUG -I"C:/rtools42/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -std=gnu99 -mfpmath=sse -msse2 -mstackrealign -c version.c -o version.o
g++ -std=gnu++11 -shared -s -static-libgcc -o nanoarrow.dll tmp.def altrep.o array.o array_stream.o array_view.o buffer.o convert.o convert_array.o convert_array_stream.o infer_ptype.o init.o materialize.o nanoarrow.o pointers.o pointers_cpp.o schema.o util.o version.o -LC:/rtools42/x86_64-w64-mingw32.static.posix/lib/x64 -LC:/rtools42/x86_64-w64-mingw32.static.posix/lib -LC:/R/R-42~1.2/bin/x64 -lR
installing to C:/Users/soren/AppData/Local/R/cache/R/renv/library/r-polars-276af647/R-4.2/x86_64-w64-mingw32/00LOCK-nanoarrow/00new/nanoarrow/libs/x64
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (nanoarrow)
Warning messages:
1: In untar2(tarfile, files, list, exdir, restore_times) :
skipping pax global extended headers
2: In untar2(tarfile, files, list, exdir, restore_times) :
skipping pax global extended headers
>
I'm not sure I can help with the renv issue...once nanoarrow is on CRAN that problem will go away (and I've already filed my thoughts on the use of renv here 🙂 ).
Warning: file 'nanoarrow/configure' did not have execute permissions: corrected
I think that will come back and haunt you later anyhow. Not sure the CRAN win builder and/or R CMD check will allow that. Also, perhaps someone will build nanoarrow without remotes and file an issue. I know file permissions are annoying; it is just something to be aware of when working on cross-platform projects.
I believe the file permissions are correct; however, Windows git might not respect or know how to deal with attributes?
I can also try a Makevars-based solution tomorrow!
ok you're right, you did not set the permission wrong. I think it might be this issue https://github.com/r-lib/devtools/issues/1799
cloning nanoarrow and using renv::install("./r") works fine
PS C:\Users\soren\Documents\projs> git clone git@github.com:apache/arrow-nanoarrow.git nano2
Cloning into 'nano2'...
remote: Enumerating objects: 1527, done.
remote: Counting objects: 100% (395/395), done.
remote: Compressing objects: 100% (203/203), done.
remote: Total 1527 (delta 226), reused 320 (delta 187), pack-reused 1132
Receiving objects: 100% (1527/1527), 3.44 MiB | 2.84 MiB/s, done.
Resolving deltas: 100% (912/912), done.
PS C:\Users\soren\Documents\projs> cd nano2
PS C:\Users\soren\Documents\projs\nano2> C:\R\R-4.2.2\bin\R.exe
> renv::install("./r")
Installing nanoarrow [0.0.0.9000] ...
OK [built from source]
Not sure what changed: when checking out the PR on a Windows machine it builds just fine, whereas the Ubuntu and Mac builds fail tests here :/
now everything worked ¯\(°_o)/¯ ¯\_(ツ)_/¯
All looks fine. I have merged in the latest rust-polars version; if everything passes again I will merge.
Building on the excellent experiments of @sorhawell in https://github.com/rpolars/rpolars/issues/4 and https://github.com/rpolars/rpolars/compare/main...nanoarrow , this is an attempt to export data frames to the Arrow C Stream interface.
This doesn't compile yet, of course, but hopefully somebody who actually does Rust can help here! The error really does hit at the crux of the matter, which is that the DataFrame has to outlive the stream. In C++ one could do something like this using shared pointers and a virtual deleter...I'm lost as to how this should be done here but am totally willing to learn!