pola-rs / r-polars

Polars R binding
https://pola-rs.github.io/r-polars/

Add the `include_file_paths` parameter for all `read_...` functions? #1235

Closed by collioud 2 weeks ago

collioud commented 3 weeks ago

Hello,

First of all, thank you for providing such a great package!

I am wondering why the `include_file_paths` parameter is available for some of the `read_...` functions (for example, `read_parquet`) but not for all of them (e.g. `read_csv`).

Is there any technical reason? If not, would it be possible to make this parameter available for all of them?

Best regards

etiennebacher commented 3 weeks ago

Hi, indeed it looks like Python's `pl.scan_csv()` has this argument but our `pl$scan_csv()` doesn't. It should be added there, and from there it can be added to `pl$read_csv()`.

This would be a good first issue, since it only requires slightly modifying a Rust function and the corresponding R functions. Would you like to contribute to the package? I'd be happy to review a PR, even an incomplete one.

collioud commented 3 weeks ago

Sure, I can give it a try. I have (some) knowledge of R but none of Rust, so be prepared ;)

etiennebacher commented 3 weeks ago

Great! This package is quite big and more complex than the average R package so it can be a bit overwhelming, but you get used to it. Basically here's what you should do:

  1. Ensure you have the system requirements (Rust, Task, etc.) detailed here: https://github.com/pola-rs/r-polars/blob/main/DEVELOPMENT.md
  2. In Rust, add the argument `include_file_paths` and the call `.with_include_file_paths(robj_to!(Option, String, include_file_paths)?.map(|x| x.into()));` to the function `new_from_csv()`, located here: https://github.com/pola-rs/r-polars/blob/main/src/rust/src/rdataframe/read_csv.rs
  3. Try to compile with `task build-rust` in the terminal. Note that compiling can take some time (the first build takes around 10-15 min for me, and subsequent ones take around 1 min).
  4. Once compilation succeeds, you can add this argument to the R functions `pl_scan_csv()` and `pl_read_csv()`.
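
To make step 2 a bit more concrete, here is a rough, self-contained sketch of what the `.map(|x| x.into())` part is doing. It does not use polars or the `robj_to!` macro: `CsvReaderSketch` and `new_from_csv_sketch()` are hypothetical stand-ins, and the assumption that the builder wants an `Option<Arc<str>>` is mine, so check the actual signature in the polars version the package pins.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a lazy CSV reader builder (NOT the real polars type).
#[derive(Debug)]
pub struct CsvReaderSketch {
    pub path: String,
    // Assumed target type: an optional, cheaply clonable column name.
    pub include_file_paths: Option<Arc<str>>,
}

impl CsvReaderSketch {
    pub fn new(path: &str) -> Self {
        Self {
            path: path.to_string(),
            include_file_paths: None,
        }
    }

    // Builder-style setter: takes ownership of the builder and returns it updated.
    pub fn with_include_file_paths(mut self, name: Option<Arc<str>>) -> Self {
        self.include_file_paths = name;
        self
    }
}

// Hypothetical counterpart of new_from_csv(): the new argument arrives as an
// optional String (in the real code, that is what the robj_to! call yields
// after the `?`), and is converted to the type the builder expects.
pub fn new_from_csv_sketch(path: &str, include_file_paths: Option<String>) -> CsvReaderSketch {
    CsvReaderSketch::new(path)
        // Option::map leaves None untouched and converts Some(String) via Into.
        .with_include_file_paths(include_file_paths.map(|x| x.into()))
}
```

Calling `new_from_csv_sketch("data.csv", Some("file_path".to_string()))` stores the column name, while passing `None` leaves the option unset, which is why the argument can stay optional on the R side.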

Let's first try to successfully complete those steps and then we'll do the docs and tests.

Let me know if you need help with any of those steps, and feel free to open a PR to show what you have done so far :)

collioud commented 3 weeks ago

Thank you for your help! Sending a PR right now... (I did what I could)