takeyama0 opened this issue 1 year ago.
@takeyama0 Thanks for your suggestion and implementation idea! I'm positive about supporting polars for its good performance, as you suggest.
IMO, I would like to move pandas and polars into Python extras, and raise an ImportError when users use pandas/polars features without having installed them.
This is because I think there's no application that uses both pandas and polars.
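Here is a minimal sketch of the extras idea. It assumes hypothetical extras groups named after each library. The helper name `import_optional` and the `pip install 'gokart[...]'` hint are illustrative only, not gokart's actual API. Feature code imports the library lazily and turns a missing package into an ImportError that tells the user which extra to install:

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or raise ImportError with an install hint.

    Hypothetical helper: `extra` is the name of the (assumed) extras group
    that would provide `module_name`.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Re-raise with a message pointing the user at the right extra.
        raise ImportError(
            f"'{module_name}' is required for this feature but is not installed. "
            f"Install it with: pip install 'gokart[{extra}]'"
        ) from e
```

A polars-only code path would then start with something like `pl = import_optional("polars", extra="polars")`, so users who never touch that path never need polars installed.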
@Hi-king @ujiuji1259 @mski-iksm What do you think about this?
@takeyama0 Thanks for your suggestion! I think it’s great to support Polars too.
And I basically agree with @hirosassa’s idea to minimize dependencies, but I’m a little concerned about moving pandas into extras, because some common methods (like TaskOnKart.load_data_frame) already use pandas.
@hirosassa, @ujiuji1259 Thank you for your replies! I am glad to hear your positive feedback about supporting polars.
Hello, thank you for developing such a cool tool!
Summary
I have one feature request: using Polars for loading and dumping data. Polars is a blazingly fast DataFrame library implemented in Rust, using the Apache Arrow columnar format as its memory model. If gokart supported it, it could speed up the machine learning cycle even more.
Implementation idea
I have tried a very simple implementation for parquet files here. The changes are as follows.
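```python
# Documentation string for the new option (": type" line, then an indented description).
use_polars = """
: boolean
    Whether to use polars instead of pandas
"""

# Register a boolean option "use_polars" that defaults to False (i.e. keep using pandas).
# `cf` is the config module used in the original patch; its import is not shown here.
cf.register_option(
    "use_polars",
    False,
    use_polars,
)
```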
Modify the ParquetFileProcessor class in gokart/file_processor.py to load and dump data with Polars when the "use_polars" option is True.
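Here is a minimal sketch of that processor change. It assumes a hypothetical standalone class `ParquetProcessorSketch` whose constructor flag `use_polars` stands in for the "use_polars" option. This is not gokart's actual ParquetFileProcessor. The only real library calls are `polars.read_parquet`, `polars.DataFrame.write_parquet`, `pandas.read_parquet` and `pandas.DataFrame.to_parquet`. Both libraries are imported lazily, so neither one is a hard dependency:

```python
import importlib
from typing import Any, BinaryIO


def _import_or_raise(module_name: str) -> Any:
    """Lazily import a DataFrame library, with a clear error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(f"'{module_name}' is not installed but is needed to read/write Parquet here.") from e


class ParquetProcessorSketch:
    """Hypothetical stand-in for a Parquet file processor that can use Polars or pandas."""

    def __init__(self, use_polars: bool = False) -> None:
        # Stand-in for reading the "use_polars" option.
        self.use_polars = use_polars

    def load(self, file: BinaryIO) -> Any:
        # A Parquet file does not say which library wrote it, so the flag decides.
        if self.use_polars:
            pl = _import_or_raise("polars")
            return pl.read_parquet(file)
        pd = _import_or_raise("pandas")
        return pd.read_parquet(file)

    def dump(self, obj: Any, file: BinaryIO) -> None:
        # Dispatch on the object itself: a polars DataFrame has write_parquet,
        # while a pandas DataFrame has to_parquet.
        if hasattr(obj, "write_parquet"):
            obj.write_parquet(file)
        elif hasattr(obj, "to_parquet"):
            obj.to_parquet(file, index=False)
        else:
            raise TypeError(f"Cannot dump object of type {type(obj).__name__} to Parquet.")
```

In this sketch `load` needs the flag, but `dump` dispatches on the object it receives. A task that returns a pandas DataFrame therefore keeps working even when the option is on. Whether to dump by object type or by the option is a design choice. The sketch picks type dispatch only for robustness.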
I am not very familiar with the best practices for such an option, but if you comment on what needs to be fixed, I can work on it and make a pull request.