hosseinmoein / DataFrame

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License
2.38k stars, 298 forks

Question: Does this library support streaming data frames? #288

Closed: dhombios closed this issue 4 months ago

dhombios commented 4 months ago

Sometimes the amount of data that needs to be processed is larger than the available RAM. For those cases, Polars has a streaming mode (still in development) that reads, processes, and saves small chunks of data.

Is it possible to achieve something similar with this library?

hosseinmoein commented 4 months ago

If you look at the read() documentation, you will see that you can read large files in chunks. In the sample code snippets, look at test_reading_in_chunks().
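To illustrate the general idea of processing a file in chunks, here is a minimal sketch in plain standard C++. It does not use the DataFrame API: `read_chunk` and `streaming_mean` are hypothetical helpers, and the input is assumed to hold one numeric value per line. For the library's actual interface, refer to the read() documentation and test_reading_in_chunks() mentioned above.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the DataFrame library): reads up to
// max_rows numeric values, one per line, from `in` into `chunk`.
// Returns the number of rows actually read; 0 means the input is exhausted.
std::size_t read_chunk(std::istream &in,
                       std::size_t max_rows,
                       std::vector<double> &chunk) {
    chunk.clear();
    std::string line;
    // The size check comes first so no extra line is consumed once full.
    while (chunk.size() < max_rows && std::getline(in, line)) {
        if (line.empty()) continue;  // skip blank lines
        chunk.push_back(std::stod(line));
    }
    return chunk.size();
}

// Computes the mean of all values while holding at most chunk_rows
// values in memory at any time.
double streaming_mean(std::istream &in, std::size_t chunk_rows) {
    std::vector<double> chunk;
    double sum = 0.0;
    std::size_t count = 0;
    while (read_chunk(in, chunk_rows, chunk) > 0) {
        for (double v : chunk) sum += v;
        count += chunk.size();
    }
    return count == 0 ? 0.0 : sum / static_cast<double>(count);
}
```

Calling `streaming_mean` on a `std::ifstream` or `std::istringstream` with a small `chunk_rows` keeps memory use bounded by the chunk size rather than the file size, which is the property the streaming question is after.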

dhombios commented 4 months ago

Thanks for your answer. That was what I was looking for.

If the number of rows in the actual file is smaller than the number of rows requested from the read function, does read still load the dataset, or does it raise an error?

hosseinmoein commented 4 months ago

It reads until num_rows rows have been read or the end of the file is reached, whichever comes first. No error is thrown.
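The "read until num_rows or end of file, without an error" behavior can be mimicked with a small standalone sketch. This is not the library's implementation: `read_rows` is a hypothetical stand-in, and its `starting_row` parameter is an assumption added here to show what happens when the requested range runs past the end of the data.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a bounded read (not the DataFrame API):
// skips starting_row lines, then returns up to num_rows lines.
// Hitting end of input early yields a shorter (possibly empty) result
// instead of an exception.
std::vector<std::string> read_rows(std::istream &in,
                                   std::size_t starting_row,
                                   std::size_t num_rows) {
    std::string line;
    // Skip rows before the requested range; stop quietly at end of input.
    for (std::size_t i = 0; i < starting_row && std::getline(in, line); ++i) {
    }
    std::vector<std::string> rows;
    while (rows.size() < num_rows && std::getline(in, line)) {
        rows.push_back(line);
    }
    return rows;
}
```

With a three-line input, asking for 10 rows starting at row 1 returns the 2 remaining rows, and starting past the end returns an empty result. A caller looping over chunks can therefore use an empty or short result as the signal to stop.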