hosseinmoein / DataFrame

C++ DataFrame for statistical, financial, and ML analysis, in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License
2.38k stars, 298 forks

Convert C++ Dataframe to Pandas #310

Closed: Aratiganesh123 closed this issue 6 days ago

Aratiganesh123 commented 2 weeks ago

I am working on a project to speed up data loading and preprocessing using a C++ DataFrame library. My goal is to preprocess the data in C++ and then use the processed data to train models in scikit-learn and PyTorch. I am considering using pybind11 to integrate C++ and Python.

My main concern is the overhead of converting the C++ DataFrame to a Pandas DataFrame in Python. I want this conversion to be efficient rather than a significant cost of its own. Could you suggest how to do it with minimal overhead?

hosseinmoein commented 2 weeks ago

The only way to do that is to write the C++ DataFrame to a file in csv2 format, a format that Pandas can also read. Alternatively, you can do the whole thing in C++ and skip the language conversions altogether.