hosseinmoein / DataFrame

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License
2.46k stars, 310 forks

How to convert a pandas DataFrame to a hosseinmoein DataFrame #170

Closed xkungfu closed 2 years ago

xkungfu commented 2 years ago

I have worked this out, but I don't know whether it is a good approach. If it is useful, I hope it helps someone.

The code:

Python side:

temp_str = all_df.to_csv()  # then store temp_str in a file or in Redis

C++ side:


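// Returns a copy of str with every occurrence of 'from' replaced by 'to'.
inline std::string ReplaceAll(std::string str, const std::string& from, const std::string& to)
{
    if (from.empty()) return str; // An empty pattern matches everywhere and would loop forever
    size_t start_pos = 0;
    while ((start_pos = str.find(from, start_pos)) != std::string::npos) {
        str.replace(start_pos, from.length(), to);
        start_pos += to.length(); // Skip the inserted text so a 'to' that contains 'from' is not replaced again
    }
    return str;
}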

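std::string str_from_python; // Fill with the CSV string from the Python side (read from a file, Redis, etc.)

// Template for the DataFrame header that replaces the first line of the Python string.
// Each column is name:row_count:<type>; 14 is a placeholder for the row count:
std::string first_line_base = "INDEX:14:<ulong>,id:14:<double>,name:14:<string>";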

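// Count the newlines in the string. This includes the header line, so when the
// string ends with a newline the result is one more than the number of data rows:
std::size_t count = 0;
const std::string b = "\n";
std::string::size_type i = str_from_python.find(b);
while (i != std::string::npos)
{
    ++count;
    i = str_from_python.find(b, i + b.length());
}

// Replace only the ":14:" placeholder so that a column name containing "14" is left alone:
std::string first_line_new = ReplaceAll(first_line_base, ":14:", ":" + std::to_string(count) + ":");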

//remove the first line of the string
str_from_python.erase(0, str_from_python.find("\n") + 1);

//add new line to the string:
std::string str_from_python_new = first_line_new + "\n" + str_from_python;

MYDataFrame df = from_string(str_from_python_new);

hosseinmoein commented 2 years ago

Yes, as you figured out, the only problem is that the DataFrame header in csv2 format is different from the pandas header.

The DataFrame header has two more items for each column: the type of the column and the number of rows in the column. The number of rows does not have to be accurate. It is an estimate. You can even put 0 and it will work. Of course, to be most efficient in space allocation and speed, it should be accurate.
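
To make the header layout concrete, here is a minimal sketch that builds a `name:row_count:<type>` header and swaps it in for the first line of a pandas CSV string. It uses only the standard library. The helper names `make_csv2_header` and `pandas_csv_to_csv2` are hypothetical and not part of DataFrame, and the caller is assumed to supply the column names and type tags (such as `ulong`, `double`, `string` from the example above), since the pandas header carries no type information and leaves the index column unnamed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Column description: first = column name, second = type tag (e.g. "double").
using ColumnSpec = std::pair<std::string, std::string>;

// Hypothetical helper (not a DataFrame API): builds a header line of the form
// name:row_count:<type>,name:row_count:<type>,...
std::string make_csv2_header(const std::vector<ColumnSpec>& columns,
                             std::size_t row_count)
{
    std::string header;
    for (const ColumnSpec& col : columns) {
        if (!header.empty()) header += ',';
        header += col.first + ':' + std::to_string(row_count) + ":<" + col.second + '>';
    }
    return header;
}

// Hypothetical helper (not a DataFrame API): drops the pandas header line and
// prepends a header built from the caller-supplied column names and types.
std::string pandas_csv_to_csv2(const std::string& pandas_csv,
                               const std::vector<ColumnSpec>& columns)
{
    // Everything after the first newline is the data body.
    const std::size_t header_end = pandas_csv.find('\n');
    const std::string body = (header_end == std::string::npos)
                                 ? std::string()
                                 : pandas_csv.substr(header_end + 1);

    // Count data rows: one per newline, plus one for an unterminated last row.
    std::size_t rows = static_cast<std::size_t>(
        std::count(body.begin(), body.end(), '\n'));
    if (!body.empty() && body.back() != '\n') ++rows;

    return make_csv2_header(columns, rows) + '\n' + body;
}
```

Because the row count is counted from the data body only, the header here carries the exact number of rows rather than an estimate. The resulting string would then be handed to the library's string loader in the same way as `str_from_python_new` above.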

xkungfu commented 2 years ago

Thanks for the explanation! That is very good!