hosseinmoein / DataFrame

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License

How to retain NaN values in a csv2-format file when using read()? #212

Closed: zhouhaoan closed this issue 1 year ago

zhouhaoan commented 2 years ago

Hello, I need to process some csv2 data that contains NaN values, where each NaN was saved as an empty field (''). Following the examples, I tried the read() function. It reads the table successfully and I can access the columns with the get_column() method. However, the columns come back with different lengths: the NaN values are filtered out while reading the file, and the row indexes get mixed up as well. I checked the documentation of read(), and there doesn't seem to be an option that controls whether NaNs are retained. What can I do to retain the NaN values from the original file and keep the row index unchanged?

Thanks!

hosseinmoein commented 2 years ago

Can you post a sample of your data file here? I will take a look.

Currently, read() expects NaN values to be written as nan (case-insensitive) in the file.
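To make the rule above concrete, here is a minimal sketch of how a reader might recognize a NaN token. It is not DataFrame's actual parser; `is_nan_token` and `parse_float_field` are hypothetical helpers, and the point is only to show why `nan` in any letter case is understood while an empty field is not.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical helper (not DataFrame's parser): true if the token spells
// "nan" in any letter case, e.g. "nan", "NaN", "NAN".
bool is_nan_token(const std::string &token) {
    if (token.size() != 3) return false;
    const char expected[] = "nan";
    for (std::size_t i = 0; i < 3; ++i) {
        const auto c = static_cast<unsigned char>(token[i]);
        if (std::tolower(c) != expected[i]) return false;
    }
    return true;
}

// Hypothetical conversion of one <float> field: a nan token becomes a quiet
// NaN, anything else goes through std::stof. An empty token is NOT treated
// as NaN here; std::stof throws std::invalid_argument for it.
float parse_float_field(const std::string &token) {
    if (is_nan_token(token)) return std::numeric_limits<float>::quiet_NaN();
    return std::stof(token);
}
```

Usage: `parse_float_field("NaN")` yields a NaN (check it with `std::isnan`), while `parse_float_field("")` throws, which is the kind of empty field this thread is about.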

zhouhaoan commented 2 years ago

In my data, a NaN value is represented by an empty field ('').

I have converted the headers into csv2 format, like this: INDEX:775:<float>,oi:775:<float>,volume:775:<float>,last_price:775:<float>,turnover:775:<float>,ap1:775:<float>,ap2:775:<float>,ap3:775:<float>
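A header like the one above can be generated instead of typed by hand. The sketch below assumes, as in this example, that every column has the same row count and the same element type; `make_csv2_header` is a hypothetical helper and not part of DataFrame.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds a csv2-style header line of the form
// name:count:<type>,name:count:<type>,...
// Assumes all columns share one row count and one type name.
std::string make_csv2_header(const std::vector<std::string> &names,
                             std::size_t row_count,
                             const std::string &type_name) {
    std::string header;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i > 0) header += ',';  // separate columns with commas
        header += names[i] + ':' + std::to_string(row_count) + ":<" +
                  type_name + '>';
    }
    return header;
}
```

Usage: `make_csv2_header({"INDEX", "oi", "volume"}, 775, "float")` produces the first three entries of the header shown above. If columns differ in length or type, a per-column count and type would be needed instead.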

hosseinmoein commented 2 years ago

Yeah, currently this is undefined behavior in DataFrame. It actually shifts the data in the column, which will mess up your data alignment.

Can you change ,, to ,nan,?
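One way to apply the suggested ",," to ",nan," change is a small preprocessing pass over each data line before calling read(). This is a minimal sketch: `fill_empty_fields` is a hypothetical helper, and it assumes plain comma-separated values with no quoted fields containing commas and with line endings (including any trailing '\r') already stripped. Unlike a plain text substitution of ",," it also catches leading, trailing, and consecutive empty fields.

```cpp
#include <cassert>
#include <string>

// Hypothetical preprocessing step (not part of DataFrame): rewrites every
// empty field in one comma-separated line as "nan", so "1,,3" becomes
// "1,nan,3" and ",2," becomes "nan,2,nan".
// Note: an entirely empty line becomes "nan", so skip blank lines first.
std::string fill_empty_fields(const std::string &line) {
    std::string out;
    std::string::size_type start = 0;
    while (true) {
        const auto comma = line.find(',', start);
        const auto len = (comma == std::string::npos) ? std::string::npos
                                                      : comma - start;
        const std::string field = line.substr(start, len);
        out += field.empty() ? std::string("nan") : field;
        if (comma == std::string::npos) break;  // last field processed
        out += ',';
        start = comma + 1;
    }
    return out;
}
```

Usage: read the original file with `std::getline`, skip blank lines, and write `fill_empty_fields(line)` to a new file that is then passed to read(). The csv2 header line has no empty fields, so running it through the same function leaves it unchanged.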

hosseinmoein commented 2 years ago

I would like to keep it that way, because the logic is tidier and less bug-prone.

zhouhaoan commented 1 year ago

OK, I will try to modify my data. Thanks for answering.