hosseinmoein / DataFrame

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License
2.41k stars, 306 forks

How to retain nan values in a csv2-format file when using read()? #212

Closed zhouhaoan closed 1 year ago

zhouhaoan commented 1 year ago

Hello, I need to process some csv2 data that contains nan values, which were saved as '' (empty fields). Following the examples, I used the read() function; I can successfully read the table and access the columns with the get_column() method. However, I found that the columns have different lengths: the nan values are filtered out when the file is read, and the row indexes get mixed up as well. I checked the help page for read(), and there doesn't seem to be an option to control whether nans are retained. What can I do if I want to retain the nan values from the original file and keep the row indexes unchanged?

Thanks!

hosseinmoein commented 1 year ago

Can you post a sample of your data file here? I will take a look

Currently, read() expects nan values to appear in the file as the literal text nan (case is ignored).

zhouhaoan commented 1 year ago

The data is formatted like this, with '' representing a nan value. [image]

I have converted the headers into csv2 format, like this: INDEX:775:<float>,oi:775:<float>,volume:775:<float>,last_price:775:<float>,turnover:775:<float>,ap1:775:<float>,ap2:775:<float>,ap3:775:<float>
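For readers converting a plain CSV header the same way, here is a minimal sketch of building such a line. It assumes the `name:count:<type>` layout seen in the header above, with every column sharing one row count and one type tag. The helper name `make_csv2_header` is hypothetical and is not part of DataFrame.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a DataFrame API): builds a csv2-style header line
// of the form name:rows:<type>,name:rows:<type>,...
// Assumes every column has the same row count and the same type tag.
std::string make_csv2_header(const std::vector<std::string>& names,
                             std::size_t rows,
                             const std::string& type_tag = "float") {
    std::string header;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i != 0) {
            header += ',';  // separate column descriptors with commas
        }
        header += names[i] + ':' + std::to_string(rows) + ":<" + type_tag + '>';
    }
    return header;
}
```

Calling `make_csv2_header({"INDEX", "oi", "volume"}, 775)` reproduces the first three entries of the header posted above.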

hosseinmoein commented 1 year ago

Yeah, currently this is undefined behavior in DataFrame. It actually shifts the data in the column, which will mess up your data alignment.

Can you change ,, to ,nan,?
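One way to apply that change is to preprocess each data line before handing the file to read(). The sketch below uses only the standard library (C++17) and is not part of DataFrame. The function name `fill_empty_fields` is hypothetical, and it assumes plain comma-separated fields with no quoting, no embedded commas, and no trailing carriage returns.

```cpp
#include <string>
#include <string_view>

// Hypothetical preprocessing helper (not a DataFrame API): returns a copy of
// one CSV line in which every empty field is replaced by `filler`.
// Assumes unquoted fields separated by plain commas.
std::string fill_empty_fields(std::string_view line,
                              std::string_view filler = "nan") {
    std::string out;
    if (line.empty()) {
        return out;  // leave blank lines blank rather than emitting "nan"
    }
    out.reserve(line.size() + 16);

    std::size_t start = 0;
    while (true) {
        const std::size_t comma = line.find(',', start);
        const std::size_t end =
            (comma == std::string_view::npos) ? line.size() : comma;
        const std::string_view field = line.substr(start, end - start);

        // Copy the field through, or substitute the filler if it is empty.
        out.append(field.empty() ? filler : field);

        if (comma == std::string_view::npos) {
            break;  // last field on the line
        }
        out.push_back(',');
        start = comma + 1;
    }
    return out;
}
```

Running every line of the original file through this function and writing the result to a new file turns `1,,3` into `1,nan,3`. The header line passes through unchanged because it has no empty fields. A leading or trailing empty field is filled as well.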

hosseinmoein commented 1 year ago

I would like to keep it like that because the logic is tidier and less bug-prone.

zhouhaoan commented 1 year ago

OK, I will try to modify my data. Thanks for answering.