hosseinmoein / DataFrame

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License

Problem about reading CSV with empty values. #300

Closed. RickyKongCoder closed this issue 2 months ago.

RickyKongCoder commented 2 months ago

I'm a user of the C++ DataFrame library, and I want to start by expressing my appreciation for your impressive library. I have a question about the CSV reading function and was hoping you could clarify it. I noticed that when I use the "read" function to read a CSV file, it skips over empty (NaN) values. I was curious about the reasoning behind this design choice, as opposed to the conventional approach, used in pandas, of treating empty values as NaN. Could you please shed some light on this? Thank you in advance for your help.

hosseinmoein commented 2 months ago

It might be a bug that was introduced recently. Can you copy/paste a few lines of data where there is a nan? Is it like ,nan, or like ,, or something else? Also, when you read the data into the DataFrame, what do you see in place of the nan? Is it the next value?
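Both spellings asked about here, `,nan,` and `,,`, can be mapped to NaN if each line is split without discarding empty fields and each field is then converted on its own. The following is a minimal sketch of that pandas-style convention, assuming plain comma-separated lines with no quoting. `split_keep_empty` and `field_to_double` are hypothetical helpers for illustration, not part of the DataFrame API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper (not DataFrame's API): split one CSV line on `delim`,
// keeping empty fields. "a,,b" -> {"a", "", "b"}, and a trailing delimiter
// produces a trailing empty field: "0,1.5," -> {"0", "1.5", ""}.
std::vector<std::string> split_keep_empty(const std::string& line, char delim = ',') {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const auto pos = line.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(line.substr(start));  // last field, possibly empty
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}

// Hypothetical helper: convert one field to double. An empty field, or one
// with no parsable number, becomes NaN. std::strtod already accepts "nan",
// so ",nan," and ",," both end up as NaN.
double field_to_double(const std::string& field) {
    if (field.empty())
        return std::numeric_limits<double>::quiet_NaN();
    const char* begin = field.c_str();
    char* end = nullptr;
    const double value = std::strtod(begin, &end);
    if (end == begin)  // nothing was consumed: not a number
        return std::numeric_limits<double>::quiet_NaN();
    return value;
}
```

Because the empty field is kept as a placeholder, every row contributes exactly one value per column, which is what keeps the column aligned with the index.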

RickyKongCoder commented 2 months ago

So basically, here is some data with 9 rows and 5 empty values in column close2. [screenshot: csv_problem] This is the code I wrote. It is very simple: it just reads the CSV and writes it to the console. [screenshot: code]

This is the console output. [screenshot: console]

RickyKongCoder commented 2 months ago

You can see that when reading the "close2" column, the read function skipped the first 5 empty values.

RickyKongCoder commented 2 months ago

This is the CSV data:

INDEX:9:,close:9:,close2:9:
0,13731.70654,
1,13652.44631,
2,13634.6356,
3,13567.47794,
4,13534.40545,
5,13634.03615,5
6,13622.68511,6
7,13567.4479,7
8,13586.30945,8

RickyKongCoder commented 2 months ago

I have also tried this:

INDEX:9:,close:9:,close2:9:
0,13731.70654,,
1,13652.44631,,
2,13634.6356,,
3,13567.47794,,
4,13534.40545,,
5,13634.03615,5,
6,13622.68511,6,
7,13567.4479,7,
8,13586.30945,8,

It also gives the same result.

hosseinmoein commented 2 months ago

Hmm, that's even worse than what you said. The rows in the console output don't match the rows in the file. I have to look at it.
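The thread does not say what caused the bug, but one plausible way to get values that no longer line up with their rows is a reader that drops empty fields while collecting a column. The sketch below is a hypothetical illustration, not DataFrame's actual code: `read_column` and its `skip_empty` flag are invented names. It reads column 2 (close2) from the rows posted above, once skipping empty fields and once storing NaN for them.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical illustration (not DataFrame's implementation): collect column
// `col` from CSV data rows. With skip_empty == true, empty fields are dropped,
// so later values slide up into earlier rows. With skip_empty == false, each
// empty field becomes NaN and the column keeps exactly one value per row.
std::vector<double> read_column(const std::vector<std::string>& rows,
                                std::size_t col, bool skip_empty) {
    std::vector<double> out;
    for (const auto& row : rows) {
        std::vector<std::string> fields;
        std::stringstream ss(row);
        std::string field;
        while (std::getline(ss, field, ','))
            fields.push_back(field);
        // std::getline does not report a final empty field after a trailing
        // comma, so a missing column is treated as an empty field.
        const std::string value = col < fields.size() ? fields[col] : std::string();
        if (value.empty()) {
            if (skip_empty)
                continue;  // this row's slot is lost: the misalignment
            out.push_back(std::numeric_limits<double>::quiet_NaN());
        } else {
            out.push_back(std::strtod(value.c_str(), nullptr));
        }
    }
    return out;
}
```

With `skip_empty = true`, the close2 column from the first data sample comes back with only 4 values, so the value from row 5 lands in row 0. With `skip_empty = false`, it has 9 values, NaN in rows 0 to 4 and 5, 6, 7, 8 in rows 5 to 8, matching the file. Both posted variants (with and without the extra trailing comma) give the same result under this sketch.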

hosseinmoein commented 2 months ago

@RickyKongCoder ,

This has been fixed in master. Thanks for spotting it.