Closed kilasuelika closed 2 years ago
Deciding column types for `string`, `int` and `double` is easy, but for datetime it will be very difficult.
I hear what you are saying. There are definitely advantages to it. The only thing is that the implementation could be very bug prone. I have to think about it.
BTW, I know of people who use both the `csv` and `csv2` formats. Personally I like `csv` better. It is more compact, it is columnar, and reads/writes are more efficient.
Even auto-detecting between int and double is not that simple. Imagine the first few items of a double column happen to be integers. Or the data is integer by nature but the user wants a double column for calculation purposes. For example, the number of shares traded is an integer value by nature, but it is most often stored as a floating point because the column is involved in calculations and comparisons with other floating-point figures.
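To make the first case concrete, here is a minimal sketch of how a prefix-based guess goes wrong. The names `NumType`, `is_integer` and `infer_from_prefix` are hypothetical and exist only for this illustration; nothing here is part of the library.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

enum class NumType { Int, Double };

// True if the whole token parses as an integer (no '.', no exponent).
inline bool is_integer(const std::string &tok) {
    std::size_t pos = 0;
    try {
        (void)std::stoll(tok, &pos);
    } catch (...) {
        return false;  // empty, non-numeric, or out of range
    }
    return pos == tok.size();
}

// Hypothetical helper: guess the column type from the first `sample` tokens only.
inline NumType infer_from_prefix(const std::vector<std::string> &col,
                                 std::size_t sample) {
    const std::size_t n = std::min(sample, col.size());
    for (std::size_t i = 0; i < n; ++i)
        if (!is_integer(col[i])) return NumType::Double;
    return NumType::Int;
}
```

With a column such as `{"100", "250", "300", "412.5"}`, sampling three rows reports `Int`, while scanning the whole column reports `Double`. Any sampling scheme has to decide what to do when a later row contradicts the guess.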
When you write the data in a DataFrame using the `write()` function, it stores all the info it needs to read the data back. The same goes for `to_string()` and `from_string()`.
After some research, I decided to write my own DataFrame library. You can check it on DataFrameCpp
I suggest providing a new `normal_csv` format in which the column type is detected automatically. Use a two-pass procedure: first read some initial rows to decide each column's type, then make a second pass to read the values. The current `csv` and `csv2` formats are rarely used and inconvenient.
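A rough sketch of what such a two-pass reader could look like, under some loud assumptions: the input is a plain comma-separated string with a header row, there is no quoting or escaping, and only integer, double and string columns are considered (no datetime). All names here (`ColType`, `Column`, `detect_types`, `read_csv`) are made up for this example and are not the library's API.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

enum class ColType { Int, Double, String };  // ordered narrowest to widest

struct Column {
    std::string name;
    ColType type = ColType::Int;
    std::vector<long long> ints;
    std::vector<double> doubles;
    std::vector<std::string> strings;
};

// Split one line on commas (assumption: no quoted fields).
inline std::vector<std::string> split_line(const std::string &line) {
    std::vector<std::string> out;
    std::string field;
    std::istringstream ss(line);
    while (std::getline(ss, field, ',')) out.push_back(field);
    return out;
}

// Narrowest type that parses the whole token.
inline ColType classify(const std::string &tok) {
    std::size_t pos = 0;
    try {
        (void)std::stoll(tok, &pos);
        if (pos == tok.size()) return ColType::Int;
    } catch (...) {}
    try {
        (void)std::stod(tok, &pos);
        if (pos == tok.size()) return ColType::Double;
    } catch (...) {}
    return ColType::String;
}

inline ColType widen(ColType a, ColType b) {
    return static_cast<int>(a) > static_cast<int>(b) ? a : b;
}

// Pass 1: look at up to `sample_rows` data rows and widen each column's type.
inline std::vector<ColType> detect_types(const std::string &csv,
                                         std::size_t sample_rows) {
    std::istringstream in(csv);
    std::string line;
    std::getline(in, line);  // header row
    std::vector<ColType> types(split_line(line).size(), ColType::Int);
    for (std::size_t r = 0; r < sample_rows && std::getline(in, line); ++r) {
        const auto fields = split_line(line);
        for (std::size_t c = 0; c < types.size() && c < fields.size(); ++c)
            types[c] = widen(types[c], classify(fields[c]));
    }
    return types;
}

// Pass 2: read every row into typed columns; reject values that do not fit.
inline std::vector<Column> read_csv(const std::string &csv,
                                    std::size_t sample_rows) {
    const auto types = detect_types(csv, sample_rows);
    std::istringstream in(csv);
    std::string line;
    std::getline(in, line);
    const auto names = split_line(line);
    std::vector<Column> cols(types.size());
    for (std::size_t c = 0; c < cols.size(); ++c) {
        cols[c].name = names[c];
        cols[c].type = types[c];
    }
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        const auto fields = split_line(line);
        if (fields.size() != cols.size())
            throw std::runtime_error("row has unexpected number of fields");
        for (std::size_t c = 0; c < cols.size(); ++c) {
            const std::string &f = fields[c];
            if (widen(cols[c].type, classify(f)) != cols[c].type)
                throw std::runtime_error("value does not fit detected type: " + f);
            switch (cols[c].type) {
                case ColType::Int:    cols[c].ints.push_back(std::stoll(f)); break;
                case ColType::Double: cols[c].doubles.push_back(std::stod(f)); break;
                case ColType::String: cols[c].strings.push_back(f); break;
            }
        }
    }
    return cols;
}
```

In this sketch the second pass throws when a value past the sampled rows does not fit the detected type. That is only one possible policy; a real implementation could instead widen the column and re-read, or let the user override the detected type.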