hosseinmoein / DataFrame

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License

Not proper conversion between timezones? #305

Closed Tonyx97 closed 1 month ago

Tonyx97 commented 1 month ago

Hi, I'm reading a csv2 file into a DataFrame whose index is a DateTime. The index column is DateTimeISO and its values are in UTC, but when I read it using the library, it converts the DateTime to local time. If I then try to change the timezone to UTC after reading, the time doesn't change, only the timezone. A sample from the csv file:

INDEX::<DateTimeISO>,Timestamp::<int>,Open::<float>,High::<float>,Low::<float>,Close::<float>,Volume::<float>,Target::<int>
2023-01-01 00:00:00,1672531200,16541.77,16544.76,16538.45,16543.67,83.08143,0
2023-01-01 00:01:00,1672531260,16543.04,16544.41,16538.48,16539.31,80.453,0
2023-01-01 00:02:00,1672531320,16539.31,16541.17,16534.52,16536.43,62.90197,0
2023-01-01 00:03:00,1672531380,16536.43,16537.28,16531.0,16533.65,115.71894,0
2023-01-01 00:04:00,1672531440,16534.12,16536.08,16527.51,16535.38,144.45369,0
2023-01-01 00:05:00,1672531500,16534.91,16537.8,16533.94,16536.7,53.58957,0

And this is the code I'm currently using:

std::stringstream ss(data);

ok = df.read(ss, hmdf::io_format::csv2);

// change the timezone of the index to UTC

for (auto& index : df.get_index())
    index.set_timezone(hmdf::DT_TIME_ZONE::UTC);

If I place a breakpoint on the last line and check the value of the first index, I get 1672527600 instead of 1672531200 (I assume because it's converting it to local time, GMT+2). After calling set_timezone, the time still doesn't become 1672531200, which is the actual UTC time. My idea would be to let the user specify the default timezone when reading the CSV. Any idea how to solve this? Thanks.

hosseinmoein commented 1 month ago

What DateTime does when you change the timezone is to keep the numeric epoch time constant and change the date-time components (date, hour, minute, ...) accordingly. Currently the only timezone that makes sense in csv files is the local timezone.

What I could do is allow the user to specify the timezone as part of the string format. But that's a relatively big job. I have to find time to do it.
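
To illustrate the behaviour described above with the numbers from this thread, here is a minimal standard-library sketch. It does not use DataFrame's DateTime class; `render_with_offset` is a hypothetical helper, and a fixed offset in seconds stands in for a real timezone. It shows that one epoch value maps to different date-time components depending on the offset, so changing only the timezone cannot turn 1672527600 into 1672531200. The two values in the report differ by 3600 seconds.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Hypothetical helper (not part of DataFrame): render an epoch value as
// "YYYY-MM-DD HH:MM:SS" for a fixed offset from UTC, given in seconds.
std::string render_with_offset(std::time_t epoch, long offset_seconds) {
    // Shift the epoch by the offset, then break it down as if it were UTC.
    const std::time_t shifted = epoch + offset_seconds;
    const std::tm *parts = std::gmtime(&shifted);
    assert(parts != nullptr);  // gmtime can fail for out-of-range values

    char buf[32];
    std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", parts);
    return buf;
}
```

With this helper, `render_with_offset(1672531200, 0)` gives `2023-01-01 00:00:00`, while `render_with_offset(1672527600, 0)` gives `2022-12-31 23:00:00`. The second epoch only renders as `2023-01-01 00:00:00` when a one-hour offset is applied, which is consistent with the string having been read as local time.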

Tonyx97 commented 1 month ago

Oh okay, thanks for the reply. I guess I can simply remove the date and use the timestamp I have in the csv as a workaround. It seems to be working fine, although I need some extra conversions between date and timestamp.
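
For the extra date-to-timestamp conversion mentioned in this workaround, one possible approach is to parse the date string as UTC without going through the local timezone at all. The sketch below is an assumption about how that could look, not DataFrame code: `utc_epoch_from_iso` and `days_from_civil` are hypothetical helper names, and the day-count arithmetic follows Howard Hinnant's well-known civil-date algorithm.

```cpp
#include <cstdint>
#include <cstdio>

// Days since 1970-01-01 for a proleptic Gregorian date
// (after Howard Hinnant's days_from_civil algorithm).
inline std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= (m <= 2) ? 1 : 0;  // treat Jan/Feb as months of the previous year
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS" as UTC and write the
// epoch seconds to epoch_out. Returns false if the text does not match.
inline bool utc_epoch_from_iso(const char *text, std::int64_t &epoch_out) {
    int y, mo, d, h, mi, s;
    if (std::sscanf(text, "%d-%d-%d %d:%d:%d", &y, &mo, &d, &h, &mi, &s) != 6)
        return false;
    if (mo < 1 || mo > 12 || d < 1 || d > 31 ||
        h < 0 || h > 23 || mi < 0 || mi > 59 || s < 0 || s > 60)
        return false;
    epoch_out = days_from_civil(y, static_cast<unsigned>(mo),
                                static_cast<unsigned>(d)) * 86400
                + h * 3600 + mi * 60 + s;
    return true;
}
```

Applied to the first sample row, `utc_epoch_from_iso("2023-01-01 00:00:00", e)` sets `e` to 1672531200, which matches the Timestamp column in the csv, so the two columns can be cross-checked regardless of the machine's local timezone.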

hosseinmoein commented 1 month ago

@Tonyx97,

I implemented this in master. Now you can have a string like:

2024-06-19 13:32 GMT

Also see test/date_time_tester.cc and the DateTime documentation.