Open klo9klo9kloi opened 2 months ago
Did you check if the raw EIA files also had this overlap?
How many hours of overlap are you seeing?
I think this code should be working with UTC only in the internals and only converting for exports/plotting.
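As a rough illustration of that idea (not the project's actual code), here is a minimal sketch that keeps timestamps as UTC-aware datetimes internally and only shifts them when exporting. The helper names `parse_utc` and `format_for_export`, and the fixed `-6` offset, are assumptions made up for this example.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS' string and tag it as UTC."""
    return datetime.strptime(stamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)

def format_for_export(moment: datetime, utc_offset_hours: int) -> str:
    """Convert a UTC-aware datetime to a fixed-offset local time, only at export."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return moment.astimezone(local_zone).strftime(TIMESTAMP_FORMAT)

# Internals stay in UTC; the offset (hypothetical here) is applied at the very end.
internal = parse_utc("2019-01-01 06:00:00")
exported = format_for_export(internal, utc_offset_hours=-6)
print(exported)  # 2019-01-01 00:00:00
```

With this split, file boundaries and any concatenation logic only ever see UTC values, so a local-time shift cannot move rows across a file boundary.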
Sifting through the bulk processed files again, it looks like it is always `20XX-01-01 06:00:00`, `20XX-01-01 07:00:00`, `20XX-01-01 08:00:00`, and `20XX-07-01 05:00:00`, `20XX-07-01 06:00:00`, `20XX-07-01 07:00:00` that overlap between neighboring files.
I just checked the raw EIA files and they do not have this problem; the next file always just picks up at the next hour.
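For anyone who wants to repeat this check, a minimal sketch of one way to list the timestamps shared by two neighboring files. The function name `overlapping_periods` is hypothetical, and the sample lists are illustrative values shaped like the overlap reported above, not data read from the actual files.

```python
def overlapping_periods(earlier_periods, later_periods):
    """Return the timestamps present in both files, in chronological order.

    'YYYY-MM-DD HH:MM:SS' strings sort lexicographically in time order,
    so a plain sort is enough here.
    """
    return sorted(set(earlier_periods) & set(later_periods))

# Illustrative values only: the tail of one file and the head of the next.
tail_of_2018_h2 = [
    "2019-01-01 05:00:00",
    "2019-01-01 06:00:00",
    "2019-01-01 07:00:00",
    "2019-01-01 08:00:00",
]
head_of_2019_h1 = [
    "2019-01-01 06:00:00",
    "2019-01-01 07:00:00",
    "2019-01-01 08:00:00",
    "2019-01-01 09:00:00",
]

shared = overlapping_periods(tail_of_2018_h2, head_of_2019_h1)
print(len(shared), shared[0], shared[-1])
# 3 2019-01-01 06:00:00 2019-01-01 08:00:00
```

Running the same function on the raw files should return an empty list if each file really does pick up at the next hour.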
When looking at e.g. `EIA930_2019_Jan_Jun_co2.csv`, the `period` starts a few hours after `2019-01-01 00:00:00` rather than at the mark, which I'm guessing has something to do with UTC shifting.

Keeping that example file, if I then look at `EIA930_2018_Jul_Dec_co2.csv`, the `period` also overflows into year `2019` for a few hours, such that if I concatenate these two files then there are some duplicate periods.

If I am aggregating emissions by year, what is the proper way to deal with these duplicate `period` rows? Aggregate? Take the one from the latest year?
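To make the "take the one from the latest year" option concrete, here is a minimal sketch. It is not an answer confirmed in this thread. It assumes the duplicated rows carry the same values in both files, that rows are identified by `period` alone, and that the emissions column is named `co2`. That column name and both helper functions are hypothetical.

```python
from collections import defaultdict

def dedupe_keep_last(rows, key_fields):
    """Keep one row per key; rows later in the sequence overwrite earlier ones."""
    unique = {}
    for row in rows:
        unique[tuple(row[field] for field in key_fields)] = row
    return list(unique.values())

def total_by_year(rows, value_field):
    """Sum value_field per calendar year, read from the 'period' string as stored."""
    totals = defaultdict(float)
    for row in rows:
        totals[int(row["period"][:4])] += row[value_field]
    return dict(totals)

# Made-up rows standing in for the end of one file and the start of the next.
rows_2018_h2 = [
    {"period": "2018-12-31 23:00:00", "co2": 10.0},
    {"period": "2019-01-01 06:00:00", "co2": 12.0},
]
rows_2019_h1 = [
    {"period": "2019-01-01 06:00:00", "co2": 12.0},
    {"period": "2019-01-01 09:00:00", "co2": 11.0},
]

# Concatenate in file order so the later file wins on duplicate periods.
combined = dedupe_keep_last(rows_2018_h2 + rows_2019_h1, key_fields=("period",))
print(total_by_year(combined, "co2"))  # {2018: 10.0, 2019: 23.0}
```

If the files hold several rows per `period` (for example one per region), the extra identifying columns would need to be added to `key_fields`. If the duplicated rows turn out to differ between files, dropping one of them is no longer obviously safe.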