jdechalendar / gridemissions

Tools for power sector emissions tracking
MIT License

Bulk downloads only cover the last 5 months #19

Closed: zaneselvans closed this issue 2 weeks ago

zaneselvans commented 6 months ago

I was just looking at the bulk downloads listed on this page and found that some of them don't cover the expected range of dates.

jdechalendar commented 6 months ago

The newer files date back to the updates I made for the EIA's switch to v2 of their API (#12). At that time, I also changed the gridemissions API so that it only stores one month of historical data. What I did not do back then was go back and reprocess the historical data, which broke the description on the page you referenced.

Updating the historical data was less straightforward than I initially thought because the EIA also changed the way it makes data available in bulk. Instead of one giant CSV file with all of the data, it now publishes the data in six-month chunks, with each chunk split into two files (see here).
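To make the new layout concrete, here is a minimal sketch that lists the six-month chunks covering a date range and the two file names per chunk. The `EIA930_BALANCE_…`/`EIA930_INTERCHANGE_…` naming pattern, the `FILE_KINDS` constant, and the helper functions are assumptions for illustration; check the real names on the EIA bulk download page.

```python
from datetime import date

# Hypothetical: the two file kinds that make up one six-month chunk.
FILE_KINDS = ("BALANCE", "INTERCHANGE")


def half_index(d: date) -> int:
    """0 for January-June, 1 for July-December."""
    return 0 if d.month <= 6 else 1


def six_month_chunks(start: date, end: date) -> list[tuple[int, str]]:
    """List the (year, half) chunks that overlap start..end, inclusive."""
    chunks = []
    year, half = start.year, half_index(start)
    last = (end.year, half_index(end))
    while (year, half) <= last:
        chunks.append((year, "Jan_Jun" if half == 0 else "Jul_Dec"))
        # Advance to the next half-year, rolling over to January of the next year.
        year, half = (year, 1) if half == 0 else (year + 1, 0)
    return chunks


def chunk_filenames(year: int, half: str) -> list[str]:
    """Hypothetical file names for the two files of one chunk."""
    return [f"EIA930_{kind}_{year}_{half}.csv" for kind in FILE_KINDS]


if __name__ == "__main__":
    for year, half in six_month_chunks(date(2015, 7, 1), date(2020, 12, 31)):
        print(chunk_filenames(year, half))
```

For the example range (July 2015 through December 2020, chosen only for illustration), the loop prints 11 pairs of file names.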

Since the datasets generated by this codebase depend directly on the Grid Monitor dataset, I think it makes sense for the tools here to process data in the same six-month chunks.
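Aligning outputs with the upstream chunks mostly comes down to assigning every hourly timestamp to the chunk that contains it. The sketch below does that with the standard library. It assumes naive UTC timestamps and chunk boundaries at midnight on January 1 and July 1; both assumptions should be checked against the actual EIA files. The `chunk_label` and `split_by_chunk` helpers and the `"2016_Jan_Jun"` label format are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def chunk_label(ts: datetime) -> str:
    """Label of the six-month chunk containing ts (assumed naive UTC), e.g. "2016_Jan_Jun"."""
    half = "Jan_Jun" if ts.month <= 6 else "Jul_Dec"
    return f"{ts.year}_{half}"


def split_by_chunk(rows):
    """Group (timestamp, value) rows by six-month chunk, preserving row order."""
    groups = defaultdict(list)
    for ts, value in rows:
        groups[chunk_label(ts)].append((ts, value))
    return dict(groups)


if __name__ == "__main__":
    # Four hourly rows straddling the June 30 / July 1 boundary.
    start = datetime(2016, 6, 30, 22)
    rows = [(start + timedelta(hours=h), h) for h in range(4)]
    for label, chunk_rows in split_by_chunk(rows).items():
        print(label, [v for _, v in chunk_rows])
```

Keeping the label a plain string makes it easy to reuse as an output file suffix, so each processed chunk maps one-to-one onto an upstream chunk.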

I just opened #20 to process those six-month datasets and launched the workflow in that PR's Makefile. Eleven six-month chunks are currently available. The CvxCleaner step takes about 1 hour per six-month chunk on the machine I am using; the other steps are comparatively cheap. I'd like to do some sanity checking once that run completes. If everything looks good, I'll update the way these files are distributed and update the description accordingly.
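Because the cleaning step costs roughly an hour per chunk, it pays to reprocess only chunks that have not been completed yet, which is the idea behind up-to-date Make targets. The following is a minimal Python sketch of that pattern using marker files. It is not the Makefile from #20: `run_chunks`, the `.done` markers, and the `download`/`clean`/`emissions` step names are hypothetical placeholders.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def run_chunks(
    chunks: Iterable[str],
    steps: list[Callable[[str], None]],
    done_dir: Path,
) -> list[str]:
    """Run all steps for each chunk lacking a completion marker; return the chunks run."""
    done_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for chunk in chunks:
        marker = done_dir / f"{chunk}.done"
        if marker.exists():
            continue  # finished on an earlier run, like an up-to-date Make target
        for step in steps:
            step(chunk)  # an exception here leaves no marker, so the chunk is retried
        marker.touch()  # written only after every step succeeded
        processed.append(chunk)
    return processed


if __name__ == "__main__":
    def log(name):
        return lambda chunk: print(f"{name}: {chunk}")

    steps = [log("download"), log("clean"), log("emissions")]
    with tempfile.TemporaryDirectory() as tmp:
        print(run_chunks(["2016_Jan_Jun", "2016_Jul_Dec"], steps, Path(tmp)))
        # Second call only processes the newly added chunk.
        print(run_chunks(["2016_Jan_Jun", "2016_Jul_Dec", "2017_Jan_Jun"], steps, Path(tmp)))
```

When the EIA publishes a new six-month file, appending its label to the list triggers one chunk's worth of work instead of a full rerun.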

Thank you for the reminder to do this!

@ktehranchi FYI

grgmiller commented 6 months ago

FYI, we have some functions in the OGE repo to download and process the six-month files into the format you use here. Not sure if this would be helpful.

jdechalendar commented 5 months ago

> FYI, we have some functions in the OGE repo to download and process the six-month files into the format you use here. Not sure if this would be helpful.

Just saw this. Your code looks similar to what I did here.

BTW, when looking at your code I noticed that you seem to be using the conventions I used for the EIA API before its v2 update. More recent versions of the code in this repo use a different convention; see eia_api_v2.py.
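For readers comparing the two conventions: v1 requests were keyed by a series ID, whereas v2 requests address a route and filter it with facets. Below is a rough sketch of building one page of a v2 request with the standard library. This is not necessarily how eia_api_v2.py does it. The base URL, the `electricity/rto/region-data` route, the `facets[respondent][]` parameter, the `YYYY-MM-DDTHH` date format, and the 5000-row page size are all assumptions to verify against the EIA API v2 documentation. `build_v2_query` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlparse

# Assumptions to verify against the EIA API v2 documentation (and eia_api_v2.py):
# the base URL, the route for hourly balancing-authority data, the facet name,
# the hourly "YYYY-MM-DDTHH" date format, and the 5000-row page size.
BASE_URL = "https://api.eia.gov/v2"
ROUTE = "electricity/rto/region-data/data/"


def build_v2_query(api_key: str, respondent: str, start: str, end: str,
                   offset: int = 0, length: int = 5000) -> str:
    """Build the URL for one page of an hourly region-data request for one respondent."""
    params = [
        ("api_key", api_key),
        ("frequency", "hourly"),
        ("data[0]", "value"),
        ("facets[respondent][]", respondent),
        ("start", start),
        ("end", end),
        ("offset", offset),  # page through results by increasing offset
        ("length", length),
    ]
    return f"{BASE_URL}/{ROUTE}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_v2_query("YOUR_KEY", "CISO", "2023-01-01T00", "2023-01-02T00")
    print(url)
    print(dict(parse_qsl(urlparse(url).query)))
```

`urlencode` percent-encodes the square brackets, which a standard query-string parser decodes back to `facets[respondent][]`. The sketch only builds the URL and makes no network call.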