singularity-energy / open-grid-emissions

Tools for producing high-quality hourly generation and emissions data for U.S. electric grids
MIT License

Adds a notebook for checking OGE download zip files #270

Closed miloknowles closed 11 months ago

miloknowles commented 1 year ago

Adds a notebook for downloading and checking OGE zip files. We should eventually integrate some of these checks into the data pipeline itself.

This is the printout from the notebook (truncated; not all of it is included):

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/SOCO.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/WWA.csv
FAIL: Line 4204 is too long (2998940 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/YAD.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/SEC.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/TIDC.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/SCEG.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/SEPA.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/PNM.csv
FAIL: Line 1158 is too long (5636539 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/WAUW.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/WACM.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/WALC.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/TEPC.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/RIMS.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/PSCO.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/SCL.csv
FAIL: Line 11570 is too long (17579241 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2020_power_sector_data_hourly_metric_units/PGE.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/FMPP.csv
FAIL: Line 6606 is too long (5389942 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/TPWR.csv
FAIL: Line 1406 is too long (24873875 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/SOCO.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/IPCO.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/PJM.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/AZPS.csv
FAIL: Line 3044 is too long (3266956 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/DOPD.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/IID.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/HGMA.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/DEAA.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/CPLW.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/SPA.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/FPL.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/SEC.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/GCPD.csv
FAIL: Line 2368 is too long (21769544 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/TVA.csv
FAIL: Line 9821 is too long (5520190 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/TIDC.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/HECO.csv
FAIL: Line 4764 is too long (18297177 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/SCEG.csv
FAIL: Line 554 is too long (21782609 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/CPLE.csv
FAIL: Line 8033 is too long (6653636 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/SEPA.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/AKMS.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/CHPD.csv
FAIL: Line 6706 is too long (17592100 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/PNM.csv
FAIL: Line 22110 is too long (921854 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/WAUW.csv
FAIL: Line 1151 is too long (12994434 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/WALC.csv
FAIL: Line 5476 is too long (15912218 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/NWMT.csv
FAIL: Line 13808 is too long (23007625 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/PACE.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/AVA.csv
FAIL: Line 3232 is too long (16751352 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/SC.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/GWA.csv
FAIL: File is empty

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/HIMS.csv
FAIL: Line 23959 is too long (6870196 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/TEPC.csv
FAIL: Line 5297 is too long (3184852 chars). This is probably a corrupted line.

/Users/milo.knowles/singularity/open-grid-emissions/data/downloads/oge/2021_power_sector_data_hourly_metric_units/HST.csv

gailin-p commented 1 year ago

@miloknowles In the spirit of automating validation, could you turn this logic into a function and call it from output_data.prepare_files_for_upload?
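
For illustration, the two checks visible in the printout (empty file, overly long line) could be wrapped in a function along these lines. This is only a sketch: the notebook's actual code is not shown in this thread, so the names `check_csv_file` and `check_output_folder`, the `MAX_LINE_CHARS` threshold, and the 1-based line numbering are all assumptions, and how it would be wired into `output_data.prepare_files_for_upload` is left open.

```python
from pathlib import Path

# Assumed threshold; the limit used in the notebook is not stated in this thread.
MAX_LINE_CHARS = 10_000


def check_csv_file(path, max_line_chars=MAX_LINE_CHARS):
    """Return a failure message for one CSV file, or None if it passes."""
    path = Path(path)
    # An empty file has a size of zero bytes on disk.
    if path.stat().st_size == 0:
        return "FAIL: File is empty"
    with path.open("r", encoding="utf-8", errors="replace") as f:
        # Line numbers are 1-based here (an assumption about the log format).
        for line_number, line in enumerate(f, start=1):
            n_chars = len(line.rstrip("\n"))
            if n_chars > max_line_chars:
                # Stop at the first bad line, one message per file.
                return (
                    f"FAIL: Line {line_number} is too long ({n_chars} chars). "
                    "This is probably a corrupted line."
                )
    return None


def check_output_folder(folder, max_line_chars=MAX_LINE_CHARS):
    """Check every CSV in a folder; return {path: message} for failing files."""
    failures = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        message = check_csv_file(csv_path, max_line_chars)
        if message is not None:
            failures[str(csv_path)] = message
    return failures
```

A caller could run `check_output_folder` on each output folder before zipping and raise or log if the returned dict is non-empty.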

grgmiller commented 1 year ago

In that same spirit, I think we won't want to download the uploaded files just to run this check. Instead, could we check, right after zipping, that the zipped file size falls within an expected range of kB or exceeds some threshold?
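
A minimal sketch of that size check, assuming it runs on the local zip right after it is written. The function name `check_zip_size` and the `min_kb` / `max_kb` parameters are hypothetical, and sensible bounds would have to come from known-good archives.

```python
from pathlib import Path


def check_zip_size(zip_path, min_kb, max_kb=None):
    """Raise ValueError if the zip file size is outside the expected range.

    min_kb and max_kb are hypothetical bounds chosen by the caller;
    max_kb=None means only the lower threshold is enforced.
    Returns the measured size in kB when the check passes.
    """
    size_kb = Path(zip_path).stat().st_size / 1024
    if size_kb < min_kb:
        raise ValueError(
            f"{zip_path} is {size_kb:.1f} kB, below the minimum of {min_kb} kB"
        )
    if max_kb is not None and size_kb > max_kb:
        raise ValueError(
            f"{zip_path} is {size_kb:.1f} kB, above the maximum of {max_kb} kB"
        )
    return size_kb
```

A size check like this is cheap, but it only looks at the archive as a whole, so a single empty or corrupted CSV inside an otherwise normal-sized zip could still pass.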