rjbergerud / Super-Sankey-CAN-Energy-Flows


Organizing data for industrial sector uses by energy source #7

Closed sophiesheeline closed 3 years ago

sophiesheeline commented 3 years ago

Do you have access to the original data? I moved it into my local git repo, but git didn't give me the option to 'add' the data to tracked files. It is stored in my local repo in a subfolder of data called 'idca2017e'.

rjbergerud commented 3 years ago

I have the repo set up to ignore files in the data folder, since I was thinking that ideally we'd either script downloading the data we need or, if we're concerned about availability, store it somewhere else ourselves and again script the download. I'm open to changing this, though. If you'd like to track this file, you'd have to add an exception to the .gitignore file.

The URL I have for the data downloads it as a single .xls file inside a zip: https://oee.nrcan.gc.ca/corporate/statistics/neud/dpa/data_e/downloads/comprehensive/zip/2017/idca2017e.zip
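For reference, scripting the download could look roughly like the sketch below. It is only an illustration: the function names and the `data/idca2017e` destination folder are assumptions, not something that exists in the repo, and it uses only the Python standard library.

```python
import io
import zipfile
from pathlib import Path
from urllib.request import urlopen

# URL quoted in this thread.
DATA_URL = (
    "https://oee.nrcan.gc.ca/corporate/statistics/neud/dpa/data_e/"
    "downloads/comprehensive/zip/2017/idca2017e.zip"
)


def extract_archive(zip_bytes: bytes, dest: Path) -> list[Path]:
    """Unpack an in-memory zip archive into dest and return the extracted paths."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(dest)
        return [dest / name for name in archive.namelist()]


def download_data(url: str = DATA_URL, dest: Path = Path("data/idca2017e")) -> list[Path]:
    """Fetch the zip from url and unpack it into dest (hypothetical default folder)."""
    with urlopen(url) as response:
        return extract_archive(response.read(), dest)
```

Calling `download_data()` from the repo root would then place the .xls file under the ignored data folder, so nothing needs to be tracked in git.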

Did you get the data from somewhere else, where each sheet was already split into a separate file?

sophiesheeline commented 3 years ago

I used a VBA script to parse the .xls file you linked into automatically named csv files so I could just loop through them directly. I'm sure that is also possible to do in the jupyter notebook code, so I could either 1) post the VBA code, which can be run in Excel, for anyone wanting to view / use this notebook, or 2) retroactively add a cell at the top of the notebook that does the parsing within the notebook. Given time constraints, I don't think it makes a ton of sense to do option 2 now, but we can do that after Friday to make things cleaner?
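For what it's worth, option 2 might look something like the sketch below. It is a rough illustration rather than a port of the VBA script: the function names and the file-naming scheme are assumptions, and it relies on pandas (plus xlrd for the old .xls format) being installed.

```python
import re
from pathlib import Path


def safe_name(sheet_name: str) -> str:
    """Turn a sheet name into a filesystem-friendly file stem (assumed naming scheme)."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", sheet_name.strip()).strip("_")


def split_workbook(xls_path, out_dir) -> list[Path]:
    """Write every sheet of the workbook to its own csv file and return the paths."""
    # Imported lazily so the naming helper works without pandas installed.
    import pandas as pd  # reading .xls files additionally requires xlrd

    # sheet_name=None loads all sheets into a dict of {sheet name: DataFrame};
    # header=None keeps the raw rows, since the header layout is not known here.
    sheets = pd.read_excel(xls_path, sheet_name=None, header=None)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for name, frame in sheets.items():
        target = out_dir / f"{safe_name(name)}.csv"
        frame.to_csv(target, index=False, header=False)
        written.append(target)
    return written
```

A cell at the top of the notebook could then call `split_workbook` with the path to the downloaded .xls file and loop over the returned csv paths.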

rjbergerud commented 3 years ago

Yeah, agreed, option 1 makes the most sense.

sophiesheeline commented 3 years ago

Consolidated the for loop as you suggested -- thanks for mentioning that!