uchicago-dsi / cbd-ocean-acidification

Automates retrieval and submission of ocean acidification data for the Center for Biological Diversity
GNU General Public License v3.0

King County and IPACOA scrapers #10

Closed egemenpamukcu closed 2 years ago

egemenpamukcu commented 2 years ago

Code for King County and IPACOA scrapers as well as sample data.

sebcl commented 2 years ago

@egemenpamukcu - is there an import re missing in kingcounty.py? Just wanted to confirm that was the intent for the if not re.match(r'\d{2}/\d{2}/\d{4}', start_date) conditional.
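For reference, a minimal sketch of the kind of check that conditional performs, assuming start_date is a string expected in NN/NN/NNNN form. The helper name is_valid_start_date is hypothetical and not taken from kingcounty.py, and the sketch uses re.fullmatch rather than the re.match in the PR, since re.match only anchors at the start of the string.

```python
import re  # the import the conditional depends on

# Pattern from the conditional under review: NN/NN/NNNN
DATE_PATTERN = r'\d{2}/\d{2}/\d{4}'


def is_valid_start_date(start_date):
    """Hypothetical helper: True if start_date is exactly NN/NN/NNNN."""
    # fullmatch rejects trailing characters that re.match would let through
    return re.fullmatch(DATE_PATTERN, start_date) is not None


if __name__ == "__main__":
    for candidate in ("01/31/2021", "2021-01-31", "01/31/20215"):
        print(candidate, is_valid_start_date(candidate))
```

Note that the pattern only checks the shape of the string, not whether the month and day are real calendar values.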

Besides that, the code works on my machine. My only other note: how should we handle the new CSV files being created? It may not be a good idea to keep pushing new CSV files into the repo.

egemenpamukcu commented 2 years ago

Yeah I apparently forgot to add import re. I will push the updated version soon. Sorry about that.

About the CSV files: since we're eventually going to change the code to bulk insert into a database, I think this could do for now. In the meantime, we should probably refrain from pushing new data to the repo.
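A rough sketch of what bulk inserting scraped rows into a database could look like. The thread does not say which database or schema is planned, so everything here is an assumption: SQLite is used only because it ships with Python, and the measurements table and its station, timestamp, parameter, and value columns are hypothetical.

```python
import csv
import io
import sqlite3


def bulk_insert_csv(conn, csv_file):
    """Load rows from an open CSV file into a hypothetical measurements table."""
    reader = csv.DictReader(csv_file)
    rows = [
        (r["station"], r["timestamp"], r["parameter"], float(r["value"]))
        for r in reader
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements "
        "(station TEXT, timestamp TEXT, parameter TEXT, value REAL)"
    )
    # executemany sends all rows through one prepared statement
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    sample = io.StringIO(
        "station,timestamp,parameter,value\n"
        "A,2021-01-01T00:00,pH,8.1\n"
        "A,2021-01-01T00:15,pH,8.0\n"
    )
    conn = sqlite3.connect(":memory:")
    print(bulk_insert_csv(conn, sample), "rows inserted")
```

With something like this in place, the scrapers would no longer need to write CSV files into the repo at all.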

trevorspreadbury commented 2 years ago

The code structure looks good! There are a few typos/bugs in here though.

trevorspreadbury commented 2 years ago

@egemenpamukcu -- I noticed the data returned by kingcounty is in a wide format. Is this intentional? I thought we were going with the format you proposed in the Google Doc.

egemenpamukcu commented 2 years ago

I will change it to the long format once we have a filtered list of parameters. Because of the inconsistencies in column names, going from wide to long format will require some manual work. Once we filter out the unnecessary parameters it should be easier. And thanks for the fixes.
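A minimal sketch of the wide-to-long step described above, assuming each scraped record is a dict with one key per column. The column names, the id columns, and the keep_parameters mapping are all hypothetical; the mapping stands in for the filtered parameter list and absorbs inconsistent column names by lowercasing and stripping them before lookup.

```python
def wide_to_long(rows, id_columns, keep_parameters):
    """Turn wide records into one (ids, parameter, value) record per measurement.

    keep_parameters maps a normalized raw column name to a canonical
    parameter name; columns not in the mapping are dropped.
    """
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for column, value in row.items():
            if column in id_columns:
                continue
            # Normalize the raw header before looking it up
            parameter = keep_parameters.get(column.strip().lower())
            if parameter is None or value in ("", None):
                continue  # unwanted parameter or missing reading
            long_rows.append({**ids, "parameter": parameter, "value": value})
    return long_rows


if __name__ == "__main__":
    wide = [{"station": "A", "time": "t0", "pH": "8.1",
             "Water Temp (C)": "11.2", "Qual_pH": "ok"}]
    mapping = {"ph": "pH", "water temp (c)": "temperature"}
    for record in wide_to_long(wide, ["station", "time"], mapping):
        print(record)
```

If the scrapers already hold the data in a pandas DataFrame, DataFrame.melt performs the same reshaping once the columns have been renamed and filtered.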