mapaction / cod-ab-data-quality

To create an quantitative assessment mechanism that enables the prioritisation of work to update or enhance existing COD-AB data sets. The expected output is a quality index for each COD-AB data set based on tests for features (geographical, metadata, etc).
GNU General Public License v3.0
1 stars 0 forks source link

Feature: Data Download #3

Closed maxmalynowsky closed 1 month ago

maxmalynowsky commented 1 month ago

Adds two modules for assessing COD-AB quality:

Other changes:

vshlemon commented 1 month ago

Trying to test out the httpx.download option using just poetry & no docker. Got the following error, I believe it originates in the way the Path is stated here - cod-ab-data-quality/src/download/__init__.py

image

maxmalynowsky commented 1 month ago

Trying to test out the httpx.download option using just poetry & no docker. Got the following error, I believe it originates in the way the Path is stated here - cod-ab-data-quality/src/download/__init__.py

Not likely, otherwise src.metadata would have failed since it sets Path in the exact same way. What I think might be happening is that when you ran the command with Docker, there's a mkdir that created the download directory if it doesn't exist. It looks like there's a permission error accessing that folder from when it was created with Docker under WSL and then accessed by Windows. I restructured the data directory to add the folder boundaries are downloaded to into GitHub which should prevent that error.

vshlemon commented 1 month ago

Thanks Max - it seems ready to merge, but it takes a while as it downloads everything each time. I imagine sometimes this might not be desired so an option at the CLI for overwrite & output locations would be handy. My comments for the CLI features are below - I can either close this branch (merge) & open an issue to implement them or if you'd prefer to I can leave the branch open if you wanted to implement them? I am happy either way, would be good for me to implement to get more familiar but cool if you want to.

image

...ps. the cloud one will be unachievable for now and is just a comment for future

maxmalynowsky commented 1 month ago

Thanks Max - it seems ready to merge, but it takes a while as it downloads everything each time. I imagine sometimes this might not be desired so an option at the CLI for overwrite & output locations would be handy. My comments for the CLI features are below - I can either close this branch (merge) & open an issue to implement them or if you'd prefer to I can leave the branch open if you wanted to implement them? I am happy either way, would be good for me to implement to get more familiar but cool if you want to.

Thanks a lot for battle testing this, you've given a lot of super useful feedback, downloading is the hardest part since it deals with things outside of our control. I think we should merge now, otherwise this runs the risk of turning into a forever branch.

On the things you brought up:

vshlemon commented 1 month ago

Ok am merging now - will open up the download options as issues as I think it would be useful to include.

I meant adding arguments to the module execution via. an argparser. I wasn't envisioning make as the main point of entry for everyone, but it could keep the default args for module execution.

Yep, not as default, just for flexibility.

And I meant more for where you download to in case you didn't want to save locally. But that could come later if there's a need. And adding an argparser would enable for different sources to download from as well (or methods) so that would be good.

This is really great stuff, Thanks for your patience with me testing it out as well :)