hackalog / easydata

A flexible template for doing reproducible data science in Python.
MIT License

Need "how-to-write-test" documentation #217

Open hackalog opened 3 years ago

hackalog commented 3 years ago

We recently pulled #211 which included reproducer code.

We need to document how to write a test for Easydata. This is a bit non-obvious because, to run a test, we need to check out easydata, generate a CI repo from it, and then run the tests within the generated repo.

Then we can add a link to the contributor docs with a "how to file a PR" process.

As a test case, use the reproducer from #211 to try out the process.

In particular, we need to:

1) Run setup code (create a subdirectory data/raw/custom).

2) Run the test (download a CSV from a URL to data/raw/custom using a DataSource), i.e.:

from src.data import DataSource
from src import paths

# Name of the datasource and the raw file to download
ds_name = 'test_custom_data_dir'
raw_fn = "epidemiology.csv"
url = f'https://storage.googleapis.com/covid19-open-data/v2/{raw_fn}'

# Custom download directory passed to the DataSource
download_dir = paths["data_path"]
dsrc = DataSource(ds_name, download_dir=download_dir)
dsrc.add_url(url=url)
dsrc.fetch()

# The fetched file should land in the custom directory
filename = download_dir / raw_fn
assert filename.exists(), "download to custom dir failed"

3) Run teardown code (delete the subdirectory and its contents).
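For discussion, here is a minimal sketch of how the three steps could map onto a single test, using the standard library's unittest. It is not the Easydata test harness: `fetch_url` is a hypothetical stand-in for the `DataSource(...)`, `add_url()`, `fetch()` sequence so that the sketch runs outside a generated repo, and the scratch project root and the tiny local CSV served through a `file://` URL are also assumptions, chosen to keep the download lightweight.

```python
import shutil
import tempfile
import unittest
import urllib.request
from pathlib import Path


def fetch_url(url: str, download_dir: Path) -> Path:
    """Hypothetical stand-in for DataSource.add_url() + DataSource.fetch()."""
    target = download_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(target))
    return target


class TestCustomDownloadDir(unittest.TestCase):
    def setUp(self):
        # 1) Setup: create data/raw/custom under a scratch project root
        #    (an assumption; a real test would use the generated repo's paths).
        self.root = Path(tempfile.mkdtemp())
        self.download_dir = self.root / "data" / "raw" / "custom"
        self.download_dir.mkdir(parents=True)
        # A tiny CSV exposed via a file:// URL replaces the remote download.
        self.source = self.root / "tiny.csv"
        self.source.write_text("date,cases\n2020-01-01,1\n")

    def tearDown(self):
        # 3) Teardown: delete the scratch root, including the custom subdir.
        shutil.rmtree(self.root)

    def test_download_to_custom_dir(self):
        # 2) Test: fetch the CSV into the custom directory and check it exists.
        filename = fetch_url(self.source.as_uri(), self.download_dir)
        self.assertTrue(filename.exists(), "download to custom dir failed")
```

Inside a generated repo, the body of `test_download_to_custom_dir` would call the `DataSource` API from the reproducer instead of `fetch_url`; the setup and teardown structure would stay the same.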

Note: we should use a lighter-weight CSV download.