holoviz-topics / examples

Visualization-focused examples using HoloViz for specific topics
https://examples.holoviz.org
Creative Commons Attribution 4.0 International

Census 2020 example #430

Open Azaya89 opened 1 month ago

Azaya89 commented 1 month ago

Created a new example using the 2020 US census dataset. The file exists locally as a large .parq file that will be uploaded to S3 at a later time.

NOTES:

  1. The URL added in the downloads section of the anaconda-project.yml files is not a real link, which is what is causing the CI build failure.

maximlt commented 1 month ago

> I suspect it is due to https://github.com/holoviz-topics/examples/pull/429 but I'm not sure how to resolve it.

You need to re-create the conda environment locally following the contributing guide.

> The test file added is a 0.1% sample of the full dataset but it is still about 8MB in size. I don't know if that is too large and should be reduced further.

It's still way too large. You should aim for the minimum dataset size possible; it's fine if it's just a few KB as long as it contains data that is representative of the whole dataset. For instance, if the code expects some data category, then that category should be in the sample dataset so the notebook can run entirely.
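
To illustrate the idea of a tiny but representative sample, here is a minimal sketch. It assumes the rows can be handled as plain dictionaries and that a single column holds the category the notebook depends on. The function name, the `category` key, and the toy data are all hypothetical and do not come from this PR.

```python
import random
from collections import defaultdict


def representative_sample(rows, category_key, per_category=5, seed=0):
    """Keep at most `per_category` rows for each distinct category value."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_category = defaultdict(list)
    for row in rows:
        by_category[row[category_key]].append(row)

    sample = []
    for category in sorted(by_category):
        group = by_category[category]
        k = min(per_category, len(group))  # small groups are kept whole
        sample.extend(rng.sample(group, k))
    return sample


# Toy stand-in for the full dataset: 1000 rows spread over three categories.
full = [{"x": i, "y": i * 2, "category": "abc"[i % 3]} for i in range(1000)]
small = representative_sample(full, "category", per_category=2)
```

Sampling per category rather than taking a flat percentage means a rare category cannot drop out of the test file, however small the file gets.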

Azaya89 commented 1 month ago

> Is there an absolute need to rename the original census project census_one? Without doing anything else, this is going to break all the links to its web page and deployment.
>
> I would also not call the new one census_two but census2020.

I imagine renaming the original from census to something else makes sense, seeing as there is now more than one census notebook in the examples gallery (and possibly more in the future). However, I tried renaming both to census2010 and census2020, but the doit validate step emits a warning that only lower case characters and underscore are allowed in the naming. I wasn't sure that ignoring that warning was ideal, which is why I renamed both to the current names.

maximlt commented 1 month ago

> However, I tried renaming both to census2010 and census2020, but the doit validate step emits a warning that only lower case characters and underscore are allowed in the naming

Sounds like a bug in the validation code, something like census2020 should be allowed.
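
For illustration, a naming rule that accepts census2020 could be sketched as follows. This is a hypothetical check, not the actual code behind doit validate; the pattern and the function name `is_valid_project_name` are assumptions.

```python
import re

# Assumed rule: start with a lowercase letter, then any mix of
# lowercase letters, digits, and underscores.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_project_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ("census2020", "census_one", "Census2020", "census-2020"):
    print(candidate, is_valid_project_name(candidate))
```

A pattern limited to `[a-z_]` would reject any name containing a digit, which would be consistent with the warning described above.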

Azaya89 commented 1 month ago

> You need to re-create the conda environment locally following the contributing guide.

Done. Thanks

> It's still way too large. You should aim for the minimum dataset size possible; it's fine if it's just a few KB as long as it contains data that is representative of the whole dataset. For instance, if the code expects some data category, then that category should be in the sample dataset so the notebook can run entirely.

Reduced it to <1MB now.

maximlt commented 1 month ago

Replying to your comment elsewhere:

> Thank you. I'm still in favor of renaming the first one to census2010 though.

If you intend to rename it, then redirect links have to be set up:

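Alternatively, we could just:

  • Change the title property in the project YAML to Census 2010
  • Change the notebook top-level heading to Census 2010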

Azaya89 commented 1 month ago

> Alternatively, we could just:
>
>   • Change the title property in the project YAML to Census 2010
>   • Change the notebook top-level heading to Census 2010

I already did these in this PR. Would that be enough to differentiate both examples eventually?

maximlt commented 1 month ago

> Would that be enough to differentiate both examples eventually?

I think so?

Azaya89 commented 1 month ago

> I think so?

OK. I will revert the other renaming then

hoxbro commented 3 weeks ago

My suggestion was that you use the processing script to save it to disk as new data and use that data in the notebook.

Azaya89 commented 3 weeks ago

> My suggestion was that you use the processing script to save it to disk as new data and use that data in the notebook.

Oh? Alright then. Will do...

maximlt commented 2 weeks ago

@Azaya89 you will need to re-lock the project as the solve is failing:

Channels:
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed

PackagesNotFoundError: The following packages are not available from current channels:

  - libcurl==8.11.0=hbbe4b11_0

Not your fault. conda-forge sometimes marks packages as broken by adding the broken label to them, which means they are no longer available on the main conda-forge channel, only on conda-forge/label/broken.

https://github.com/conda-forge/admin-requests/pull/1147
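
As a rough aid for spotting which pins need attention before re-locking, the solver output above can be parsed for the specs listed under PackagesNotFoundError. This is only a sketch: the helper name `missing_pins` is hypothetical, and it assumes each unavailable package is printed as `name==version=build`, as in the log above.

```python
import re

# Matches pinned specs such as "  - libcurl==8.11.0=hbbe4b11_0".
SPEC = re.compile(
    r"^\s*-\s+(?P<name>[^=\s]+)==(?P<version>[^=\s]+)=(?P<build>\S+)\s*$"
)


def missing_pins(solver_output: str):
    """Return (name, version, build) for each pin listed after the error line."""
    pins = []
    in_error = False
    for line in solver_output.splitlines():
        if line.startswith("PackagesNotFoundError"):
            in_error = True  # only lines after this header are candidates
            continue
        if in_error:
            match = SPEC.match(line)
            if match:
                pins.append((match["name"], match["version"], match["build"]))
    return pins


log = """\
Channels:
 - conda-forge
Platform: linux-64
Solving environment: ...working... failed

PackagesNotFoundError: The following packages are not available from current channels:

  - libcurl==8.11.0=hbbe4b11_0
"""
pins = missing_pins(log)
```

Any pin reported this way refers to a build that is no longer on the channel, so re-locking the project lets the solver pick a build that is still available.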