hackforla / 311-data

Empowering Neighborhood Associations to improve the analysis of their initiatives using 311 data
https://hackforla.github.io/311-data/
GNU General Public License v3.0

Huggingface changes parquet url #1558

Open edwinjue opened 1 year ago

edwinjue commented 1 year ago

Dependency

Overview

It seems Huggingface can change the URL of the parquet file that our app uses to load service request data. When that happens, the app can break without warning.

Error in browser console:

bundle.js:318  Uncaught (in promise) Error: Invalid Error: Opening file 'requests.parquet' failed with error: Failed to open file: requests.parquet
    at Go.runQuery (duckdb-browser-eh.worker.js:11:17185)
    at lc.onMessage (duckdb-browser-eh.worker.js:10:55670)
    at Qu.globalThis.onmessage (duckdb-browser-eh.worker.js:24:10976)
onMessage @ bundle.js:318
bundle.js:318  Uncaught (in promise) Error: Catalog Error: Table with name requests does not exist!
Did you mean "pg_sequences"?
LINE 1: select max(createddate) from requests;
                                     ^
    at Go.runQuery (duckdb-browser-eh.worker.js:11:17185)
    at lc.onMessage (duckdb-browser-eh.worker.js:10:55670)
    at Qu.globalThis.onmessage (duckdb-browser-eh.worker.js:24:10976)

The URL yesterday (no longer working): https://huggingface.co/datasets/edwinjue/311-data-2023/resolve/refs%2Fconvert%2Fparquet/default/311-data-2023-train.parquet

The URL today (working): https://huggingface.co/datasets/edwinjue/311-data-2023/resolve/refs%2Fconvert%2Fparquet/default/train/0000.parquet
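
As a stopgap, the app could probe both URL layouts seen above and load whichever one responds. The following is only a rough sketch: `BASE`, `CANDIDATE_URLS`, and `firstReachableUrl` are hypothetical names that do not exist in the 311-data codebase, and it assumes the host answers `HEAD` requests from the browser (CORS and redirects permitting).

```javascript
// Hypothetical helper; none of these names come from the 311-data codebase.
const BASE =
  'https://huggingface.co/datasets/edwinjue/311-data-2023/resolve/refs%2Fconvert%2Fparquet/default';

// The two layouts observed in this issue, newest first.
const CANDIDATE_URLS = [
  `${BASE}/train/0000.parquet`,
  `${BASE}/311-data-2023-train.parquet`,
];

// Return the first URL that answers a HEAD request with a 2xx status,
// or null if none of them does.
// fetchImpl is injectable so the logic can be exercised without network access.
async function firstReachableUrl(urls, fetchImpl = fetch) {
  for (const url of urls) {
    try {
      const res = await fetchImpl(url, { method: 'HEAD' });
      if (res.ok) return url;
    } catch (err) {
      // Network or CORS failure: fall through to the next candidate.
    }
  }
  return null;
}
```

The result could then be passed to whatever currently registers the parquet file in DbProvider.jsx, with a clear error shown to the user when it is `null`. This only covers layouts we already know about, so it would not survive a third, unseen URL change.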

Where the Huggingface URL gets used in our codebase: https://github.com/hackforla/311-data/blob/1136d2468e33596af11b3dd2292980317c6a78d7/components/db/DbProvider.jsx#L11

Action Items

Then perhaps later, we can consider more automated means of doing this.
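
One possible "more automated" approach would be to ask Huggingface for the current parquet URLs at startup instead of hardcoding one. The sketch below assumes the dataset viewer API at `https://datasets-server.huggingface.co/parquet` returns JSON shaped like `{ parquet_files: [{ config, split, url, ... }] }`; that endpoint and shape should be checked against the current Huggingface docs before relying on it. `selectParquetUrls` and `resolveParquetUrls` are hypothetical names.

```javascript
// Assumed endpoint of the Huggingface dataset viewer API; verify before use.
const PARQUET_API = 'https://datasets-server.huggingface.co/parquet';

// Pick the parquet URLs for one config/split out of the API's JSON payload.
// Assumes the shape { parquet_files: [{ config, split, url, ... }] }.
function selectParquetUrls(payload, config = 'default', split = 'train') {
  const files = (payload && payload.parquet_files) || [];
  return files
    .filter((f) => f.config === config && f.split === split)
    .map((f) => f.url);
}

// Look up the current parquet URLs for a dataset such as 'edwinjue/311-data-2023'.
// fetchImpl is injectable so the function can be tested without network access.
async function resolveParquetUrls(dataset, fetchImpl = fetch) {
  const res = await fetchImpl(`${PARQUET_API}?dataset=${encodeURIComponent(dataset)}`);
  if (!res.ok) throw new Error(`Parquet lookup failed with HTTP ${res.status}`);
  return selectParquetUrls(await res.json());
}
```

A call like `resolveParquetUrls('edwinjue/311-data-2023')` would then yield a list of URLs to hand to the database setup code. Returning a list rather than a single URL leaves room for the dataset being split into several parquet shards later.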

Resources/Instructions

Location of parquet file on huggingface: https://huggingface.co/datasets/edwinjue/311-data-2023/tree/refs%2Fconvert%2Fparquet/default/train

aramattamara commented 1 year ago

Hi @edwinjue, how do the datasets get uploaded to Huggingface? Maybe we could do something in that script to ensure that the URL stays static.