huggingface / dataset-viewer

Lightweight web API for visualizing and exploring any dataset - computer vision, speech, text, and tabular - stored on the Hugging Face Hub
https://huggingface.co/docs/datasets-server
Apache License 2.0

Use placeholder revision in urls in cached responses #2966

Open lhoestq opened 6 days ago


Previously, the URLs of image/audio files in the cached responses of /first-rows contained the dataset revision. However, when a dataset's README is updated, we update the `dataset_git_revision` of the cache entries and the location of the image/audio files on S3, but we don't modify the revision in the URLs stored in the cached response. As a result, the Viewer stopped showing images after a dataset's README was modified.
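To make the failure mode concrete, here is a minimal sketch of how a URL with the revision baked in goes stale. The URL layout, the `example-assets.invalid` host, the `asset_url` helper and the dict-shaped cache entry are all hypothetical stand-ins, not the actual dataset-viewer code.

```python
# Hypothetical URL layout: <base>/<dataset>/--/<revision>/--/<path>
BASE = "https://example-assets.invalid"


def asset_url(dataset: str, revision: str, path: str) -> str:
    """Build an asset URL with the revision baked in (the old behavior)."""
    return f"{BASE}/{dataset}/--/{revision}/--/{path}"


# Response computed at revision "abc123" and stored in the (hypothetical) cache entry
path = "default/train/0/image.jpg"
cached_response = {"rows": [{"image": {"src": asset_url("user/ds", "abc123", path)}}]}
cache_entry = {"dataset_git_revision": "abc123", "content": cached_response}

# README-only commit: the entry's revision (and the file location on storage) move to "def456",
# but the URL stored inside the cached content is left untouched.
cache_entry["dataset_git_revision"] = "def456"
current_location = asset_url("user/ds", cache_entry["dataset_git_revision"], path)

stale_src = cache_entry["content"]["rows"][0]["image"]["src"]
print(stale_src == current_location)  # False: the stored URL points at the old location
```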


I fixed this for future datasets: the URLs no longer contain the revision. Instead, they contain a placeholder that is replaced with `dataset_git_revision` when the cached response is accessed.
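A minimal sketch of the placeholder idea, assuming a hypothetical token `{dataset_git_revision}` and the same hypothetical URL layout as above (the real placeholder string and helpers in dataset-viewer may differ): the cache stores the placeholder, and the revision is substituted only when the response is read.

```python
# Hypothetical placeholder token; the string actually used by dataset-viewer may differ.
REVISION_PLACEHOLDER = "{dataset_git_revision}"


def store_url(base: str, dataset: str, path: str) -> str:
    """URL written into the cache: the placeholder stands in for the revision."""
    return f"{base}/{dataset}/--/{REVISION_PLACEHOLDER}/--/{path}"


def resolve_url(stored_url: str, dataset_git_revision: str) -> str:
    """At read time, substitute the cache entry's current revision."""
    return stored_url.replace(REVISION_PLACEHOLDER, dataset_git_revision)


stored = store_url("https://example-assets.invalid", "user/ds", "default/train/0/image.jpg")
print(resolve_url(stored, "abc123"))
# https://example-assets.invalid/user/ds/--/abc123/--/default/train/0/image.jpg
print(resolve_url(stored, "def456"))
# https://example-assets.invalid/user/ds/--/def456/--/default/train/0/image.jpg
```

Because the stored content never contains a concrete revision, a README-only commit only needs to update `dataset_git_revision` on the cache entry; the URLs follow automatically.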

Implementation details

I modified the URL Signer logic so that it also inserts the revision into the URLs, and renamed it to URL Preparator. It now takes care of both inserting the revision and signing the URLs.
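The sketch below shows one way a "preparator" could combine both steps. Everything here is hypothetical: the class names, the `asset_columns` parameter, the `{"src": url}` cell shape, and especially the signing scheme, which is a simple HMAC query parameter standing in for the real signer (not reproduced here).

```python
import hashlib
import hmac
from abc import ABC, abstractmethod
from typing import Any

REVISION_PLACEHOLDER = "{dataset_git_revision}"  # hypothetical token


class URLPreparator(ABC):
    """Hypothetical sketch: prepares asset URLs in cached rows before returning them."""

    def __init__(self, asset_columns: list[str]) -> None:
        self.asset_columns = asset_columns  # columns whose cells look like {"src": url}

    @abstractmethod
    def sign_url(self, url: str) -> str: ...

    def prepare_url(self, url: str, revision: str) -> str:
        # Insert the current revision first, then sign the final URL.
        return self.sign_url(url.replace(REVISION_PLACEHOLDER, revision))

    def prepare_rows(self, rows: list[dict[str, Any]], revision: str) -> list[dict[str, Any]]:
        prepared = []
        for row in rows:
            new_row = dict(row)  # shallow copy: the cached rows are not mutated
            for column in self.asset_columns:
                cell = new_row.get(column)
                if isinstance(cell, dict) and "src" in cell:
                    new_row[column] = {**cell, "src": self.prepare_url(cell["src"], revision)}
            prepared.append(new_row)
        return prepared


class HMACURLPreparator(URLPreparator):
    """Stand-in signer: appends an HMAC-SHA256 query parameter (not the real scheme)."""

    def __init__(self, asset_columns: list[str], key: bytes) -> None:
        super().__init__(asset_columns)
        self.key = key

    def sign_url(self, url: str) -> str:
        signature = hmac.new(self.key, url.encode(), hashlib.sha256).hexdigest()
        separator = "&" if "?" in url else "?"
        return f"{url}{separator}sig={signature}"


preparator = HMACURLPreparator(asset_columns=["image"], key=b"secret")
rows = [{"image": {"src": "https://example-assets.invalid/user/ds/--/{dataset_git_revision}/--/image.jpg"}, "label": 0}]
prepared = preparator.prepare_rows(rows, "def456")
print(prepared[0]["image"]["src"])
```

Inserting the revision before signing matters in this design: the signature has to cover the final URL, so substituting the placeholder after signing would invalidate it.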

Closes https://github.com/huggingface/dataset-viewer/issues/2965