huggingface / dataset-viewer

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
https://huggingface.co/docs/dataset-viewer
Apache License 2.0

Compute leaks between splits? #2994

Open severo opened 1 month ago

severo commented 1 month ago

See https://huggingface.co/blog/lbourdois/lle

Also: should we find the duplicate rows?

julien-c commented 1 month ago

This kind of thing could maybe be a SQL query powered by DuckDB in the console no? (@cfahlgren1 for viz)

like a list of "query examples"

cfahlgren1 commented 1 month ago

> This kind of thing could maybe be a SQL query powered by DuckDB in the console no? (@cfahlgren1 for viz)
>
> like a list of "query examples"

Great point. It would be great to use the explorer to help solve this. It would be a good MVP before this makes it into the Hub. I'll write a guide on it.
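
For illustration, here is a minimal sketch of what two such "query examples" might look like: one that lists rows shared between splits (a leak) and one that lists duplicate rows within a split. Everything in it is an assumption made for the example: the table names `train` and `test`, the single `text` column, and the toy rows. It uses Python's built-in `sqlite3` as a stand-in so it runs without extra dependencies; the SQL itself is plain `INTERSECT` and `GROUP BY ... HAVING`, which should carry over to DuckDB, but that has not been checked against the viewer's console.

```python
import sqlite3

# Hypothetical toy splits; a real dataset would have its own columns and sizes.
train_rows = ["the cat sat", "hello world", "hello world", "foo bar"]
test_rows = ["hello world", "something new", "foo bar"]

# In-memory database standing in for the SQL console (assumption: one table per split).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE train (text TEXT)")
con.execute("CREATE TABLE test (text TEXT)")
con.executemany("INSERT INTO train VALUES (?)", [(t,) for t in train_rows])
con.executemany("INSERT INTO test VALUES (?)", [(t,) for t in test_rows])

# Leak between splits: distinct rows that appear in both train and test.
LEAK_QUERY = """
SELECT text FROM train
INTERSECT
SELECT text FROM test
ORDER BY text
"""

# Duplicates within one split: rows that occur more than once in train.
DUPLICATE_QUERY = """
SELECT text, COUNT(*) AS n
FROM train
GROUP BY text
HAVING COUNT(*) > 1
ORDER BY text
"""

leaked = [row[0] for row in con.execute(LEAK_QUERY)]
duplicates = con.execute(DUPLICATE_QUERY).fetchall()

print("rows present in both splits:", leaked)
print("duplicated rows in train:", duplicates)
```

This only catches exact matches on a single column. Near-duplicates, or leaks that span several columns, would need normalization or hashing on top of it.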