huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Request to Share/Update Dataset Viewer Code #6014

Closed: lilyorlilypad closed this issue 1 year ago

lilyorlilypad commented 1 year ago

Overview: The repository (huggingface/datasets-viewer) was recently archived, and when I tried to run the code, I got the error message "AttributeError: module 'datasets.load' has no attribute 'prepare_module'". I could not resolve the issue myself because I could find no documentation for that attribute.

Request: I kindly ask that the code responsible for the dataset preview functionality be shared, or for help resolving the error. The dataset viewer on the Hugging Face website is incredibly useful because it handles many different types of inputs, which lets users find datasets that meet their needs more efficiently. If needed, I am willing to contribute to the project by testing, documenting, and providing feedback on the dataset viewer code.

Thank you for considering this request, and I look forward to your response.

lhoestq commented 1 year ago

Hi! The huggingface/dataset-viewer code is no longer maintained because we switched to a new dataset viewer, which is deployed and available for each dataset on the Hugging Face website.

What are you using this old repository for?

mariosasko commented 1 year ago

I think these parts are outdated:

To make the viewer work, the first one should be replaced with the following:

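import datasets

# `path` is defined elsewhere in the viewer app; dataset_module_factory replaces the removed prepare_module
dataset_module = datasets.load.dataset_module_factory(path)
builder_cls = datasets.load.import_main_class(dataset_module.module_path)
confs = builder_cls.BUILDER_CONFIGS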

And the second one:

import datasets

# `path`, `conf` and `path_to_datasets` are defined elsewhere in the viewer app
dataset_module = datasets.load.dataset_module_factory(path)
builder_cls = datasets.load.import_main_class(dataset_module.module_path)
# Only use the local path as cache dir when datasets are loaded from disk
cache_dir = path if path_to_datasets is not None else None
if conf:
    builder_instance = builder_cls(name=conf, cache_dir=cache_dir)
else:
    builder_instance = builder_cls(cache_dir=cache_dir)

But as @lhoestq suggested, it's better to use the datasets-server API nowadays to fetch the rows.
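A minimal sketch of fetching rows through the datasets-server API instead of instantiating a builder. It assumes the public `/rows` endpoint at `https://datasets-server.huggingface.co`, with `dataset`, `config`, `split`, `offset` and `length` query parameters, a maximum page length of 100, and a JSON response whose `rows` items hold the cell values under `row`. These details reflect the public API docs as I understand them and should be checked against the current documentation; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the public datasets-server API
API_BASE = "https://datasets-server.huggingface.co"


def rows_url(dataset: str, config: str, split: str, offset: int = 0, length: int = 100) -> str:
    """Build a /rows request URL (assumes the API caps `length` at 100 per page)."""
    if not 0 < length <= 100:
        raise ValueError("length must be between 1 and 100")
    query = urllib.parse.urlencode(
        {"dataset": dataset, "config": config, "split": split, "offset": offset, "length": length}
    )
    return f"{API_BASE}/rows?{query}"


def extract_rows(payload: dict) -> list:
    """Keep only the cell values from each item of the response's `rows` list."""
    return [item["row"] for item in payload.get("rows", [])]


def fetch_rows(dataset: str, config: str, split: str, offset: int = 0, length: int = 100) -> list:
    """Fetch one page of rows over HTTP (needs network access)."""
    with urllib.request.urlopen(rows_url(dataset, config, split, offset, length)) as resp:
        return extract_rows(json.load(resp))
```

Keeping URL building and response parsing separate from the HTTP call means those parts can be tested offline, and paging is just a matter of increasing `offset`.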

julien-c commented 1 year ago

The dataset viewer on the Hugging Face website is incredibly useful

@mariosasko I think @lilyorlilypad wants to run the new dataset-viewer, not the old one.

lilyorlilypad commented 1 year ago

wants to run the new dataset-viewer, not the old one

Thanks for clarifying. I do want to run the new dataset-viewer.

lhoestq commented 1 year ago

It should be possible to run it locally using the HF datasets-server API (docs here), but the front-end part is not open source (yet?).
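Before a local frontend can page through rows, it needs to know which configs and splits a dataset has. A hedged sketch, assuming a `/splits` endpoint that takes a `dataset` parameter and returns a `splits` list whose items carry `config` and `split` keys; the endpoint shape should be verified against the API docs, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the public datasets-server API
API_BASE = "https://datasets-server.huggingface.co"


def splits_url(dataset: str) -> str:
    """Build a /splits request URL for one dataset."""
    return f"{API_BASE}/splits?" + urllib.parse.urlencode({"dataset": dataset})


def config_split_pairs(payload: dict) -> list:
    """Turn a /splits response into (config, split) pairs a viewer can offer."""
    return [(s["config"], s["split"]) for s in payload.get("splits", [])]


def fetch_config_split_pairs(dataset: str) -> list:
    """Query the /splits endpoint over HTTP (needs network access)."""
    with urllib.request.urlopen(splits_url(dataset)) as resp:
        return config_split_pairs(json.load(resp))
```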

The back end is open source, though, if you're interested (https://github.com/huggingface/datasets-server). It automatically converts datasets on HF to Parquet, which is the format we use to power the viewer.
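Because the viewer is powered by that Parquet conversion, a local tool could read the converted files directly. The sketch below assumes a `/parquet` endpoint that returns a `parquet_files` list whose entries include `config`, `split` and `url`; both the response shape and the helper names are assumptions to check against the datasets-server docs.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

# Assumed base URL of the public datasets-server API
API_BASE = "https://datasets-server.huggingface.co"


def parquet_urls_by_split(payload: dict) -> dict:
    """Group the Parquet file URLs of a /parquet response by (config, split)."""
    grouped = defaultdict(list)
    for entry in payload.get("parquet_files", []):
        grouped[(entry["config"], entry["split"])].append(entry["url"])
    return dict(grouped)


def fetch_parquet_urls(dataset: str) -> dict:
    """Query the /parquet endpoint over HTTP (needs network access)."""
    query = urllib.parse.urlencode({"dataset": dataset})
    with urllib.request.urlopen(f"{API_BASE}/parquet?{query}") as resp:
        return parquet_urls_by_split(json.load(resp))
```

Each URL can then be loaded with a Parquet reader such as `pandas.read_parquet` (with pyarrow installed), so the viewer never has to run the dataset's loading script.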

julien-c commented 1 year ago

The new frontend would probably be hard to open source as is, since it's quite intertwined with the Hub's code.

However, at some point it would be amazing to have a community-driven open source implementation of a frontend to datasets-server!
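A community frontend could start far smaller than the Hub's. As a purely illustrative sketch (a hypothetical helper, not how the Hub's viewer is built), this renders a list of row dicts, such as the `row` objects returned by the API, as a fixed-width text table, flattening newlines and truncating long cells.

```python
def render_table(rows: list, max_width: int = 20) -> str:
    """Render row dicts as a fixed-width text table (columns taken from the first row)."""
    if not rows:
        return ""
    columns = list(rows[0].keys())

    def cell(value) -> str:
        # Flatten newlines and truncate long values so columns stay aligned
        text = str(value).replace("\n", " ")
        return text if len(text) <= max_width else text[: max_width - 1] + "…"

    header = [cell(c) for c in columns]
    body = [[cell(row.get(c, "")) for c in columns] for row in rows]
    # Each column is as wide as its widest (already truncated) cell
    widths = [max(len(h), *(len(r[i]) for r in body)) for i, h in enumerate(header)]

    def line(cells: list) -> str:
        return " | ".join(v.ljust(w) for v, w in zip(cells, widths)).rstrip()

    lines = [line(header), "-+-".join("-" * w for w in widths)]
    lines.extend(line(r) for r in body)
    return "\n".join(lines)
```

Taking columns from the first row keeps the sketch short; a real frontend would rather use the `features` the API reports so that columns stay stable across pages.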

severo commented 1 year ago

For the frontend viewer, see https://github.com/huggingface/datasets/issues/6139.

Also mentioned in https://github.com/huggingface/datasets-server/issues/213 and https://github.com/huggingface/datasets-server/issues/441

Closing as a duplicate of https://github.com/huggingface/datasets/issues/6139

jacob-rodgers-max commented 4 months ago

Hi team,

I'm currently researching the Dataset Viewer project and would like to understand more about the frontend technologies it uses. Specifically, I'd like to know:

- Which frontend framework is used (e.g., React, Vue)?
- Are any specific UI libraries or component kits used (e.g., Material-UI, Ant Design)?
- Are any other notable frontend tools or technologies part of this project?

Your assistance in providing these details would be greatly appreciated. Thank you for your time and effort!

Best regards

julien-c commented 4 months ago

@jacob-rodgers-max we use https://svelte.dev/

jacob-rodgers-max commented 3 months ago

@jacob-rodgers-max we use https://svelte.dev/

Thank you very much for your prompt and detailed response!