huggingface / dataset-viewer

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
https://huggingface.co/docs/dataset-viewer
Apache License 2.0

Include code snippets for other libraries? #2986

Open severo opened 3 months ago

severo commented 3 months ago

For example, in https://github.com/huggingface/huggingface.js/pull/797, we add distilabel, fiftyone and argilla to the list of libraries the Hub knows. However, the aim is only to handle the user-defined tags better, not to show code snippets.

In this issue, I propose to discuss whether we should expand the list of dataset libraries for which we show code snippets. For now, we support pandas, HF datasets, webdataset, mlcroissant and dask.

We already mentioned polars as a potential new lib, I think. Maybe duckdb too?
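
To make the polars and duckdb idea a bit more concrete, here is a rough sketch of what such snippets could look like when generated from a repo id and a Parquet file path. The `render_snippets` helper and the example repo and file names are hypothetical, and this is not how the viewer builds its snippets today. It only assumes the `hf://` path support that recent polars and duckdb releases ship with.

```python
def render_snippets(repo_id: str, parquet_path: str) -> dict[str, str]:
    """Return illustrative loading snippets keyed by library name.

    Hypothetical helper: it only builds the snippet text and does not
    touch the network or import polars/duckdb itself.
    """
    # Both libraries accept hf:// URLs that point at files in a dataset repo.
    url = f"hf://datasets/{repo_id}/{parquet_path}"
    return {
        "polars": f'import polars as pl\n\ndf = pl.read_parquet("{url}")\n',
        "duckdb": f"import duckdb\n\ndf = duckdb.sql(\"SELECT * FROM '{url}'\").df()\n",
    }


# Placeholder repo id and file name, for illustration only.
snippets = render_snippets("user/my-dataset", "data/train-00000-of-00001.parquet")
print(snippets["polars"])
print(snippets["duckdb"])
```

Generating the text from one shared URL keeps the two snippets consistent with each other. Which Parquet file (or glob) to point at is left open here.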

burtenshaw commented 3 months ago

With this feature PR, Argilla will be able to load predefined dataset repos that contain a .argilla config dir. The dataset could then be loaded in Argilla like this:

import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")
dataset = rg.Dataset.from_hub(repo_id="<repo_id>")

Could we show this snippet based on the presence of .argilla?
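
As a minimal sketch of that check, assuming the list of file paths in the repo is already available (for instance from `huggingface_hub`'s `list_repo_files`), the decision could look like the following. The names `has_argilla_config` and `argilla_snippet` and the example file names are hypothetical, not part of the viewer or of Argilla.

```python
from typing import Optional

# Template mirroring the snippet above; only repo_id is filled in.
ARGILLA_SNIPPET = '''import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")
dataset = rg.Dataset.from_hub(repo_id="{repo_id}")
'''


def has_argilla_config(repo_files: list[str]) -> bool:
    """True if any path in the repo is, or lives under, the .argilla dir."""
    return any(
        path == ".argilla" or path.startswith(".argilla/") for path in repo_files
    )


def argilla_snippet(repo_id: str, repo_files: list[str]) -> Optional[str]:
    """Return the Argilla snippet for the repo, or None when .argilla is absent."""
    if not has_argilla_config(repo_files):
        return None
    return ARGILLA_SNIPPET.format(repo_id=repo_id)


# Hypothetical file listings, for illustration only.
with_config = argilla_snippet("user/my-dataset", ["README.md", ".argilla/settings.json"])
without_config = argilla_snippet("user/my-dataset", ["README.md", "data/train.parquet"])
print(with_config)
print(without_config)
```

Matching on the `.argilla/` prefix rather than on a substring avoids false positives from files that merely have "argilla" in their name.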

julien-c commented 3 months ago

polars/duckdb 👍

julien-c commented 3 months ago

> Could we show this snippet based on the presence of .argilla?

sounds reasonable!

dvsrepo commented 3 months ago

> > Could we show this snippet based on the presence of .argilla?
>
> sounds reasonable!

This would be awesome, eventually!

As from_hub will be released along with argilla 2.0 in a few days, I think we need to make it bulletproof with some iteration and further testing with the community.