xlang-ai / Spider2-V

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
https://spider2-v.github.io
Apache License 2.0

Improve discoverability on Hugging Face #27

Open NielsRogge opened 1 month ago

NielsRogge commented 1 month ago

Hi,

Niels here from the open-source team at Hugging Face. It's great to see you're open-sourcing the data. I discovered your work through the paper page: https://huggingface.co/papers/2407.10956.

However, there are a couple of things that could improve the discoverability of your work:

Dataset

I see the data is being open-sourced here. Would it be possible to make it available on the Hub? That way, people could load it in two lines of code:

from datasets import load_dataset

dataset = load_dataset("xlang-ai/spider2-v-data")

The dataset itself could also be linked to the paper; see here for how to do that: https://huggingface.co/docs/hub/en/datasets-cards#linking-a-paper

Discoverability

Moreover, the discoverability of your work could be improved by adding tags to the dataset card, as explained here: https://huggingface.co/docs/hub/en/datasets-cards#dataset-cards

Let me know if you need any help regarding this!

Cheers,

Niels

ML Engineer @ HF 🤗

rhythmcao commented 1 month ago

Sure, thanks for the reminder. The dataset (evaluation_examples) has been uploaded to https://huggingface.co/datasets/xlangai/ubuntu_spider2v. However, each task example is a folder containing a .json config file plus other task-specific dependency files. Uploading the task folders directly fails because there are too many files, so I could only upload a compressed .zip file.
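Since the data is stored as a single .zip rather than as loadable files, load_dataset may not work on it directly. For anyone who wants to use the archive, here is a minimal sketch of fetching and unpacking it. It assumes huggingface_hub is installed. The helper names (download_archive, extract_archive) are made up for illustration, and the archive's file name is not stated above, so it is left as a placeholder to look up in the repo's file listing.

```python
import zipfile
from pathlib import Path


def download_archive(repo_id: str, filename: str) -> str:
    """Fetch one file from a dataset repo on the Hub and return its local path."""
    # Imported lazily so extract_archive works even without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset")


def extract_archive(zip_path: str, dest_dir: str) -> list[Path]:
    """Unpack the archive and return every .json file under dest_dir, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    # Assumption: each task folder holds one .json config, as described above.
    return sorted(dest.rglob("*.json"))


# Hypothetical usage; ARCHIVE_NAME is a placeholder, not the real file name:
# path = download_archive("xlangai/ubuntu_spider2v", ARCHIVE_NAME)
# configs = extract_archive(path, "evaluation_examples")
```

The download step is kept separate from the extraction step so the unpacking logic can be reused on a locally copied archive.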