microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

Questions about task datasets used in the paper "Adapt LLM to domains" #182

Closed by Lydia-yang 6 months ago

Lydia-yang commented 6 months ago

Hi, thanks for your great work. I am following it and have a question about the datasets. I found that only the test data have been released. Could you please release all the task datasets (both training and test data) used in fine-tuning?

cdxeve commented 6 months ago

Hi, thanks for your interest in our work. We will release the datasets on Hugging Face within this week.

cdxeve commented 6 months ago

Hi, the datasets are available on Hugging Face now:

The other datasets used in our paper are already available on Hugging Face, and you can load them directly with the following code:

from datasets import load_dataset

# MQP:
dataset = load_dataset("medical_questions_pairs")
# PubMedQA:
dataset = load_dataset("bigbio/pubmed_qa")
# USMLE:
dataset = load_dataset("GBaker/MedQA-USMLE-4-options")
# SCOTUS:
dataset = load_dataset("lex_glue", "scotus")
# CaseHOLD:
dataset = load_dataset("lex_glue", "case_hold")
# UNFAIR-ToS:
dataset = load_dataset("lex_glue", "unfair_tos")
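Since each call above overwrites the same `dataset` variable, it may be handy to keep the identifiers in one lookup table and load a task by name. The following is only a minimal sketch: `TASK_SOURCES` and `load_task` are hypothetical names introduced here for illustration, not part of the LMOps code, and the identifiers are copied from the snippet above. Depending on the installed `datasets` version, some of these repositories might need extra arguments or a pinned library version to load.

```python
# Hypothetical helper, not from the LMOps repository: maps each task
# name to the (path, config) pair passed to load_dataset in this thread.
TASK_SOURCES = {
    "MQP": ("medical_questions_pairs", None),
    "PubMedQA": ("bigbio/pubmed_qa", None),
    "USMLE": ("GBaker/MedQA-USMLE-4-options", None),
    "SCOTUS": ("lex_glue", "scotus"),
    "CaseHOLD": ("lex_glue", "case_hold"),
    "UNFAIR-ToS": ("lex_glue", "unfair_tos"),
}


def load_task(name):
    """Load one task dataset by its short name (requires network access)."""
    # Imported lazily so the table can be inspected without `datasets` installed.
    from datasets import load_dataset

    path, config = TASK_SOURCES[name]
    # The second positional argument of load_dataset is the config name;
    # None selects the dataset's default configuration.
    return load_dataset(path, config)
```

Calling `load_task("SCOTUS")` would then be equivalent to `load_dataset("lex_glue", "scotus")`, and printing the returned object should list whatever splits that dataset provides.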