sileod / tasksource

Datasets collection and preprocessings framework for NLP extreme multitask learning
Apache License 2.0
140 stars, 7 forks

What is the difference between tasksource and FLAN? #7

Open imoneoi opened 8 months ago

imoneoi commented 8 months ago

Thanks for the great work! I'm also using FLAN for training, so I'm wondering how to include only tasks that are in Tasksource but not in FLAN.

sileod commented 8 months ago

Thanks! Tasksource is designed for transparency, quick addition of new tasks, and composability. Tasksource tasks can be recast programmatically into instructions or used directly for classification.
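To illustrate what recasting a classification task into an instruction can look like, here is a minimal sketch. The function name, the prompt template, and the `inputs`/`targets` keys are assumptions made for this example; they are not taken from the tasksource code.

```python
def recast_nli_to_instruction(premise, hypothesis, label, label_names):
    """Turn one NLI classification example into an instruction-style pair.

    Hypothetical helper: the template below is illustrative only.
    """
    options = ", ".join(label_names)
    prompt = (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        f"How does the premise relate to the hypothesis? Options: {options}.\n"
        "Answer:"
    )
    # The integer class label becomes a text target for instruction tuning.
    return {"inputs": prompt, "targets": label_names[label]}


example = recast_nli_to_instruction(
    "A man is playing a guitar.",
    "A person is making music.",
    0,
    ["entailment", "neutral", "contradiction"],
)
print(example["inputs"])
print(example["targets"])
```

The same record can still be used for plain classification by keeping the integer label and ignoring the generated prompt.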

To my knowledge, tasksource has the only publicly available symbol-tuning dataset, which greatly improves few-shot learning: https://huggingface.co/datasets/tasksource/icl-symbol-tuning-instruct. This is in addition to tasksource-instruct-v0.
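For readers unfamiliar with symbol tuning, the general idea is to replace natural-language labels in few-shot demonstrations with arbitrary symbols, so the model has to infer the task from the examples. Below is a rough sketch of that idea; the function name, prompt layout, and symbol pool are assumptions, and it does not claim to reproduce how icl-symbol-tuning-instruct was built.

```python
import random


def symbol_tune(demonstrations, query, symbols, seed=0):
    """Build a few-shot prompt whose labels are replaced by arbitrary symbols.

    demonstrations: list of (text, label) pairs.
    Returns the prompt and the label-to-symbol mapping that was used.
    """
    rng = random.Random(seed)
    labels = sorted({label for _, label in demonstrations})
    if len(symbols) < len(labels):
        raise ValueError("need at least one symbol per distinct label")
    # Draw one distinct symbol per label, with no semantic link to the label.
    mapping = dict(zip(labels, rng.sample(symbols, len(labels))))
    blocks = [
        f"Input: {text}\nOutput: {mapping[label]}"
        for text, label in demonstrations
    ]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks), mapping


prompt, mapping = symbol_tune(
    [("great movie", "positive"), ("awful plot", "negative")],
    "loved it",
    ["foo", "bar", "qux"],
)
print(prompt)
```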

Tasksource is more up to date, and I tried to be exhaustive (with a focus on reasoning/logic/NLI), but it has lower diversity in prompt formulation. Flan also includes some tasks that are not very reasoning-intensive, like formulating a hypothesis given a premise; these are quite interesting but should be sampled.

Are you talking about Flan or Flan with SNI?

Also see https://www.dataprovenance.org/ for many instruction datasets. We are planning to work with prominent model builders, and I would be glad to chat with you, e.g. on Discord.

Task IDs not in FlanV2/Bigbench/MMLU/truthfulQA/chatbot_arena_conversations:

['WANLI', 'recast/recast_verbnet', 'recast/recast_verbcorner', 'recast/recast_ner', 'recast/recast_sentiment', 'recast/recast_puns', 'recast/recast_factuality', 'recast/recast_megaveridicality', 'probability_words_nli/reasoning_1hop', 'probability_words_nli/usnli', 'probability_words_nli/reasoning_2hop', 'nan-nli/joey234--nan-nli', 'nli_fever', 'breaking_nli', 'conj_nli', 'fracas', 'dialogue_nli', 'mpe', 'dnc', 'recast_white/fnplus', 'recast_white/sprl', 'recast_white/dpr', 'robust_nli/IS_CS', 'robust_nli/LI_LI', 'robust_nli/ST_WO', 'robust_nli/PI_SP', 'robust_nli/PI_CD', 'robust_nli/ST_SE', 'robust_nli/ST_NE', 'robust_nli/ST_LM', 'robust_nli_is_sd', 'robust_nli_li_ts', 'gen_debiased_nli/snli_seq_z', 'gen_debiased_nli/snli_z_aug', 'gen_debiased_nli/snli_par_z', 'gen_debiased_nli/mnli_par_z', 'gen_debiased_nli/mnli_z_aug', 'gen_debiased_nli/mnli_seq_z', 'add_one_rte', 'hlgd', 'conll2003/pos_tags', 'conll2003/chunk_tags', 'conll2003/ner_tags', 'hh-rlhf', 'model-written-evals', 'fig-qa', 'social_i_qa', 'balanced-copa', 'e-CARE', 'insincere-questions', 'TuringBench', 'vitaminc/tals--vitaminc', 'rumoureval_2019/RumourEval2019', 'tweet_eval/irony', 'tweet_eval/stance_abortion', 'tweet_eval/hate', 'tweet_eval/stance_atheism', 'tweet_eval/stance_climate', 'tweet_eval/emoji', 'tweet_eval/offensive', 'tweet_eval/sentiment', 'tweet_eval/emotion', 'tweet_eval/stance_feminist', 'tweet_eval/stance_hillary', 'discovery/discovery', 'pragmeval/verifiability', 'pragmeval/mrda', 'pragmeval/switchboard', 'pragmeval/emergent', 'pragmeval/gum', 'pragmeval/sarcasm', 'pragmeval/stac', 'pragmeval/pdtb', 'silicone/dyda_e', 'silicone/oasis', 'silicone/meld_s', 'silicone/meld_e', 'silicone/maptask', 'silicone/dyda_da', 'silicone/sem', 'silicone/iemocap', 'lex_glue/scotus', 'lex_glue/ledgar', 'language-identification', 'rotten_tomatoes', 'hate_speech18', 'sms_spam', 'snips_built_in_intents', 'hate_speech_offensive', 'hyperpartisan_news', 'sciie', 'citation_intent', 'scicite',
'lexical_relation_classification/ROOT09', 'lexical_relation_classification/CogALexV', 'lexical_relation_classification/K&H+N', 'lexical_relation_classification/BLESS', 'lexical_relation_classification/EVALution', 'crowdflower/political-media-bias', 'crowdflower/tweet_global_warming', 'crowdflower/text_emotion', 'crowdflower/political-media-message', 'crowdflower/political-media-audience', 'crowdflower/economic-news', 'crowdflower/corporate-messaging', 'crowdflower/airline-sentiment', 'crowdflower/sentiment_nuclear_power', 'ethics/commonsense', 'ethics/deontology', 'ethics/justice', 'ethics/virtue', 'tweets_hate_speech_detection', 'wnut_17/wnut_17', 'ncbi_disease/ncbi_disease', 'acronym_identification', 'jnlpba/jnlpba', 'ontonotes_english/SpeedOfMagic--ontonotes_english', 'blog_authorship_corpus/gender', 'blog_authorship_corpus/horoscope', 'blog_authorship_corpus/job', 'open_question_type', 'mc_taco', 'discosense', 'EffectiveFeedbackStudentWriting', 'phrase_similarity', 'scientific-exaggeration-detection', 'fever-evidence-related/mwong--fever-related', 'dynasent/dynabench.dynasent.r1.all/r1', 'dynasent/dynabench.dynasent.r2.all/r2', 'sem_eval_2010_task_8', 'medmcqa', 'logiqa', 'cycic_classification', 'cycic_multiplechoice', 'commonsense_qa_2.0', 'lingnli', 'monotonicity-entailment', 'arct', 'scinli', 'naturallogic', 'onestop_qa', 'moral_stories/full', 'prost', 'dynahate', 'syntactic-augmentation-nli', 'autotnli', 'CONDAQA', 'webgpt_comparisons', 'synthetic-instruct-gptj-pairwise', 'scruples', 'wouldyourather', 'attempto-nli', 'defeasible-nli/snli', 'defeasible-nli/atomic', 'help-nli', 'nli-veridicality-transitivity', 'natural-language-satisfiability', 'lonli', 'dadc-limit-nli', 'FLUTE', 'summarize_from_feedback/comparisons', 'folio', 'tomi-nli', 'avicenna', 'SHP', 'MedQA-USMLE-4-options-hf', 'wikimedqa/medwiki', 'cicero', 'mutual', 'NeQA', 'quote-repetition', 'redefine-math', 'puzzte', 'implicatures', 'race-c', 'spartqa-yn', 'spartqa-mchoice', 'temporal-nli', 
'riddle_sense', 'clcd-english', 'twentyquestions', 'reclor', 'counterfactually-augmented-imdb', 'counterfactually-augmented-snli', 'cnli', 'boolq-natural-perturbations', 'equate', 'ScienceQA_text_only', 'ekar_english', 'implicit-hate-stg1', 'logiqa-2.0-nli', 'PARARULE-Plus', 'mindgames', 'universal_dependencies/en_partut/deprel', 'universal_dependencies/en_lines/deprel', 'universal_dependencies/en_gum/deprel', 'universal_dependencies/en_ewt/deprel', 'ambient', 'path-naturalness-prediction', 'cloth', 'dgen', 'oasst1_pairwise_rlhf_reward', 'I2D2', 'args_me', 'Touche23-ValueEval', 'starcon', 'banking77', 'ruletaker', 'lsat_qa/all', 'ConTRoL-nli', 'tracie', 'sherliic', 'sen-making/1', 'sen-making/2', 'mbib-base/cognitive-bias', 'mbib-base/fake-news', 'mbib-base/gender-bias', 'mbib-base/hate-speech', 'mbib-base/linguistic-bias', 'mbib-base/political-bias', 'mbib-base/racial-bias', 'mbib-base/text-level-bias', 'robustLR', 'v1/gen_train234_test2to10', 'logical-fallacy', 'parade', 'cladder', 'subjectivity', 'MOH', 'VUAC', 'TroFi', 'sharc_modified/mod', 'conceptrules_v2', 'disrpt/eng.dep.scidtb', 'conll2000', 'few-nerd/supervised', 'zero-shot-label-nli', 'com2sense', 'scone', 'winodict', 'fool-me-twice', 'monli', 'corr2cause', 'apt', 'twitter-financial-news-sentiment', 'icl-symbol-tuning-instruct', 'SpaceNLI', 'propsegment/nli', 'HatemojiBuild', 'regset', 'esci', 'dnd_style_intents']
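To keep only the tasks that are in tasksource but not in FLAN, one option is to treat the list above as an allowlist and filter your task IDs against it. This is a minimal sketch: `tasks_outside_flan` is a hypothetical helper, `not_in_flan` is a short excerpt of the list above, and `all_task_ids` holds made-up placeholder IDs.

```python
def tasks_outside_flan(all_task_ids, not_in_flan):
    """Keep only the task IDs that appear in the not-in-FLAN allowlist."""
    allowed = set(not_in_flan)
    # Preserve the original ordering of all_task_ids.
    return [task_id for task_id in all_task_ids if task_id in allowed]


# Excerpt of the list above; paste the full list in practice.
not_in_flan = ["WANLI", "fracas", "folio", "ruletaker"]

# Placeholder IDs for illustration; 'glue/mnli' and 'anli/a1' stand in
# for tasks that are assumed to be covered by FLAN already.
all_task_ids = ["WANLI", "glue/mnli", "folio", "anli/a1", "ruletaker"]

selected = tasks_outside_flan(all_task_ids, not_in_flan)
print(selected)  # ['WANLI', 'folio', 'ruletaker']
```

In practice `all_task_ids` would presumably come from the tasksource task index (the README describes a `list_tasks()` helper). That wiring is an assumption and is not shown here.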

imoneoi commented 8 months ago

@sileod Thanks for the detailed response!

I'm using the FLAN 2022 dataset (https://huggingface.co/datasets/Open-Orca/FLAN). What is FLAN with SNI? Also, are the tasks listed above absent from FLAN 2022, Bigbench, and MMLU?

Besides, I'm also interested in symbol tuning. My Discord is imonenext, feel free to DM.