imoneoi opened this issue 12 months ago
Thanks! Tasksource is designed for transparency, quick addition of new tasks, and composability. Tasksource tasks can be recast programmatically into instructions or used for classification.
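To illustrate what "recast into instructions" can mean, here is a minimal sketch that turns an NLI-style classification example into an instruction-style input/target pair. The `Example` fields (`premise`, `hypothesis`, `label`) and the prompt template are assumptions made for illustration; this is not tasksource's actual recasting code.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical sentence-pair classification example with an integer label.
    premise: str
    hypothesis: str
    label: int


def recast_to_instruction(ex: Example, label_names: list[str]) -> dict[str, str]:
    """Recast a classification example into an instruction (inputs/targets) pair.

    The template is an illustrative assumption, not tasksource's template.
    """
    options = ", ".join(label_names)
    inputs = (
        f"Premise: {ex.premise}\n"
        f"Hypothesis: {ex.hypothesis}\n"
        f"Does the premise entail the hypothesis? Options: {options}"
    )
    # The target is the verbalized label, so the same example can serve
    # either a classifier (integer label) or a generative model (text target).
    return {"inputs": inputs, "targets": label_names[ex.label]}


if __name__ == "__main__":
    ex = Example("A man is playing a guitar.", "A person is making music.", 0)
    pair = recast_to_instruction(ex, ["entailment", "neutral", "contradiction"])
    print(pair["inputs"])
    print("->", pair["targets"])
```

Keeping the integer label in the source example is what makes the same task usable both ways: the classification view reads `label` directly, while the instruction view only adds a template on top.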
To my knowledge, tasksource has the only symbol-tuning dataset available, and symbol tuning greatly improves few-shot learning: https://huggingface.co/datasets/tasksource/icl-symbol-tuning-instruct. This is in addition to tasksource instruct-v0.
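For readers unfamiliar with symbol tuning: the core idea is that the natural-language labels in few-shot exemplars are replaced by arbitrary symbols, so the model can only infer the label mapping from the in-context examples. The sketch below builds one such example under that assumption; the function name, the `Input:`/`Label:` format, and the symbol list are hypothetical, and the actual format of `tasksource/icl-symbol-tuning-instruct` may differ.

```python
import random


def symbol_tune_prompt(
    exemplars: list[tuple[str, str]],
    query: tuple[str, str],
    symbols: list[str],
    seed: int = 0,
) -> dict[str, str]:
    """Build one symbol-tuning style few-shot example (hypothetical format).

    Each original label is mapped to a distinct, randomly chosen symbol, so the
    label semantics cannot be read off the label names themselves.
    """
    rng = random.Random(seed)
    labels = sorted({label for _, label in exemplars} | {query[1]})
    if len(symbols) < len(labels):
        raise ValueError("need at least one symbol per label")
    # Random one-to-one assignment of symbols to the original labels.
    mapping = dict(zip(labels, rng.sample(symbols, len(labels))))
    blocks = [f"Input: {text}\nLabel: {mapping[label]}" for text, label in exemplars]
    blocks.append(f"Input: {query[0]}\nLabel:")
    # The target is the symbol of the query's true label.
    return {"inputs": "\n\n".join(blocks), "targets": mapping[query[1]]}
```

Using a seeded `random.Random` keeps the symbol assignment reproducible per example while still varying it across examples when different seeds are used.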
Tasksource is more up to date, and I tried to be exhaustive (with a focus on reasoning, logic, and NLI), but it has lower prompt-formulation diversity. Flan also includes some tasks that are not very reasoning-intensive, like formulating a hypothesis given a premise; these are quite interesting but should be sampled.
Are you talking about Flan or Flan with SNI?
Also see https://www.dataprovenance.org/ for many instruction datasets. We are planning to work with prominent model builders; I would be glad to chat with you, e.g. on Discord.
Task IDs not in FlanV2/Bigbench/MMLU/truthfulQA/chatbot_arena_conversations:
['WANLI', 'recast/recast_verbnet', 'recast/recast_verbcorner', 'recast/recast_ner', 'recast/recast_sentiment', 'recast/recast_puns', 'recast/recast_factuality', 'recast/recast_megaveridicality', 'probability_words_nli/reasoning_1hop', 'probability_words_nli/usnli', 'probability_words_nli/reasoning_2hop', 'nan-nli/joey234--nan-nli', 'nli_fever', 'breaking_nli', 'conj_nli', 'fracas', 'dialogue_nli', 'mpe', 'dnc', 'recast_white/fnplus', 'recast_white/sprl', 'recast_white/dpr', 'robust_nli/IS_CS', 'robust_nli/LI_LI', 'robust_nli/ST_WO', 'robust_nli/PI_SP', 'robust_nli/PI_CD', 'robust_nli/ST_SE', 'robust_nli/ST_NE', 'robust_nli/ST_LM', 'robust_nli_is_sd', 'robust_nli_li_ts', 'gen_debiased_nli/snli_seq_z', 'gen_debiased_nli/snli_z_aug', 'gen_debiased_nli/snli_par_z', 'gen_debiased_nli/mnli_par_z', 'gen_debiased_nli/mnli_z_aug', 'gen_debiased_nli/mnli_seq_z', 'add_one_rte', 'hlgd', 'conll2003/pos_tags', 'conll2003/chunk_tags', 'conll2003/ner_tags', 'hh-rlhf', 'model-written-evals', 'fig-qa', 'social_i_qa', 'balanced-copa', 'e-CARE', 'insincere-questions', 'TuringBench', 'vitaminc/tals--vitaminc', 'rumoureval_2019/RumourEval2019', 'tweet_eval/irony', 'tweet_eval/stance_abortion', 'tweet_eval/hate', 'tweet_eval/stance_atheism', 'tweet_eval/stance_climate', 'tweet_eval/emoji', 'tweet_eval/offensive', 'tweet_eval/sentiment', 'tweet_eval/emotion', 'tweet_eval/stance_feminist', 'tweet_eval/stance_hillary', 'discovery/discovery', 'pragmeval/verifiability', 'pragmeval/mrda', 'pragmeval/switchboard', 'pragmeval/emergent', 'pragmeval/gum', 'pragmeval/sarcasm', 'pragmeval/stac', 'pragmeval/pdtb', 'silicone/dyda_e', 'silicone/oasis', 'silicone/meld_s', 'silicone/meld_e', 'silicone/maptask', 'silicone/dyda_da', 'silicone/sem', 'silicone/iemocap', 'lex_glue/scotus', 'lex_glue/ledgar', 'language-identification', 'rotten_tomatoes', 'hate_speech18', 'sms_spam', 'snips_built_in_intents', 'hate_speech_offensive', 'hyperpartisan_news', 'sciie', 'citation_intent', 'scicite',
'lexical_relation_classification/ROOT09', 'lexical_relation_classification/CogALexV', 'lexical_relation_classification/K&H+N', 'lexical_relation_classification/BLESS', 'lexical_relation_classification/EVALution', 'crowdflower/political-media-bias', 'crowdflower/tweet_global_warming', 'crowdflower/text_emotion', 'crowdflower/political-media-message', 'crowdflower/political-media-audience', 'crowdflower/economic-news', 'crowdflower/corporate-messaging', 'crowdflower/airline-sentiment', 'crowdflower/sentiment_nuclear_power', 'ethics/commonsense', 'ethics/deontology', 'ethics/justice', 'ethics/virtue', 'tweets_hate_speech_detection', 'wnut_17/wnut_17', 'ncbi_disease/ncbi_disease', 'acronym_identification', 'jnlpba/jnlpba', 'ontonotes_english/SpeedOfMagic--ontonotes_english', 'blog_authorship_corpus/gender', 'blog_authorship_corpus/horoscope', 'blog_authorship_corpus/job', 'open_question_type', 'mc_taco', 'discosense', 'EffectiveFeedbackStudentWriting', 'phrase_similarity', 'scientific-exaggeration-detection', 'fever-evidence-related/mwong--fever-related', 'dynasent/dynabench.dynasent.r1.all/r1', 'dynasent/dynabench.dynasent.r2.all/r2', 'sem_eval_2010_task_8', 'medmcqa', 'logiqa', 'cycic_classification', 'cycic_multiplechoice', 'commonsense_qa_2.0', 'lingnli', 'monotonicity-entailment', 'arct', 'scinli', 'naturallogic', 'onestop_qa', 'moral_stories/full', 'prost', 'dynahate', 'syntactic-augmentation-nli', 'autotnli', 'CONDAQA', 'webgpt_comparisons', 'synthetic-instruct-gptj-pairwise', 'scruples', 'wouldyourather', 'attempto-nli', 'defeasible-nli/snli', 'defeasible-nli/atomic', 'help-nli', 'nli-veridicality-transitivity', 'natural-language-satisfiability', 'lonli', 'dadc-limit-nli', 'FLUTE', 'summarize_from_feedback/comparisons', 'folio', 'tomi-nli', 'avicenna', 'SHP', 'MedQA-USMLE-4-options-hf', 'wikimedqa/medwiki', 'cicero', 'mutual', 'NeQA', 'quote-repetition', 'redefine-math', 'puzzte', 'implicatures', 'race-c', 'spartqa-yn', 'spartqa-mchoice', 'temporal-nli', 
'riddle_sense', 'clcd-english', 'twentyquestions', 'reclor', 'counterfactually-augmented-imdb', 'counterfactually-augmented-snli', 'cnli', 'boolq-natural-perturbations', 'equate', 'ScienceQA_text_only', 'ekar_english', 'implicit-hate-stg1', 'logiqa-2.0-nli', 'PARARULE-Plus', 'mindgames', 'universal_dependencies/en_partut/deprel', 'universal_dependencies/en_lines/deprel', 'universal_dependencies/en_gum/deprel', 'universal_dependencies/en_ewt/deprel', 'ambient', 'path-naturalness-prediction', 'cloth', 'dgen', 'oasst1_pairwise_rlhf_reward', 'I2D2', 'args_me', 'Touche23-ValueEval', 'starcon', 'banking77', 'ruletaker', 'lsat_qa/all', 'ConTRoL-nli', 'tracie', 'sherliic', 'sen-making/1', 'sen-making/2', 'mbib-base/cognitive-bias', 'mbib-base/fake-news', 'mbib-base/gender-bias', 'mbib-base/hate-speech', 'mbib-base/linguistic-bias', 'mbib-base/political-bias', 'mbib-base/racial-bias', 'mbib-base/text-level-bias', 'robustLR', 'v1/gen_train234_test2to10', 'logical-fallacy', 'parade', 'cladder', 'subjectivity', 'MOH', 'VUAC', 'TroFi', 'sharc_modified/mod', 'conceptrules_v2', 'disrpt/eng.dep.scidtb', 'conll2000', 'few-nerd/supervised', 'zero-shot-label-nli', 'com2sense', 'scone', 'winodict', 'fool-me-twice', 'monli', 'corr2cause', 'apt', 'twitter-financial-news-sentiment', 'icl-symbol-tuning-instruct', 'SpaceNLI', 'propsegment/nli', 'HatemojiBuild', 'regset', 'esci', 'dnd_style_intents']
@sileod Thanks for the detailed response!
I'm using the FLAN 2022 dataset (https://huggingface.co/datasets/Open-Orca/FLAN). What is FLAN with SNI? Also, are the listed tasks absent from FLAN 2022, Bigbench, and MMLU?
I'm also interested in symbol tuning. My Discord is imonenext; feel free to DM.
Thanks for the great work! I'm also using FLAN for training, so I'm wondering how to include only tasks that are in Tasksource but not in FLAN.
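One way to keep only tasksource tasks that are not in FLAN is to filter training examples by task ID, for instance using the ID list posted above. The sketch below assumes each example carries a `task` field holding a tasksource task ID (an assumption; check the actual column name of the dataset you load). Matching task names against FLAN is only a heuristic: the two collections name tasks differently, so the hypothetical `normalize` helper just lowercases and unifies separators, and a name match does not prove the underlying data is the same.

```python
import re
from collections.abc import Iterable


def normalize(task_id: str) -> str:
    """Loosely normalize a task name: lowercase, collapse non-alphanumerics to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", task_id.lower()).strip("_")


def tasks_not_in_flan(tasksource_ids: Iterable[str], flan_ids: Iterable[str]) -> list[str]:
    """Return tasksource task IDs whose normalized name is absent from FLAN's names."""
    flan = {normalize(t) for t in flan_ids}
    return [t for t in tasksource_ids if normalize(t) not in flan]


def filter_examples(examples: Iterable[dict], keep: Iterable[str]) -> list[dict]:
    """Keep only examples whose (assumed) 'task' field is in the kept task IDs."""
    keep_set = set(keep)
    return [ex for ex in examples if ex["task"] in keep_set]
```

With Hugging Face `datasets`, the same predicate could be passed to `Dataset.filter`, e.g. `ds.filter(lambda ex: ex["task"] in keep_set)`, again assuming a `task` column exists.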