yano0 commented 4 months ago

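It would be great to be able to easily evaluate models such as me5 and e5-mistral, which require a different prefix or instruct for each task. It would be even more helpful if, in addition to using predefined instructs, users could supply arbitrary ones of their own, for example in a JSON file.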

lsz05 commented 4 months ago

I'm implementing exactly that right now!

lsz05 commented 4 months ago

@yano0 san

I've implemented this so that it can be configured in 30.

What do you have in mind for the JSON file?

yano0 commented 4 months ago

I'll take a look! Thank you very much. The JSON would be used to attach a different instruction to each task, and I'm picturing a format like the following.

{
    "amazon_counterfactual_classification": {
        "query": "",
        "doc": "Classify a given Amazon customer review text as either counterfactual or not-counterfactual."
    },
    "amazon_review_classification": {
        "query": "",
        "doc": "Classify the given Amazon review into its appropriate rating category."
    },
    "massive_intent_classification": {
        "query": "",
        "doc": "Given a user utterance as query, find the user intents."
    },
    "massive_scenario_classification": {
        "query": "",
        "doc": "Given a user utterance as query, find the user scenarios."
    },
    "jsick": {
        "query": "",
        "doc": "Retrieve semantically similar text."
    },
    "jsts": {
        "query": "",
        "doc": "Retrieve semantically similar text."
    },
    "paws_x_ja": {
        "query": "",
        "doc": "Retrieve parallel sentences."
    },
    "esci": {
        "query": "",
        "doc": "Find the best products from your e-commerce site search query."
    },
    "jaqket": {
        "query": "",
        "doc": "Given a quiz, retrieve for reference documents."
    },
    "mrtydi": {
        "query": "",
        "doc": "Given a question, retrieve Wikipedia passages that answer the question."
    },
    "nlp_journal_title_abs": {
        "query": "",
        "doc": "Given a title of paper, retrieve for an abstract of the paper."
    },
    "nlp_journal_title_intro": {
        "query": "",
        "doc": "Given a title of paper, retrieve for an introduction of the paper."
    },
    "nlp_journal_abs_intro": {
        "query": "",
        "doc": "Given a abstract of paper, retrieve for an introduction of the paper."
    },
    "jagovfaqs_22k": {
        "query": "",
        "doc": "Given a question, search for the answer."
    },
    "livedoor_news": {
        "query": "",
        "doc": "Identify the category of the news articles."
    },
    "mewsc16": {
        "query": "",
        "doc": "Identify the category of the news titles."
    }
}
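
For illustration, a file in this format could be consumed roughly as follows. This is only a minimal sketch, not JMTEB code: `load_instructions`, `add_instruction`, the `sep` parameter, and the single-task `EXAMPLE_JSON` are all hypothetical names made up for this example, and joining instruction and text with a single space is an assumption (each model defines its own template).

```python
import json

# Hypothetical file content in the format shown above (one task only).
EXAMPLE_JSON = """
{
    "jsts": {
        "query": "",
        "doc": "Retrieve semantically similar text."
    }
}
"""


def load_instructions(json_text):
    """Parse a task -> {"query": ..., "doc": ...} mapping from JSON text."""
    return json.loads(json_text)


def add_instruction(instructions, task, kind, text, sep=" "):
    """Prepend the instruction for (task, kind) to text.

    The text is returned unchanged when the task is unknown or the
    instruction is an empty string. The separator is an assumption.
    """
    instruction = instructions.get(task, {}).get(kind, "")
    return f"{instruction}{sep}{text}" if instruction else text


instructions = load_instructions(EXAMPLE_JSON)
print(add_instruction(instructions, "jsts", "doc", "今日は晴れです。"))
print(add_instruction(instructions, "jsts", "query", "今日は晴れです。"))
```

With this layout, an empty `"query"` string simply means that no instruction is added on the query side.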

lsz05 commented 4 months ago

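Thank you. A prefix is an attribute of a dataset and is set per dataset, so I'm not planning a global JSON that sets all prefixes in one place; instead, the intent is to write each prefix into the corresponding dataset's config file. For example, under init_args here, you would add "prefix": "Classify a given Amazon customer review text as either counterfactual or not-counterfactual."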
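
As a rough sketch of what "adding `prefix` under `init_args`" amounts to, the snippet below sets that key on a dataset config represented as a plain Python dict. Only `init_args` and `"prefix"` come from this thread; the helper `with_prefix`, the `class_path` key, and the placeholder value `"SomeDataset"` are assumptions for illustration and are not taken from the actual JMTEB config files.

```python
import copy


def with_prefix(dataset_config, prefix):
    """Return a copy of dataset_config with init_args["prefix"] set."""
    updated = copy.deepcopy(dataset_config)
    updated.setdefault("init_args", {})["prefix"] = prefix
    return updated


# Placeholder config: everything except "init_args" is hypothetical.
base_config = {"class_path": "SomeDataset", "init_args": {}}

configured = with_prefix(
    base_config,
    "Classify a given Amazon customer review text as either "
    "counterfactual or not-counterfactual.",
)
print(configured["init_args"]["prefix"])
```

The original dict is left untouched, so the same base config can be reused with different prefixes.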

yano0 commented 4 months ago

Thank you! I tried it right away, and found it slightly cumbersome that the argument name corresponding to the prefix differs from task to task. Thank you for the quick response!

lsz05 commented 4 months ago

@yano0 san This does not match the spec you asked for, but would it be all right if I close this issue?

sbintuitions / JMTEB

Support for prefix and instruct #29
