namespace-Pt / UltraGist

MIT License

Only FT #3

Open chenchenchen77 opened 3 months ago

chenchenchen77 commented 3 months ago

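Hello, I'd like to ask: if I only fine-tune (FT) without pretraining, is it enough to run the FT command directly? FT alone doesn't seem to give good results. Or is there a problem with how I'm running the code? Thank you!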

namespace-Pt commented 3 months ago

Hi, FT-only results are indeed noticeably worse. The paper should have the corresponding ablation.

chenchenchen77 commented 3 months ago

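[image: 92be664f579f6030acef780da7af61b]

Looking at the ablation in your paper, the MSC scores don't seem to differ much, but the results I have reproduced so far on QA and other datasets are quite poor. Do these datasets still require pretraining followed by FT to reach the results reported in your paper?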

namespace-Pt commented 3 months ago

Hi, yes. It's best to pretrain first and then FT. Training with the commands I posted should give results very close to the paper's.

chenchenchen77 commented 3 months ago

Hello! I have trained and run inference once following the commands you provided.

1. The topic results match expectations, but the Long-Context Tasks results are rather low. Do you know what might cause this? (Next I plan to download the model you released and run inference with it, to see whether the problem is in training or in inference.) Here are the Long-Context Tasks results I reproduced:

        Category_Metrics: {"EN Single-Doc QA": 20.64, "EN Multi-Doc QA": 13.12, "EN Summarization": 16.27, "EN Few-Shot Learning": 37.14, "Code Completion": 36.5}
        Metrics : {"narrativeqa": 2.22, "qasper": 29.34, "multifieldqa_en": 30.35, "hotpotqa": 12.62, "2wikimqa": 24.48, "musique": 2.26, "gov_report": 16.57, "qmsum": 8.37, "multi_news": 23.86, "trec": 53.0, "triviaqa": 35.37, "samsum": 23.05, "lcc": 45.89, "repobench-p": 27.11, "avg": 23.89}

2. My reproduced result on the MSC task is below. It seems to differ quite a bit from the score in your paper. What might be the problem?

        Metrics : {"rouge": 0.6970748331733622}

3. By the way, reproducing the Needle-In-A-Haystack task raises the following error:

        Traceback (most recent call last):
          File "/data/miniconda3/envs/ultragist/lib/python3.10/runpy.py", line 196, in _run_module_as_main
            return _run_code(code, main_globals, None,
          File "/data/miniconda3/envs/ultragist/lib/python3.10/runpy.py", line 86, in _run_code
            exec(code, run_globals)
          File "/data/UltraGist/main/eval_needle.py", line 406, in <module>
            main()
          File "/data/miniconda3/envs/ultragist/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
            return func(*args, **kwargs)
          File "/data/UltraGist/main/eval_needle.py", line 219, in main
            args: Args = parser.parse_args_into_dataclasses()[0]
          File "/data/miniconda3/envs/ultragist/lib/python3.10/site-packages/transformers/hf_argparser.py", line 347, in parse_args_into_dataclasses
            raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}")
        ValueError: Some specified arguments are not used by the HfArgumentParser: ['--beacon_ratio_mix', 'adapt-1024']

Thank you again for your patient answers!
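For anyone comparing their own numbers against the ones above, it can help to confirm how the category scores relate to the per-task scores. The sketch below recomputes them as plain unweighted means. The task-to-category grouping is an assumption (the usual LongBench English and code split); it is not spelled out in this thread, and the repository's evaluation script may aggregate differently.

```python
from statistics import mean

# Per-task scores copied from the comment above.
task_scores = {
    "narrativeqa": 2.22, "qasper": 29.34, "multifieldqa_en": 30.35,
    "hotpotqa": 12.62, "2wikimqa": 24.48, "musique": 2.26,
    "gov_report": 16.57, "qmsum": 8.37, "multi_news": 23.86,
    "trec": 53.0, "triviaqa": 35.37, "samsum": 23.05,
    "lcc": 45.89, "repobench-p": 27.11,
}

# ASSUMPTION: task-to-category grouping, inferred from the task names.
categories = {
    "EN Single-Doc QA": ["narrativeqa", "qasper", "multifieldqa_en"],
    "EN Multi-Doc QA": ["hotpotqa", "2wikimqa", "musique"],
    "EN Summarization": ["gov_report", "qmsum", "multi_news"],
    "EN Few-Shot Learning": ["trec", "triviaqa", "samsum"],
    "Code Completion": ["lcc", "repobench-p"],
}


def category_averages(scores, groups):
    """Unweighted mean of the task scores in each category, rounded to 2 decimals."""
    return {
        name: round(mean(scores[task] for task in tasks), 2)
        for name, tasks in groups.items()
    }


category_metrics = category_averages(task_scores, categories)
overall_avg = round(mean(task_scores.values()), 2)

print(category_metrics)
print("avg:", overall_avg)
```

Under that assumed grouping, the recomputed values agree with the reported `Category_Metrics` and `avg`, so the low category scores follow directly from the low per-task scores (for example `narrativeqa` and `musique`) rather than from an aggregation problem.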

crazyofapple commented 1 month ago

Yes, I ran into this problem too.

namespace-Pt commented 1 month ago

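Hi, the argument should be changed to `--ultragist_ratio_mix adapt-1024`; I have updated the documentation. Also, I recommend training with the code here, which supports FA2 and is faster and gives better results: https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon/new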

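@chenchenchen77 has already added me on WeChat and I helped him resolve it. If you also have trouble reproducing the results, you can add me on WeChat too: namespace-Pt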
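If you still have older scripts that pass `--beacon_ratio_mix`, one way to avoid the `HfArgumentParser` error without editing every script is to rewrite the argument list before it is parsed. This is only a minimal sketch: `migrate_argv` and `RENAMED_FLAGS` are hypothetical names, not part of the UltraGist code, and the single rename listed is the only one confirmed in this thread.

```python
# Old flag name -> new flag name.
# Only this one rename is confirmed in the thread; add others at your own risk.
RENAMED_FLAGS = {"--beacon_ratio_mix": "--ultragist_ratio_mix"}


def migrate_argv(argv):
    """Return a copy of argv with renamed flags replaced by their new names.

    Handles both the "--flag value" and "--flag=value" spellings and leaves
    every other token untouched.
    """
    migrated = []
    for token in argv:
        # Split off an optional "=value" suffix so the flag name can be looked up.
        name, sep, value = token.partition("=")
        migrated.append(RENAMED_FLAGS.get(name, name) + sep + value)
    return migrated


if __name__ == "__main__":
    old_args = ["--beacon_ratio_mix", "adapt-1024"]
    print(migrate_argv(old_args))
```

The migrated list could then be passed to the parser explicitly instead of relying on `sys.argv`. Updating the command itself, as described above, remains the simpler fix.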