tianyi-lab / Cherry_LLM

[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models

287 stars, 19 forks

issues

What is the training bash script for FastChat?

#24 daidaiershidi opened 6 days ago
2
How to transform a JSON file into FastChat format?

#23 daidaiershidi closed 1 week ago
1
The IFD score is affected by the prompt

#22 lihongxiacream closed 1 month ago
1
Why is the process so slow?

#21 lihongxiacream closed 2 months ago
7
Question about the effect of labels[0, :start_token] = -100

#20 lygjwy closed 2 months ago
1
how many epochs to train on cherry data?

#19 menghonghan closed 9 months ago
2
Evaluation reproducibility on benchmarks

#18 Cheungki closed 9 months ago
7
About the Direct Answer Score sθ(A)

#17 DryPilgrim closed 9 months ago
2
batch?

#16 sevenHQ closed 9 months ago
1
'The training of pre-experienced models is discarded for more efficient usage': does that mean we can only use the base model for cherry analysis and selection?

#15 Labmem009 closed 10 months ago
1
Chinese SFT data cannot be displayed.

#14 JieDengsc closed 10 months ago
3
Any report on time consumption?

#13 redreamality closed 10 months ago
1
Could the Pre-Experienced Model be used on other datasets?

#12 CNXDZS closed 10 months ago
1
Questions related to training

#11 JieDengsc closed 10 months ago
5
How to filter code SFT data?

#10 wyjksyjs closed 10 months ago
2
GPT-4/ChatGPT Evaluation Code

#9 mshen2 closed 10 months ago
1
Multi-round conversation data set

#8 wuQi-666 closed 10 months ago
3
about the paper

#7 liwenju0 closed 10 months ago
1
May I ask if this project is suitable for other large models, such as the Baichuan model, to filter high-quality datasets from other fields?

#6 wuQi-666 closed 10 months ago
4
I plan to apply this method to Llama2. Which part of this project needs to be changed to adapt to Llama2?

#5 Labmem009 closed 11 months ago
1
Logic behind IFD score

#4 mshen2 closed 11 months ago
1
a confusion about data_by_IFD

#3 wangjingyeye closed 11 months ago
3
a confusion about Instruction-Following Difficulty (IFD) scores

#2 xiaohuoguohh closed 11 months ago
2
Need help: the loss curve is strange.

#1 ifshine closed 1 year ago
3