issues
tianyi-lab/Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models
287 stars, 19 forks
The training bash script for FastChat is what?
#24
daidaiershidi
opened
6 days ago
2
How to transform a json file into fastchat format?
#23
daidaiershidi
closed
1 week ago
1
The ifd score is affected by the prompt
#22
lihongxiacream
closed
1 month ago
1
why is the process so slow
#21
lihongxiacream
closed
2 months ago
7
Question about the effect of labels[0, :start_token] = -100
#20
lygjwy
closed
2 months ago
1
how many epochs to train on cherry data?
#19
menghonghan
closed
9 months ago
2
Evaluation reproducibility on benchmarks
#18
Cheungki
closed
9 months ago
7
About the Direct Answer Score sθ(A)
#17
DryPilgrim
closed
9 months ago
2
batch?
#16
sevenHQ
closed
9 months ago
1
'The training of pre-experienced models is discarded for more efficient usage': that means we can only use base model to do cherry analysis and selection?
#15
Labmem009
closed
10 months ago
1
Chinese SFT data cannot be displayed.
#14
JieDengsc
closed
10 months ago
3
Any report of time consuming?
#13
redreamality
closed
10 months ago
1
Could the Pre-Experienced Model be used in other different dataset?
#12
CNXDZS
closed
10 months ago
1
Questions related to training
#11
JieDengsc
closed
10 months ago
5
How to filter code SFT data?
#10
wyjksyjs
closed
10 months ago
2
GPT-4/ChatGPT Evaluation Code
#9
mshen2
closed
10 months ago
1
Multi-round conversation data set
#8
wuQi-666
closed
10 months ago
3
about the paper
#7
liwenju0
closed
10 months ago
1
May I ask if this project is suitable for other large models, such as the Baichuan model, to filter high-quality datasets from other fields
#6
wuQi-666
closed
10 months ago
4
I plan to apply this method on Llama2, which part of this project needs to be changed to adapt to Llama2?
#5
Labmem009
closed
11 months ago
1
Logic behind IFD score
#4
mshen2
closed
11 months ago
1
a confusion about data_by_IFD
#3
wangjingyeye
closed
11 months ago
3
a confusion about Instruction-Following Difficulty (IFD) scores
#2
xiaohuoguohh
closed
11 months ago
2
Need help: the loss curve is strange.
#1
ifshine
closed
1 year ago
3