songmzhang/DSKD
Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models".
36 stars · 4 forks
Issues
#22 · Is vanilla KD for same vocab equivalent to Minimum Edit Distance for different vocab? · opened by survivebycoding 2 days ago · 3 comments
#21 · The code works only with the dev and train sets, not the test set. Right? · closed by survivebycoding 1 day ago · 1 comment
#20 · Files for token mapping · closed by ntsw2001 1 month ago · 2 comments
#19 · Quantify difference in vocabulary · opened by srikhetramohanty 1 month ago · 5 comments
#18 · Failed to reproduce KD results · closed by cpsu00 1 month ago · 4 comments
#17 · GPT2-1.5B Pretrained Teacher on Dolly · closed by cpsu00 2 months ago · 2 comments
#16 · Is the TinyLlama in the description a base model or a pretrained model? · closed by survivebycoding 1 day ago · 1 comment
#15 · Reproduction of results · opened by mathamateur 2 months ago · 9 comments
#14 · using mistral from · closed by survivebycoding 2 months ago · 6 comments
#13 · Load 72B teacher model · opened by ypw-lbj 2 months ago · 6 comments
#12 · Evaluation script error with TinyLlama · closed by srikhetramohanty 2 months ago · 2 comments
#11 · Qwen · opened by zjjznw123 2 months ago · 3 comments
#10 · Concern regarding performance · closed by survivebycoding 2 months ago · 15 comments
#9 · Running inference using evaluation scripts · closed by srikhetramohanty 2 months ago · 2 comments
#8 · Getting an error when trying to perform SFT on TinyLlama · closed by survivebycoding 3 months ago · 10 comments
#7 · More description of the output folder created · closed by survivebycoding 3 months ago · 1 comment
#6 · Can we use this code on CPU? · closed by survivebycoding 3 months ago · 1 comment
#5 · Need Llama .bin file instead of .pth file · closed by survivebycoding 3 months ago · 2 comments
#4 · Where should we download the models from? · closed by survivebycoding 3 months ago · 4 comments
#3 · Usage with other model combinations · closed by botox-100 3 months ago · 4 comments
#2 · About SeqKD with different vocabularies · closed by 2018cx 4 months ago · 3 comments
#1 · About the computation of AKL · closed by wutaiqiang 4 months ago · 1 comment