sangmichaelxie/doremi
PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
https://arxiv.org/abs/2305.10429
MIT License
275 stars, 32 forks
issues
AssertionError: assert q.dtype in [torch.float16, torch.bfloat16]
#31
Richard-Wth
closed
6 days ago
2
Question about model initialization
#30
MAxx8371
opened
1 month ago
0
program stuck (when "Loading cached shuffled indices for dataset at ...")
#29
ccx06
opened
3 months ago
3
Question about 8B model architecture
#28
Qinghao-Hu
closed
1 week ago
1
CUDA version problem
#27
RRaphaell
opened
5 months ago
2
Question about the initialization of the perdomain_scores
#26
yuzc19
closed
1 week ago
1
Questions about the loss used for optimizing the proxy model
#25
clarkkent0618
opened
6 months ago
3
Speed decrease during training
#24
ljb121002
opened
6 months ago
1
Questions about directly applying the weights from paper or the repo to train main model
#23
clarkkent0618
opened
6 months ago
2
ModuleNotFoundError: No module named 'flash_attn.models.falcon'
#22
Sniper970119
opened
6 months ago
11
Edge Case Discussion
#21
thangld201
closed
6 months ago
1
Cannot reproduce the results shown in GitHub repo with the 120M reference model on A800 (8*80G).
#20
kiseliu
opened
7 months ago
17
List of pinned requirements / Dockerfile?
#19
filipg7777
closed
6 months ago
2
Question about optimized weights in the paper
#18
yuzc19
closed
7 months ago
4
Training time for baseline model and proxy model
#17
yuzc19
closed
7 months ago
1
Jpz doremi
#16
2003pro
closed
9 months ago
0
How many rounds do we need to converge domain weights on The Pile?
#15
ouyangliqi
opened
9 months ago
1
Fix typo in README.md
#14
eltociear
closed
10 months ago
0
How do you get the model to be good at code if it downsamples code?
#13
teknium1
opened
10 months ago
1
Question about Flash-attention version.
#12
kiseliu
closed
10 months ago
1
Should reference model initialize weights uniformly?
#11
ouyangliqi
closed
10 months ago
3
easy HF dataset doremi?
#10
brando90
opened
11 months ago
2
loss computation wrong?
#9
tt6746690
closed
10 months ago
2
question about only updating the domain weights on process 0
#8
SueJane
closed
7 months ago
4
question about domain weights initialization value in paper figure 8
#7
Haijunlv
closed
11 months ago
1
Multi-nodes support
#6
binxuan
opened
12 months ago
1
Domain weights are mostly near one-hot
#5
xiamengzhou
closed
11 months ago
3
Update filter_domains.py
#4
Zhihong-Zhu
closed
1 year ago
0
about loss
#3
ywb2018
closed
1 year ago
1
Adding a license
#2
virtualzx-nad
closed
1 year ago
1
step 1 baseline_280M loss large
#1
gawei1995
closed
1 year ago
5