sangmichaelxie / doremi

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets

https://arxiv.org/abs/2305.10429

MIT License

275 stars 32 forks

issues

AssertionError: assert q.dtype in [torch.float16, torch.bfloat16]

#31 Richard-Wth closed 6 days ago
2
Question about model initialization

#30 MAxx8371 opened 1 month ago
0
program stuck (when "Loading cached shuffled indices for dataset at ...")

#29 ccx06 opened 3 months ago
3
Question about 8B model architecture

#28 Qinghao-Hu closed 1 week ago
1
Cuda version problem

#27 RRaphaell opened 5 months ago
2
Question about the initialization of the perdomain_scores

#26 yuzc19 closed 1 week ago
1
Questions about the loss used for optimizing the proxy model

#25 clarkkent0618 opened 6 months ago
3
Speed decrease during training

#24 ljb121002 opened 6 months ago
1
Questions about directly applying the weights from paper or the repo to train main model

#23 clarkkent0618 opened 6 months ago
2
ModuleNotFoundError: No module named 'flash_attn.models.falcon'

#22 Sniper970119 opened 6 months ago
11
Edge Case Discussion

#21 thangld201 closed 6 months ago
1
Cannot reproduce the results shown in GitHub repo with the 120M reference model on A800 (8*80G).

#20 kiseliu opened 7 months ago
17
List of pinned requirements / Dockerfile?

#19 filipg7777 closed 6 months ago
2
Question about optimized weights in the paper

#18 yuzc19 closed 7 months ago
4
Training time for baseline model and proxy model

#17 yuzc19 closed 7 months ago
1
Jpz doremi

#16 2003pro closed 9 months ago
0
How many rounds do we need to converge domain weights on The Pile?

#15 ouyangliqi opened 9 months ago
1
Fix typo in README.md

#14 eltociear closed 10 months ago
0
How do you get the model to be good at code if it downsamples code?

#13 teknium1 opened 10 months ago
1
Question about Flash-attention version.

#12 kiseliu closed 10 months ago
1
Should reference model initialize weights uniformly?

#11 ouyangliqi closed 10 months ago
3
easy HF dataset doremi?

#10 brando90 opened 11 months ago
2
loss computation wrong?

#9 tt6746690 closed 10 months ago
2
question about only updating the domain weights on process 0

#8 SueJane closed 7 months ago
4
question about domain weights initialization value in paper figure 8

#7 Haijunlv closed 11 months ago
1
Multi-nodes support

#6 binxuan opened 12 months ago
1
Domain weights are mostly near one-hot

#5 xiamengzhou closed 11 months ago
3
Update filter_domains.py

#4 Zhihong-Zhu closed 1 year ago
0
about loss

#3 ywb2018 closed 1 year ago
1
Adding a license

#2 virtualzx-nad closed 1 year ago
1
step 1 baseline_280M loss large

#1 gawei1995 closed 1 year ago
5