stanford-crfm/mistral
Mistral: A strong, northwesterly wind: Framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗 Transformers.
Apache License 2.0
562 stars
49 forks
issues
Port NaNDetector | #150 | dlwh | opened | 2 years ago | 0
tutorials | #149 | J38 | closed | 2 years ago | 0
fix remaining pre-commit issues | #148 | dlwh | closed | 2 years ago | 2
Add some Mistral Tutorials | #147 | dlwh | closed | 2 years ago | 1
update Mistral checkpoints so main branch points to checkpoint-400000 | #146 | dlwh | closed | 2 years ago | 0
Get CI testing 2 gpu config | #145 | dlwh | closed | 2 years ago | 0
Dev | #144 | J38 | closed | 2 years ago | 0
please pre-commit gods | #143 | dlwh | closed | 2 years ago | 2
Dev | #142 | J38 | closed | 2 years ago | 0
Node arguments not parsed properly for torch.distributed.launch | #141 | YianZhang | closed | 2 years ago | 1
make CI keep up with dependencies, discover tests automatically, validate configs | #140 | dlwh | closed | 2 years ago | 1
make a more efficient IndexedDataset data store for storing tokenized datasets | #139 | dlwh | closed | 2 years ago | 1
Fork Preprocessing when doing multiple gpus | #138 | dlwh | closed | 2 years ago | 4
add pre-commit action, run pre-commit | #137 | dlwh | closed | 2 years ago | 8
broaden the values we accept to disable wandb | #136 | dlwh | closed | 2 years ago | 0
fix typo: shorter --> longer | #135 | dlwh | closed | 2 years ago | 0
Make sure we don't write anything (of significant size) to /tmp | #134 | dlwh | closed | 2 years ago | 2
WIP switch from quinine to yahp | #133 | dlwh | opened | 2 years ago | 3
Local tests | #132 | dlwh | closed | 2 years ago | 0
get_auto_dataset path logic does not work properly when dataset_id is a path | #131 | dtsip | closed | 2 years ago | 8
Switch to a supported config library | #130 | dlwh | opened | 2 years ago | 1
Update README and checkpoint info json | #129 | J38 | closed | 2 years ago | 0
Update Main | #128 | J38 | closed | 2 years ago | 0
WIP Support dataset streaming | #127 | dlwh | closed | 2 years ago | 2
Streaming data for larger datasets | #126 | jthickstun | closed | 2 years ago | 7
Remove override of _maybe_log_save_evaluate | #125 | dlwh | closed | 2 years ago | 0
remove activation logging since it's a dead code path | #124 | dlwh | closed | 2 years ago | 0
update documentation on accessing new Mistral checkpoints on HF Hub | #123 | dlwh | closed | 2 years ago | 1
revisit OnlineBenchmarkTrainer | #122 | dlwh | closed | 2 years ago | 1
Eval dataset is hard coded to be "openwebtext_ppl" | #121 | dlwh | closed | 2 years ago | 0
Update/pin some dependency versions for release | #120 | dlwh | closed | 2 years ago | 0
Requirements.txt vs conda env yamls | #119 | dlwh | closed | 2 years ago | 11
Generate Model Cards for models | #118 | dlwh | closed | 2 years ago | 1
Tokenization crashes when using deepspeed | #117 | dlwh | closed | 2 years ago | 0
Streaming and Sharded Data Loading | #116 | dlwh | opened | 2 years ago | 0
Investigate loss spikes | #115 | dlwh | opened | 2 years ago | 0
remove old _get_train_sampler since the relevant changes have been upstreamed | #114 | dlwh | closed | 2 years ago | 7
Add Tutorial On How To Restart From A Checkpoint | #113 | J38 | closed | 2 years ago | 0
Make local dataset configurable | #112 | teetone | closed | 2 years ago | 1
Finalize loading local data from .jsonl files and facilitating data blends | #111 | J38 | closed | 2 years ago | 29
[Low Priority/Optional] allow export to support more than two archs | #110 | dlwh | closed | 2 years ago | 1
[RFC] Mistral v2.0 Roadmap | #109 | J38 | closed | 2 years ago | 12
move gradient_checkpointing over to training_arguments since it's now… | #108 | dlwh | closed | 2 years ago | 6
add defaults to cerberos schema to help keep wikitext-103 working. | #107 | dlwh | closed | 2 years ago | 0
Dependency Conflict: huggingface-hub=0.0.2 | #106 | carlini | closed | 2 years ago | 2
Add BERT | #105 | michiyasunaga | closed | 2 years ago | 0
Some changes to support batch jobs | #104 | teetone | closed | 2 years ago | 0
[Installation] Resolving dependency chain due to the latest Transformer version | #103 | sameeravithana | closed | 3 years ago | 12
Update deepspeed version | #102 | J38 | closed | 3 years ago | 0
set train_batch_size to auto | #101 | J38 | closed | 3 years ago | 0