kingoflolz mesh-transformer-jax issues

kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku

Apache License 2.0

6.29k stars, 892 forks

issues


Colab version now breaks on "import optax"

#162 mesotron closed 2 years ago
1
sample data configuration for finetuning

#160 AnassKartit closed 2 years ago
1
Execute the model in a local machine (or WSL)

#159 Dr-NULL closed 2 years ago
1
How to run v3-128?

#158 soneo1127 closed 2 years ago
2
Freeze Transformer Weight

#157 ivokun closed 2 years ago
3
Finetuning and training minimum requirements

#156 AnassKartit closed 2 years ago
1
limitation min_length=1024

#155 ghost closed 2 years ago
1
Save failed during checkpoint saving function call

#154 VishalSharmavj opened 2 years ago
4
Please help with fine-tuning small dataset

#153 ilyakar closed 2 years ago
2
Access denied for gs://neo-datasets/openwebtext2_new_inputs/eval/openwebtext_9_7_100000.tfrecords

#152 elsakra closed 2 years ago
0
Update requirements.txt

#151 reouno closed 2 years ago
1
Colab error with latest requirements.txt

#150 lightyrs closed 2 years ago
2
OK, so here we have a Writer model and the co-pilot model, if I'm right? These two could surely be used to build a tool that guides people learning to code?

#149 Swoop376 closed 2 years ago
0
Regarding Fine-tuning.

#148 BakingBrains closed 2 years ago
2
Formatting and Problems while creating tfrecords file for fine tuning gpt. 0 byte tfrecords getting created.

#147 VishalSharmavj closed 2 years ago
3
Unclear how to best build and run the infrastructure

#146 jleacox closed 2 years ago
2
Colab demo does not work

#145 JialuZhang closed 2 years ago
3
lm-evaluation-harness dep bump breaks build

#144 kremlin- closed 2 years ago
4
unable to extract step_383500.tar.zstd.

#143 davzeng closed 2 years ago
3
need guide with docker, not an issue

#142 noman00910 closed 2 years ago
4
checkpoint saving args broken/unused

#141 ablacklama closed 1 year ago
1
added explicit support for tfrecord creation from single file.

#140 ablacklama closed 3 years ago
1
problem in downloading the slim weights

#139 whoislimshady closed 3 years ago
5
Add Megatron-Turing NLG 530B numbers

#138 djoldman closed 2 years ago
0
Google api Exception while finetuning model.

#137 Aryagm closed 3 years ago
1
create_finetune_tfrecords.py getting killed prematurely.

#136 Aryagm closed 3 years ago
2
how to print-debugging inside model

#135 jiasenlu closed 3 years ago
1
How to launch the train.py

#134 jiasenlu closed 3 years ago
1
Improvement in accuracy

#133 paramjeet2021 closed 3 years ago
1
more than 1024 tokens

#132 Wajih88 closed 3 years ago
1
Fixing config saving bug in to_hf_weights.py + adding pathy to requirements

#131 ablacklama closed 3 years ago
1
Can't write config of converted hf weights to gs bucket

#130 kevinpl07 closed 3 years ago
2
cannot exit recursive infinite loop in tfrecord_loader.py

#129 bvelker closed 3 years ago
3
Getting training and validation accuracy while training

#128 albertqjiang closed 3 years ago
1
Difference between the inputs to GPT-J6B and GPT-2?

#127 BakingBrains closed 3 years ago
1
Lower memory consumption in Colab demo

#126 vfbd closed 3 years ago
1
About text generation from keywords

#125 Wajih88 closed 3 years ago
1
Incompatible checkpoints (1,) vs (1, 4096)

#124 niyoushanajmaei closed 3 years ago
1
Regarding the finetuning of GPT-J6B.

#123 BakingBrains closed 3 years ago
2
Clarify val_batches

#122 nostalgebraist closed 3 years ago
1
Add TOC to readme

#121 rozanecm closed 3 years ago
1
[Colab] Your session crashed after using all available RAM

#120 1234igor closed 3 years ago
2
Different results while using the web model vs gpt-j-slim weights model on GPU

#119 msakthiganesh closed 3 years ago
1
Prevent generating excess tokens

#118 msakthiganesh closed 3 years ago
1
Can I use this script to convert my data to tfrecords?

#117 Aryagm closed 3 years ago
2
Update howto_finetune.md

#116 StellaAthena closed 3 years ago
1
Inference speed on TPU

#115 gamcoh closed 3 years ago
1
Style transfer

#114 jb33k closed 3 years ago
1
Non-Deterministic Output

#113 jerrygreen closed 3 years ago
1
Fine tune sequence length

#112 Alexmhack closed 3 years ago
1
