philschmid / deep-learning-pytorch-huggingface
MIT License · 570 stars · 133 forks
Issues (sorted: newest first)
| #   | Title | Author | Status | Comments |
|-----|-------|--------|--------|----------|
| #56 | Quantization question: | aptum11 | opened 2 weeks ago | 0 |
| #55 | Not able to run training/fsdp-qlora-distributed-llama3.ipynb | aasthavar | closed 2 weeks ago | 7 |
| #54 | St | philschmid | closed 1 month ago | 0 |
| #53 | Clean up some typos; simplify some code; update some comments | tomaarsen | closed 1 month ago | 0 |
| #52 | Deprecation warnings. | hohoCode | opened 1 month ago | 0 |
| #51 | Fine-tune-llm-in-2024-with-trl.ipynb not producing the outputs | scigeek72 | opened 1 month ago | 0 |
| #50 | Fsdp qlora | philschmid | closed 2 months ago | 0 |
| #49 | Out of Memory: Cannot reproduce T5-XXL run on 8xA10G. | slai-natanijel | opened 3 months ago | 3 |
| #48 | What's the use of "messages" in dpo step? | katopz | opened 3 months ago | 0 |
| #47 | question about DeepSpeedPeftCallback | mickeysun0104 | opened 4 months ago | 0 |
| #46 | Gemma | philschmid | closed 4 months ago | 0 |
| #45 | Re. fine-tune-llms-in-2024-with-trl.ipynb | andysingal | opened 4 months ago | 1 |
| #44 | Dpo | philschmid | closed 4 months ago | 0 |
| #43 | Target modules all-linear not found in the base model. | kassemsabeh | closed 5 months ago | 6 |
| #42 | Commit Version Bug Fix | YanSte | closed 5 months ago | 1 |
| #41 | Trl | philschmid | closed 5 months ago | 0 |
| #40 | flash attention error on instruction tune llama-2 tutorial on Sagemaker notebook | matthewchung74 | opened 8 months ago | 2 |
| #39 | Precision Issue | zihaohe123 | opened 9 months ago | 4 |
| #38 | Falcon-180B "forward() got an unexpected keyword argument 'position_ids'" | aittalam | opened 9 months ago | 0 |
| #37 | Does this work for Llama2 - Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention? | ibicdev | opened 9 months ago | 11 |
| #36 | Ds lora | philschmid | closed 9 months ago | 0 |
| #35 | Instruction tuning of LLama2 is significantly slower compared to documented 3 hours fine-tuning time on A10G. | mlscientist2 | opened 9 months ago | 1 |
| #34 | Compute metrics while using SFT trainer | shubhamagarwal92 | opened 9 months ago | 1 |
| #33 | Cannot load tokenizer for llama2 | smreddy05 | closed 9 months ago | 1 |
| #32 | LLama 2 Flash Attention Patch Not Working For 70B | mallorbc | opened 10 months ago | 6 |
| #31 | Gptq | philschmid | closed 10 months ago | 0 |
| #30 | Fix Flash Attention forward for Llama-2 70b | davidmrau | closed 10 months ago | 8 |
| #29 | Is the DataCollator necessary in peft-flan-t5-int8-summarization.ipynb ? | brooksbp | opened 10 months ago | 0 |
| #28 | question about the llama instruction code | yeontaek | closed 10 months ago | 8 |
| #27 | improvements | philschmid | closed 11 months ago | 0 |
| #26 | Llama patch for FlashAttention support fails with use_cache | qmdnls | opened 11 months ago | 2 |
| #25 | How to create a json file for create_flan_t5_cnn_dataset.py | andysingal | opened 11 months ago | 1 |
| #24 | gcc/cuda used for training | danyaljj | opened 11 months ago | 1 |
| #23 | fix | philschmid | closed 11 months ago | 0 |
| #22 | Flash attention | philschmid | closed 11 months ago | 4 |
| #21 | Falcon int4 | philschmid | closed 11 months ago | 0 |
| #20 | added container image for training | philschmid | closed 1 year ago | 0 |
| #19 | CPU offload when not using offload deepspeed config file | siddharthvaria | opened 1 year ago | 3 |
| #18 | Error when training peft model example | Tachyon5 | opened 1 year ago | 6 |
| #17 | Colab notebook fails | TzurV | closed 1 year ago | 1 |
| #16 | CUDA OOM error while saving the model | aasthavar | closed 1 year ago | 10 |
| #15 | Does deepspeed partition the model to multi GPUs? | vikki7777 | opened 1 year ago | 4 |
| #14 | ValueError | Martok10 | opened 1 year ago | 4 |
| #13 | Inference on CNN validation set takes 2+ hours on p4dn.24xlarge machine with 8 A100s, 40GB each | sverneka | opened 1 year ago | 5 |
| #12 | FLAN-T5 XXL using DeepSpeed fits well for training but gives OOM error for inference. | irshadbhat | opened 1 year ago | 2 |
| #11 | Sample inference script for FLAN-T5 XXL using DeepSpeed & Hugging Face. | irshadbhat | closed 1 year ago | 7 |
| #10 | Error when finetuning Flan-T5-XXL on custom dataset | ngun7 | opened 1 year ago | 1 |
| #9  | Peft flan | philschmid | closed 1 year ago | 0 |
| #8  | fix small bugs of deepseed-flan-t5-summarization.ipynb | yao-matrix | closed 1 year ago | 2 |
| #7  | Error (return code -7) when finetuning FLANT5-xxl on 8* A100 | scofield7419 | opened 1 year ago | 3 |