Open yukiarimo opened 2 months ago
Same problem but in Ubuntu
same problem inside docker image
It was a few days ago but I believe I had the some, windows 11 python 3.10. I was able to infer on the pretrained weights, but after spending hours tweaking the training script I was ably able to do 1 iteration in torchrun and basically nothing happened. Would love to be able to fine-tune...
same issue ubuntu python 3.9
π Describe the bug
I ran the following code:
My all.list example:
Log after running the
train.sh
:How to fix this? Am I doing something wrong?
Versions
Collecting environment information...
Model name: Intel(R) Xeon(R) CPU @ 2.00GHz CPU family: 6 Model: 85 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 Stepping: 3 BogoMIPS: 4000.35 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities Hypervisor vendor: KVM Virtualization type: full L1d cache: 128 KiB (4 instances) L1i cache: 128 KiB (4 instances) L2 cache: 4 MiB (4 instances) L3 cache: 38.5 MiB (1 instance) NUMA node(s): 1 NUMA node0 CPU(s): 0-7 Vulnerability Gather data sampling: Not affected Vulnerability Itlb multihit: Not affected Vulnerability L1tf: Mitigation; PTE Inversion Vulnerability Mds: Vulnerable; SMT Host state unknown Vulnerability Meltdown: Vulnerable Vulnerability Mmio stale data: Vulnerable Vulnerability Retbleed: Vulnerable Vulnerability Spec rstack overflow: Not affected Vulnerability Spec store bypass: Vulnerable Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected Vulnerability Srbds: Not affected Vulnerability Tsx async abort: Vulnerable
Versions of relevant libraries: [pip3] numpy==1.25.2 [pip3] torch==1.13.1 [pip3] torchaudio==0.13.1 [pip3] torchdata==0.7.1 [pip3] torchsummary==1.5.1 [pip3] torchtext==0.17.1 [pip3] torchvision==0.17.1+cu121 [pip3] triton==2.2.0 [conda] Could not collect