so not only is the path doubled. It's also only half-formatted.
Environment
System Environment Report
Created: 2024-04-03 10:25:17 CEST
PyTorch information
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.27.0
Libc version: glibc-2.35
Python version: 3.10.12 (main, Jul 5 2023, 18:54:27) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.5.0-26-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 7 PRO 4750U with Radeon Graphics
CPU family: 23
Model: 96
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 1
Frequency boost: enabled
CPU max MHz: 1700,0000
CPU min MHz: 1400,0000
BogoMIPS: 3393.76
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca
Virtualization: AMD-V
L1d cache: 256 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 4 MiB (8 instances)
L3 cache: 8 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Composer information
Composer version: 0.21.1
Composer commit hash: None
Host processor model name: AMD Ryzen 7 PRO 4750U with Radeon Graphics
Host processor core count: 8
Number of nodes: 1
Accelerator model name: N/A
Accelerators per node: 0
CUDA Device Count: 0
To reproduce
Steps to reproduce the behavior:
1. Instantiate a logger (here, NeptuneLogger) and pass it to the Trainer instance.
2. Pass an instance of MemorySnapshot, created with its default constructor, to the Trainer instance.
3. Train.
The file name is messed up (see the top of this issue description). This doesn't seem to be a problem with the NeptuneLogger.
Expected behavior
The file name should not have duplicated parts and should have proper string formatting applied.
This line in MemorySnapshot seems to be the issue:
remote_file_name = (self.remote_path_in_bucket + os.path.basename(f)).lstrip('/')
where the respective variables evaluate to e.g.
so not only is the path doubled. It's also only half-formatted.
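To make the failure mode concrete, here is a minimal standalone sketch of the quoted expression. The input values are hypothetical, chosen only to illustrate the pattern; they are not the values from my run. The helper `joined_remote_name` is likewise an assumption, showing one possible shape of a fix rather than anything Composer actually does.

```python
import os
import posixpath


def buggy_remote_name(remote_path_in_bucket: str, f: str) -> str:
    # Same expression as the line quoted above, with `self.` dropped.
    return (remote_path_in_bucket + os.path.basename(f)).lstrip('/')


def joined_remote_name(remote_dir_template: str, f: str, **fmt: str) -> str:
    # Hypothetical alternative: fill the placeholders first, treat the
    # result as a directory, then join the base name with a separator.
    remote_dir = remote_dir_template.format(**fmt)
    return posixpath.join(remote_dir, os.path.basename(f)).lstrip('/')


# Hypothetical inputs: the remote prefix is still an unformatted template
# and already ends in a file name.
template = "{run_name}/snapshots/rank0.pickle"
local_file = "/tmp/demo-run/snapshots/rank0.pickle"

buggy = buggy_remote_name(template, local_file)
print(buggy)  # {run_name}/snapshots/rank0.picklerank0.pickle

fixed = joined_remote_name("{run_name}/snapshots", local_file, run_name="demo-run")
print(fixed)  # demo-run/snapshots/rank0.pickle
```

With these inputs the plain string concatenation repeats the file name and leaves the `{run_name}` placeholder untouched, which matches the two symptoms described above.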
Versions of relevant libraries:
[pip3] numpy==1.25.1
[pip3] pytorch-ranger==0.1.1
[pip3] torch==2.1.2
[pip3] torch-optimizer==0.3.0
[pip3] torchdata==0.7.1
[pip3] torchmetrics==1.0.1
[pip3] torchtext==0.16.2
[pip3] torchvision==0.16.2
[pip3] triton==2.1.0
[pip3] vit-pytorch==0.35.8
[conda] numpy 1.25.1 pypi_0 pypi
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] torch 2.1.2 pypi_0 pypi
[conda] torch-optimizer 0.3.0 pypi_0 pypi
[conda] torchdata 0.7.1 pypi_0 pypi
[conda] torchmetrics 1.0.1 pypi_0 pypi
[conda] torchtext 0.16.2 pypi_0 pypi
[conda] torchvision 0.16.2 pypi_0 pypi
[conda] triton 2.1.0 pypi_0 pypi
[conda] vit-pytorch 0.35.8 pypi_0 pypi
Additional context