Closed: LWprogramming closed this issue 1 year ago
Here's a list of installed packages, in case it ends up being relevant:
pip list
Package Version
------------------------ ----------
accelerate 0.20.3
antlr4-python3-runtime 4.8
audiolm-pytorch 1.2.1
beartype 0.14.1
bitarray 2.7.6
blessed 1.20.0
certifi 2023.5.7
cffi 1.15.1
charset-normalizer 3.1.0
cmake 3.26.4
colorama 0.4.6
Cython 0.29.35
einops 0.6.1
ema-pytorch 0.2.3
encodec 0.1.1
fairseq 0.12.2
filelock 3.12.2
fsspec 2023.6.0
gpustat 1.1
huggingface-hub 0.15.1
hydra-core 1.0.7
idna 3.4
Jinja2 3.1.2
joblib 1.2.0
lion-pytorch 0.1.2
lit 16.0.6
local-attention 1.8.6
lxml 4.9.2
MarkupSafe 2.1.3
mpmath 1.3.0
networkx 3.1
numpy 1.25.0
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-ml-py 11.525.112
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
omegaconf 2.0.6
packaging 23.1
pip 23.0.1
portalocker 2.7.0
protobuf 4.23.3
psutil 5.9.5
pycparser 2.21
PyYAML 6.0
regex 2023.6.3
requests 2.31.0
sacrebleu 2.3.1
safetensors 0.3.1
scikit-learn 0.24.0
scipy 1.11.0
sentencepiece 0.1.99
setuptools 65.5.0
six 1.16.0
sympy 1.12
tabulate 0.9.0
tensorboardX 2.6.1
threadpoolctl 3.1.0
tokenizers 0.13.3
torch 2.0.1
torchaudio 2.0.2
tqdm 4.65.0
transformers 4.30.2
triton 2.0.0
typing_extensions 4.6.3
urllib3 2.0.3
vector-quantize-pytorch 1.6.24
wcwidth 0.2.6
wheel 0.40.0
I'm not quite sure when exactly this issue first appeared, but my best guess is that it was introduced by a dependency a few versions back. I'll keep this updated.
@LWprogramming I think it has to do with scikit-learn.
I may try to redo the k-means logic in PyTorch (or find a suitable library as a substitute) in the future; scikit-learn is too hefty a dependency for such simple logic.
Actually, since we aren't training the HuBERT k-means, it should be straightforward to remove scikit-learn. We just need to extract the cluster centers from wherever they are stored.
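For context, a minimal sketch of what that could look like: pull the centers out of the serialized scikit-learn KMeans object and do nearest-centroid assignment in plain PyTorch. Only cluster_centers_ is a real scikit-learn attribute; the checkpoint path and function name below are illustrative assumptions, not this library's actual API.

```python
# Minimal sketch, assuming the k-means checkpoint is a joblib-serialized
# sklearn KMeans object. Everything except cluster_centers_ is hypothetical.
import joblib
import torch

kmeans = joblib.load("hubert_kmeans.bin")            # hypothetical path
centers = torch.from_numpy(kmeans.cluster_centers_)  # (num_clusters, dim)

def quantize(features: torch.Tensor) -> torch.Tensor:
    # features: (batch, seq_len, dim) -> cluster ids: (batch, seq_len)
    c = centers.to(features)  # match device and dtype of the features
    # squared euclidean distance to every center; nearest center wins
    dists = ((features.unsqueeze(-2) - c) ** 2).sum(dim=-1)
    return dists.argmin(dim=-1)
```

With the centers extracted once, sklearn's predict() is never needed at inference time, which is why the dependency can be dropped.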
@LWprogramming error is gone! :laughing:
Hm, that's odd; I'm still seeing the warning messages. Are you getting just the loss reported, like this?
0: loss: 6.498922824859619
0: valid loss 0.8049952983856201
0: saving model to folder
1: loss: ... etc
Hmm, I'm no longer seeing "Computing label assignment and total inertia".
Are we referring to the same warning message?
@LWprogramming can you run pip list | grep audiolm-pytorch
and make sure it is at 1.2.11? I just double-checked by reverting, and the warning message reappeared.
@LWprogramming oh, I see, the error messages are unrelated.
Well, you can train without worrying; we aren't using scikit-learn at all other than for extracting the cluster centers.
When training, I get this error in the logs, so I'm opening an issue in case anyone else has seen it come up:
OpenBLAS Warning : Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option.
I'm not sure what this warning means, but training does seem to keep going. I also see output like the following, although I'm not sure if it's related:
Computing label assignment and total inertia
with many repetitions of that phrase.
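A note on the OpenBLAS warning itself (my own suggestion, not something confirmed in this thread): it typically appears when OpenBLAS's built-in pthreads pool runs inside an OpenMP parallel region, such as the one PyTorch spawns. Pinning OpenBLAS to a single thread is a common mitigation; a sketch, assuming it is done before the heavy libraries load:

```python
# Hedged workaround sketch: avoid the OpenBLAS/OpenMP clash by pinning
# OpenBLAS to one thread. The variable must be set before numpy/torch
# first load OpenBLAS, so it goes at the very top of the training script
# (exporting OPENBLAS_NUM_THREADS=1 in sbatch.sh would work as well).
import os
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")

import torch  # heavy imports only after the environment variable is set
```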
Replication:
audiolm_pytorch_demo_laion.py
sbatch sbatch.sh