lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License

OpenBLAS/OpenMP Loop error message #203

Closed LWprogramming closed 1 year ago

LWprogramming commented 1 year ago

When training, I get this warning in the logs, so I'm making an issue here in case anyone else has seen it come up.

OpenBLAS Warning : Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option.

I'm not sure what this warning is supposed to mean, but it seems like things are still training. I also see output that looks something like this, although I'm not sure if it's related:

11: loss: 2.9820859730243683
Computing label assignment and total inertia
Computing label assignment and total inertia
Computing label assignment and total inertia
Computing label assignment and total inertia

with many repetitions of that phrase.
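(Editor's note: the warning typically means OpenBLAS's own pthreads pool is being entered from inside an OpenMP parallel region, e.g. by scikit-learn's k-means. A common mitigation, not part of audiolm-pytorch itself, is to cap OpenBLAS at a single thread before numpy is first imported; a minimal sketch:)

```python
# Hypothetical workaround for the "Detect OpenMP Loop" warning: limit
# OpenBLAS to one thread so its pthreads pool can't conflict with an
# enclosing OpenMP region. The env var must be set BEFORE numpy (and
# anything that loads OpenBLAS) is imported for the first time.
import os
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np  # OpenBLAS reads the limit at library load time
```

(The same limit can be applied at runtime with threadpoolctl, which is already in the pip list above, via `threadpool_limits(limits=1, user_api="blas")`.)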


Replication:

LWprogramming commented 1 year ago

and a list of packages in case it ends up being relevant:

pip list
Package                  Version
------------------------ ----------
accelerate               0.20.3
antlr4-python3-runtime   4.8
audiolm-pytorch          1.2.1
beartype                 0.14.1
bitarray                 2.7.6
blessed                  1.20.0
certifi                  2023.5.7
cffi                     1.15.1
charset-normalizer       3.1.0
cmake                    3.26.4
colorama                 0.4.6
Cython                   0.29.35
einops                   0.6.1
ema-pytorch              0.2.3
encodec                  0.1.1
fairseq                  0.12.2
filelock                 3.12.2
fsspec                   2023.6.0
gpustat                  1.1
huggingface-hub          0.15.1
hydra-core               1.0.7
idna                     3.4
Jinja2                   3.1.2
joblib                   1.2.0
lion-pytorch             0.1.2
lit                      16.0.6
local-attention          1.8.6
lxml                     4.9.2
MarkupSafe               2.1.3
mpmath                   1.3.0
networkx                 3.1
numpy                    1.25.0
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
nvidia-cufft-cu11        10.9.0.58
nvidia-curand-cu11       10.2.10.91
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusparse-cu11     11.7.4.91
nvidia-ml-py             11.525.112
nvidia-nccl-cu11         2.14.3
nvidia-nvtx-cu11         11.7.91
omegaconf                2.0.6
packaging                23.1
pip                      23.0.1
portalocker              2.7.0
protobuf                 4.23.3
psutil                   5.9.5
pycparser                2.21
PyYAML                   6.0
regex                    2023.6.3
requests                 2.31.0
sacrebleu                2.3.1
safetensors              0.3.1
scikit-learn             0.24.0
scipy                    1.11.0
sentencepiece            0.1.99
setuptools               65.5.0
six                      1.16.0
sympy                    1.12
tabulate                 0.9.0
tensorboardX             2.6.1
threadpoolctl            3.1.0
tokenizers               0.13.3
torch                    2.0.1
torchaudio               2.0.2
tqdm                     4.65.0
transformers             4.30.2
triton                   2.0.0
typing_extensions        4.6.3
urllib3                  2.0.3
vector-quantize-pytorch  1.6.24
wcwidth                  0.2.6
wheel                    0.40.0

I'm not quite sure when exactly this issue came up, but my first guess is that it was introduced by a dependency update a few versions back. Will keep this updated

lucidrains commented 1 year ago

@LWprogramming i think it has to do with scikit-learn

i may try to redo the kmeans logic in pytorch (or find a suitable library as substitute) in the future

scikit-learn is too hefty a dep for such simple logic

lucidrains commented 1 year ago

actually, since we aren't training the hubert kmeans, it should be straightforward to remove scikit-learn. just need to extract the cluster centers from wherever it is stored
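(Editor's note: since only inference-time assignment is needed, the scikit-learn dependency reduces to the cluster centers themselves, which a fitted `sklearn.cluster.KMeans` exposes as `cluster_centers_`. A sketch of the assignment step in pure PyTorch; the function name and shapes are illustrative, not audiolm-pytorch's actual API:)

```python
import torch

def kmeans_assign(features: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """Assign each feature vector to its nearest cluster center.

    features: (n, d) batch of embeddings
    centers:  (k, d) fixed cluster centers, e.g. extracted from a fitted
              scikit-learn KMeans via its `cluster_centers_` attribute
    returns:  (n,) long tensor of cluster indices
    """
    # torch.cdist gives pairwise Euclidean distances, shape (n, k);
    # the nearest center along the last dim is the cluster id
    dists = torch.cdist(features, centers)
    return dists.argmin(dim=-1)
```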

lucidrains commented 1 year ago

@LWprogramming error is gone! :laughing:

LWprogramming commented 1 year ago

Hm that's odd, I'm still seeing the warning messages. Are you getting just the loss reported like this?

0: loss: 6.498922824859619
0: valid loss 0.8049952983856201
0: saving model to folder
1: loss: ... etc

lucidrains commented 1 year ago

hmm, i'm no longer seeing Computing label assignment and total inertia

are we referring to the same warning message?

lucidrains commented 1 year ago

@LWprogramming can you do a pip list | grep audiolm-pytorch and make sure it is at 1.2.11? i just double checked by reverting, and the warning message reappeared
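(Editor's note: the installed version can also be checked from Python, equivalent to the `pip list | grep audiolm-pytorch` suggested above; a hypothetical helper, not part of the library:)

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str):
    """Return the installed version string for pkg, or None if absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# e.g. installed_version("audiolm-pytorch") should report "1.2.11" or later
# for an environment that includes the fix discussed in this thread
```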

lucidrains commented 1 year ago

@LWprogramming oh i see, the error messages are unrelated

well, you can train without worrying, as we aren't using scikit-learn at all other than for extracting the cluster centers