mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Error running the RNN speech workload: data preprocessing fails after entering the Docker container #691

Open gaowayne opened 7 months ago

gaowayne commented 7 months ago

After I build the container and enter it, preprocessing the data fails with an attribute error:

root@ed1902ed9916:/workspace/rnnt# bash scripts/preprocess_librispeech.sh
Traceback (most recent call last):
  File "./utils/convert_librispeech.py", line 25, in <module>
    from preprocessing_utils import parallel_preprocess
  File "/workspace/rnnt/utils/preprocessing_utils.py", line 18, in <module>
    import librosa
  File "/opt/conda/lib/python3.8/site-packages/librosa/__init__.py", line 211, in <module>
    from . import core
  File "/opt/conda/lib/python3.8/site-packages/librosa/core/__init__.py", line 9, in <module>
    from .constantq import *  # pylint: disable=wildcard-import
  File "/opt/conda/lib/python3.8/site-packages/librosa/core/constantq.py", line 1058, in <module>
    dtype=np.complex,
  File "/opt/conda/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'complex'.
`np.complex` was a deprecated alias for the builtin `complex`. To avoid this error in existing code, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
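For context: the traceback comes from librosa 0.8.0 passing `dtype=np.complex`, an alias that NumPy deprecated in 1.20 and removed in 1.24. A minimal sketch of the drop-in replacements, assuming a NumPy new enough to have dropped the alias:

```python
import numpy as np

# librosa 0.8.0 writes dtype=np.complex, which newer NumPy rejects.
# The builtin `complex` (or np.complex128 for the explicit scalar type)
# produces exactly the same array dtype:
a = np.zeros(4, dtype=complex)
b = np.zeros(4, dtype=np.complex128)
assert a.dtype == b.dtype == np.dtype("complex128")
```

So the fix is purely mechanical; the behavior of the preprocessed data does not change.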
gaowayne commented 6 months ago

I installed a fresh 22.04 host OS and tried again; now I get an out-of-space error while building the container image.

 => ERROR [ 1/10] FROM docker.io/pytorch/pytorch:1.7.0-cuda11.0-cudnn8-devel@sha256:837e6964e5db6e5b35f4d5e98e9cac073ab757766039b9503f39c14beafb0e98                                                                        182.4s
 => => resolve docker.io/pytorch/pytorch:1.7.0-cuda11.0-cudnn8-devel@sha256:837e6964e5db6e5b35f4d5e98e9cac073ab757766039b9503f39c14beafb0e98                                                                                  0.0s
 => => sha256:f20d42e5d606f02b790edccc1e6741e0f287ee705a94998fd50c160e96301823 10.70kB / 10.70kB                                                                                                                              0.0s
 => => sha256:837e6964e5db6e5b35f4d5e98e9cac073ab757766039b9503f39c14beafb0e98 2.85kB / 2.85kB                                                                                                                                0.0s
 => => sha256:171857c49d0f5e2ebf623e6cb36a8bcad585ed0c2aa99c87a055df034c1e5848 26.70MB / 26.70MB                                                                                                                              0.8s
 => => sha256:61e52f862619ab016d3bcfbd78e5c7aaaa1989b4c295e6dbcacddd2d7b93e1f5 162B / 162B                                                                                                                                    0.5s
 => => sha256:419640447d267f068d2f84a093cb13a56ce77e130877f5b8bdb4294f4a90a84f 852B / 852B                                                                                                                                    0.4s
 => => sha256:2a93278deddf8fe289dceef311ed19e8f2083a88eba6be60d393842fd40697b0 7.21MB / 7.21MB                                                                                                                                0.8s
 => => sha256:c9f080049843544961377a152d7d86c34816221038b8da3e3dc207ccddb72549 10.33MB / 10.33MB                                                                                                                              0.9s
 => => extracting sha256:171857c49d0f5e2ebf623e6cb36a8bcad585ed0c2aa99c87a055df034c1e5848                                                                                                                                     1.6s
 => => sha256:8189556b23294579329c522acf5618c024520b323d6a68cdd9eca91ca4f2f454 1.00kB / 1.00kB                                                                                                                                1.0s
 => => sha256:c306a0c97a557ede3948263983918da203f1837a354a86fcb5d6270b0c52b9ad 1.13GB / 1.13GB                                                                                                                               31.6s
 => => sha256:4a9478bd0b2473c3d7361f9a0a8e98923897103b9b2eb55097db2b643f50c13e 970.89MB / 970.89MB                                                                                                                           28.2s
 => => sha256:19a76c31766d36601b8c4a57ea5548a4e22f69846ac653c5ca2bea5eb92b759d 1.06GB / 1.06GB                                                                                                                               30.3s
 => => extracting sha256:419640447d267f068d2f84a093cb13a56ce77e130877f5b8bdb4294f4a90a84f                                                                                                                                     0.0s
 => => extracting sha256:61e52f862619ab016d3bcfbd78e5c7aaaa1989b4c295e6dbcacddd2d7b93e1f5                                                                                                                                     0.0s
 => => extracting sha256:2a93278deddf8fe289dceef311ed19e8f2083a88eba6be60d393842fd40697b0                                                                                                                                     0.4s
 => => extracting sha256:c9f080049843544961377a152d7d86c34816221038b8da3e3dc207ccddb72549                                                                                                                                     0.4s
 => => extracting sha256:8189556b23294579329c522acf5618c024520b323d6a68cdd9eca91ca4f2f454                                                                                                                                     0.0s
 => => sha256:1d18e0f6b7f66fdbaba1169a3439577dc12fd53b21d7507351a9098e68eb6207 1.01MB / 1.01MB                                                                                                                               31.6s
 => => sha256:d8015a90b67c809145c04360809eba130365a701a96319f8fc2c3c786434c33a 2.32GB / 2.32GB                                                                                                                               71.7s
 => => sha256:211a7eed3486a96a5e8ba778a64f46475a7131e3b66ccc4ee3af57e334fb534f 138B / 138B                                                                                                                                   31.9s
 => => extracting sha256:c306a0c97a557ede3948263983918da203f1837a354a86fcb5d6270b0c52b9ad                                                                                                                                    23.8s
 => => extracting sha256:4a9478bd0b2473c3d7361f9a0a8e98923897103b9b2eb55097db2b643f50c13e                                                                                                                                    25.8s
 => => extracting sha256:19a76c31766d36601b8c4a57ea5548a4e22f69846ac653c5ca2bea5eb92b759d                                                                                                                                    48.4s
 => => extracting sha256:1d18e0f6b7f66fdbaba1169a3439577dc12fd53b21d7507351a9098e68eb6207                                                                                                                                     0.1s
 => => extracting sha256:d8015a90b67c809145c04360809eba130365a701a96319f8fc2c3c786434c33a                                                                                                                                    52.9s
------
 > [ 1/10] FROM docker.io/pytorch/pytorch:1.7.0-cuda11.0-cudnn8-devel@sha256:837e6964e5db6e5b35f4d5e98e9cac073ab757766039b9503f39c14beafb0e98:
------
Dockerfile:16
--------------------
  14 |     
  15 |     ARG FROM_IMAGE_NAME=pytorch/pytorch:1.7.0-cuda11.0-cudnn8-devel
  16 | >>> FROM ${FROM_IMAGE_NAME}
  17 |     
  18 |     ENV PYTORCH_VERSION=1.7.0a0+7036e91
--------------------
ERROR: failed to solve: failed to register layer: write /opt/conda/lib/libnvvm.so.3.3.0: no space left on device
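The failure happens while Docker unpacks the multi-gigabyte PyTorch base image, so the partition backing Docker's data root (typically `/var/lib/docker` on Linux; check `docker info` if unsure) needs tens of gigabytes free. A quick sketch for checking available space before building; the path is an assumption, adjust it to your data root:

```python
import shutil

# Default Docker data root on Linux; `docker info` reports the actual one.
data_root = "/var/lib/docker"
try:
    free_gib = shutil.disk_usage(data_root).free / 2**30
except OSError:
    # Fall back to the root filesystem if the path is absent or unreadable.
    free_gib = shutil.disk_usage("/").free / 2**30
print(f"free space: {free_gib:.1f} GiB")
```

If the partition is tight, `docker system prune -a` reclaims space from unused images and layers before retrying the build.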
coppock commented 2 months ago

@gaowayne, my guess is that your second issue, the out-of-space error, is specific to your system: it is probably low on disk space. However, I'm seeing issues similar to your first one. Specifically, if I build the Docker container without editing the Dockerfile or requirements.txt, I get both an ImportError, where Numba needs a NumPy more recent than 1.19, and a SciPy version warning. They follow.

/opt/conda/lib/python3.8/site-packages/scipy/__init__.py:143: UserWarning: A NumPy version >=1.19.5 and <1.27.0 is required for this version of SciPy (detected version 1.19.2)
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Traceback (most recent call last):
  File "./utils/convert_librispeech.py", line 25, in <module>
    from preprocessing_utils import parallel_preprocess
  File "/workspace/rnnt/utils/preprocessing_utils.py", line 18, in <module>
    import librosa
  File "/opt/conda/lib/python3.8/site-packages/librosa/__init__.py", line 211, in <module>
    from . import core
  File "/opt/conda/lib/python3.8/site-packages/librosa/core/__init__.py", line 5, in <module>
    from .convert import *  # pylint: disable=wildcard-import
  File "/opt/conda/lib/python3.8/site-packages/librosa/core/convert.py", line 7, in <module>
    from . import notation
  File "/opt/conda/lib/python3.8/site-packages/librosa/core/notation.py", line 8, in <module>
    from ..util.exceptions import ParameterError
  File "/opt/conda/lib/python3.8/site-packages/librosa/util/__init__.py", line 83, in <module>
    from .utils import *  # pylint: disable=wildcard-import
  File "/opt/conda/lib/python3.8/site-packages/librosa/util/utils.py", line 10, in <module>
    import numba
  File "/opt/conda/lib/python3.8/site-packages/numba/__init__.py", line 55, in <module>
    _ensure_critical_deps()
  File "/opt/conda/lib/python3.8/site-packages/numba/__init__.py", line 40, in _ensure_critical_deps
    raise ImportError(msg)
ImportError: Numba needs NumPy 1.22 or greater. Got NumPy 1.19.
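The pinned NumPy 1.19.2 in the image satisfies neither Numba (which needs >= 1.22) nor SciPy's bound (>= 1.19.5, < 1.27.0). A small sketch of the version window both packages would accept, with the bounds taken directly from the two messages above:

```python
# Bounds quoted from the Numba ImportError and the SciPy UserWarning.
def numpy_ok(version: str) -> bool:
    major, minor = (int(p) for p in version.split(".")[:2])
    return (1, 22) <= (major, minor) < (1, 27)

assert not numpy_ok("1.19.2")  # the version shipped in the image
assert numpy_ok("1.22.4")      # inside the window
assert not numpy_ok("1.27.0")  # past SciPy's upper bound
```

Any NumPy in that window should silence both complaints, which is what the requirements patch below targets.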
coppock commented 2 months ago

If I apply the patch below to induce pip to install a NumPy version compatible with both the installed SciPy and the installed Numba, I get the error you were seeing. Patch and error follow.

diff --git a/rnn_speech_recognition/pytorch/requirements.txt b/rnn_speech_recognition/pytorch/requirements.txt
index 7318388..55f9e88 100755
--- a/rnn_speech_recognition/pytorch/requirements.txt
+++ b/rnn_speech_recognition/pytorch/requirements.txt
@@ -8,3 +8,4 @@ librosa==0.8.0
 sox==1.4.1
 sentencepiece==0.1.94
 pandas==1.1.5
+numpy>=1.22,<1.27.0
Traceback (most recent call last):
  File "./utils/convert_librispeech.py", line 25, in <module>
    from preprocessing_utils import parallel_preprocess
  File "/workspace/rnnt/utils/preprocessing_utils.py", line 18, in <module>
    import librosa
  File "/opt/conda/lib/python3.8/site-packages/librosa/__init__.py", line 211, in <module>
    from . import core
  File "/opt/conda/lib/python3.8/site-packages/librosa/core/__init__.py", line 9, in <module>
    from .constantq import *  # pylint: disable=wildcard-import
  File "/opt/conda/lib/python3.8/site-packages/librosa/core/constantq.py", line 1058, in <module>
    dtype=np.complex,
  File "/opt/conda/lib/python3.8/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'complex'.
`np.complex` was a deprecated alias for the builtin `complex`. To avoid this error in existing code, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
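Until the librosa pin is bumped past 0.8.0 (newer releases dropped the removed alias), one workaround is to rewrite `np.complex` to `np.complex128` in the installed `constantq.py`. A sketch on a temporary stand-in file; the real target would be the site-packages path shown in the traceback:

```python
import pathlib
import re
import tempfile

# Stand-in for /opt/conda/lib/python3.8/site-packages/librosa/core/constantq.py
path = pathlib.Path(tempfile.mkdtemp()) / "constantq.py"
path.write_text("dtype=np.complex,\n")

# \b keeps any existing np.complex128 / np.complex64 occurrences untouched.
patched = re.sub(r"np\.complex\b", "np.complex128", path.read_text())
path.write_text(patched)
print(path.read_text())  # dtype=np.complex128,
```

Patching site-packages inside the image is fragile, though; upgrading the librosa pin is the cleaner fix if the benchmark rules allow it.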
coppock commented 2 months ago

I'm seeing these errors both at the top of the tree and at tag v4.0.