Closed LiyangTseng closed 2 years ago
That's very odd. Have you tried adding print statements after that one to check where exactly it fails? (I can tell that it fails at some point before making the prior transformer, but I'm not sure what would cause that...)
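When a script dies silently, buffered output can get lost, so it helps to flush each progress message immediately. A minimal sketch of such a helper (the name `checkpoint` and the labels are hypothetical, not part of the jukemir code):

```python
import sys
import time


def checkpoint(label, stream=sys.stderr):
    """Print a timestamped progress marker and flush it right away."""
    line = f"[{time.strftime('%H:%M:%S')}] reached: {label}"
    # flush=True makes the message appear even if the process is killed
    # immediately afterwards.
    print(line, file=stream, flush=True)
    return line


# Hypothetical labels; place calls before and after each suspect step.
checkpoint("before building the prior")
checkpoint("after building the prior")
```

The last label that shows up in the terminal marks the step that was reached before the failure.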
Hi @rodrigo-castellon, thank you for the response. I found that the other representation extraction methods (musicnn, choi, etc.) all work, which makes me wonder whether my hardware fails to meet the requirements for Jukebox inference, as stated in the Jukebox repo:
The hps are for a V100 GPU with 16 GB GPU memory. The 1b_lyrics, 5b, and 5b_lyrics top-level priors take up 3.8 GB, 10.3 GB, and 11.5 GB, respectively
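A rough back-of-the-envelope check of those numbers, as a sketch: the prior sizes are the ones quoted above, and the 7405 MB figure is what TensorFlow reports for the RTX 2070 in the musicnn log in this thread. The helper name is hypothetical, and the weights are only a lower bound, since activations need memory on top of them.

```python
# Sizes of the top-level priors in GB, as quoted from the Jukebox repo.
PRIOR_SIZES_GB = {"1b_lyrics": 3.8, "5b": 10.3, "5b_lyrics": 11.5}


def priors_that_fit(available_gb, sizes=PRIOR_SIZES_GB):
    """Return the priors whose weights alone fit in the given GPU memory."""
    return sorted(name for name, gb in sizes.items() if gb <= available_gb)


# 7405 MB is the usable memory TensorFlow reports for the RTX 2070.
available_gb = 7405 / 1024
print(f"{available_gb:.2f} GB available:", priors_that_fit(available_gb))
```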
Also, since I'm fairly new to Docker, I was wondering which file you were referring to when you suggested:
adding print statements after that one to check where exactly it fails
Does running Docker with the following command somehow implicitly run representations/jukebox/main.py, so that I could insert the print statements in this file to check what went wrong? Or does the Docker image act as a black box that cannot be modified?
docker run \
  -it \
  --rm \
  -v /home/cdonahue/.jukemir/processed/gtzan_ff/wav:/input \
  -v /home/cdonahue/.jukemir/representations/gtzan_ff/jukebox:/output \
  jukemir/representations_jukebox \
  --batch_size 256 \
  --batch_idx 1
Sorry, yes, it is not exactly straightforward to modify the code, but you might be able to create another image that builds off of this one and modifies (or lets you modify, by running in interactive mode) the representations/jukebox/main.py file. I'm currently busy, but I can get back to you later with the exact steps to achieve this.
OK, that would be very helpful! Is this related to the usage of representations/jukebox.dockerfile?
I actually have another issue when extracting features using musicnn: the following errors pop up. I did find similar solutions for these errors, but it seems that they also require modifications inside Docker. Could you also provide some further instructions for solving this issue?
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
0% 0/235 [00:00<?, ?it/s]2022-06-04 07:33:42.725555: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2022-06-04 07:33:42.755915: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-04 07:33:42.756358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2022-06-04 07:33:42.756613: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2022-06-04 07:33:42.757485: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2022-06-04 07:33:42.758211: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2022-06-04 07:33:42.758425: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2022-06-04 07:33:42.759361: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2022-06-04 07:33:42.760092: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2022-06-04 07:33:42.762102: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2022-06-04 07:33:42.762239: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-04 07:33:42.762717: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-04 07:33:42.763052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2022-06-04 07:33:42.763318: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-06-04 07:33:42.838388: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-04 07:33:42.838765: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x50e7f00 executing computations on platform CUDA. Devices:
2022-06-04 07:33:42.838781: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): NVIDIA GeForce RTX 2070, Compute Capability 7.5
2022-06-04 07:33:42.840274: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699850000 Hz
2022-06-04 07:33:42.840548: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5cd7a60 executing computations on platform Host. Devices:
2022-06-04 07:33:42.840561: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2022-06-04 07:33:42.840700: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-04 07:33:42.840989: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2022-06-04 07:33:42.841019: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2022-06-04 07:33:42.841031: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2022-06-04 07:33:42.841042: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2022-06-04 07:33:42.841069: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2022-06-04 07:33:42.841097: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2022-06-04 07:33:42.841108: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2022-06-04 07:33:42.841120: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2022-06-04 07:33:42.841193: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-04 07:33:42.841649: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-04 07:33:42.841955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2022-06-04 07:33:42.841998: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2022-06-04 07:33:42.842689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-06-04 07:33:42.842699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2022-06-04 07:33:42.842704: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2022-06-04 07:33:42.842772: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-04 07:33:42.843111: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-06-04 07:33:42.843377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7405 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2022-06-04 07:33:43.866519: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2022-06-04 07:33:43.980245: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2022-06-04 07:33:44.391400: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2022-06-04 07:33:44.391463: W ./tensorflow/stream_executor/stream.h:1995] attempting to perform DNN operation using StreamExecutor without DNN support
0% 0/235 [00:02<?, ?it/s]
Computing spectrogram (w/ librosa) and tags (w/ tensorflow).. Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: cuDNN launch failure : input shape ([1,1,187,96])
[[{{node model/batch_normalization/cond/FusedBatchNorm_1}}]]
(1) Internal: cuDNN launch failure : input shape ([1,1,187,96])
[[{{node model/batch_normalization/cond/FusedBatchNorm_1}}]]
[[model/Add_1/_137]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 42, in <module>
input_path, model="MSD_musicnn_big", extract_features=True
File "/code/musicnn-516acb2a0ff5ef73f64547898e018e793152c506/musicnn/extractor.py", line 172, in extractor
is_training: False})
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: cuDNN launch failure : input shape ([1,1,187,96])
[[node model/batch_normalization/cond/FusedBatchNorm_1 (defined at /tmp/tmp7b53zes7.py:14) ]]
(1) Internal: cuDNN launch failure : input shape ([1,1,187,96])
[[node model/batch_normalization/cond/FusedBatchNorm_1 (defined at /tmp/tmp7b53zes7.py:14) ]]
[[model/Add_1/_137]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'model/batch_normalization/cond/FusedBatchNorm_1':
File "main.py", line 42, in <module>
input_path, model="MSD_musicnn_big", extract_features=True
File "/code/musicnn-516acb2a0ff5ef73f64547898e018e793152c506/musicnn/extractor.py", line 141, in extractor
y, timbral, temporal, cnn1, cnn2, cnn3, mean_pool, max_pool, penultimate = models.define_model(x, is_training, model, num_classes)
File "/code/musicnn-516acb2a0ff5ef73f64547898e018e793152c506/musicnn/models.py", line 20, in define_model
return build_musicnn(x, is_training, num_classes, num_filt_midend=512, num_units_backend=500)
File "/code/musicnn-516acb2a0ff5ef73f64547898e018e793152c506/musicnn/models.py", line 32, in build_musicnn
frontend_features_list = frontend(x, is_training, config.N_MELS, num_filt=1.6, type='7774timbraltemporal')
File "/code/musicnn-516acb2a0ff5ef73f64547898e018e793152c506/musicnn/models.py", line 58, in frontend
normalized_input = tf.compat.v1.layers.batch_normalization(expand_input, training=is_training)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/layers/normalization.py", line 327, in batch_normalization
return layer.apply(inputs, training=training)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
return self.__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/layers/base.py", line 537, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
), args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 450, in converted_call
result = converted_f(*effective_args, **kwargs)
File "/tmp/tmp7b53zes7.py", line 14, in tf__call
retval_ = ag__.converted_call('call', super(BatchNormalization, self), ag__.ConversionOptions(recursive=True, force_conversion=False, optional_features=(), internal_convert_user_code=True), (inputs,), {'training': training})
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 356, in converted_call
return _call_unconverted(f, args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py", line 253, in _call_unconverted
return f(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/normalization.py", line 651, in call
outputs = self._fused_batch_norm(inputs, training=training)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/normalization.py", line 494, in _fused_batch_norm
training, _fused_batch_norm_training, _fused_batch_norm_inference)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/tf_utils.py", line 58, in smart_cond
pred, true_fn=true_fn, false_fn=false_fn, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/smart_cond.py", line 59, in smart_cond
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1988, in cond
orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1814, in BuildCondBranch
original_result = fn()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/normalization.py", line 491, in _fused_batch_norm_inference
data_format=self._data_format)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_impl.py", line 1329, in fused_batch_norm
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 3946, in _fused_batch_norm
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
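A commonly reported workaround for `Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR` is to stop TensorFlow from reserving nearly all GPU memory up front. Whether this is the cause here is an assumption; nothing in the log confirms it. A minimal sketch using the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, which has to be set before TensorFlow is imported (the helper name is hypothetical, and whether the TensorFlow build inside the image honors the variable would need checking):

```python
import os


def enable_gpu_memory_growth(env=os.environ):
    """Ask TensorFlow to grow GPU memory on demand, unless already configured."""
    # setdefault keeps any value the user has already chosen.
    env.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    return env["TF_FORCE_GPU_ALLOW_GROWTH"]


# Call this at the very top of the script, before any `import tensorflow`.
print("TF_FORCE_GPU_ALLOW_GROWTH =", enable_gpu_memory_growth())
```

The same variable could instead be passed from the host with the `-e` flag of `docker run`, which avoids touching the code inside the image.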
Hi,
Apologies for the late reply; I hope you got this working. For completeness, and in case you are still unsure, here's what you can do to modify a particular Docker image. Say you want to modify representations_musicnn. What I would do is the following:
1. Look at the Dockerfile for that image. Its entry point is python main.py, which means that this is the script that eventually gets executed. The rest of the Dockerfile just sets up the environment to run that script.
2. Create a new Dockerfile with the following contents:
FROM jukemir/representations_musicnn:latest
ENTRYPOINT ["bash"]
and then run `docker build -t musicnn_modified .` (or `docker build -t musicnn_modified -f dockerfilename .` if you chose to name your Dockerfile something different).
3. You can then run the built image and use the shell inside, so you can poke around and see what the container sees for yourself, which helps for debugging issues like the ones you've mentioned above. This can be done with (as an example) `docker run --rm -it -v /home/unixusernamehere/.jukemir/processed/gtzan_ff/wav:/input -v /home/unixusernamehere/.jukemir/representations/gtzan_ff/musicnn:/output musicnn_modified`. Anything placed after the image name is passed to the entry point, which is now `bash`, so leave `--batch_size 256 --batch_idx 0` off the `docker run` line. You'll be given a bash prompt, and if you run `python main.py --batch_size 256 --batch_idx 0` there, you should be able to reproduce the error that you've been getting. Since you have a terminal shell, you can dig in and figure out what's causing the issue. This is also a particularly nice environment because, if you get confused about any changes you've made and want to start over, you can just re-run the Docker image. Once you figure it out, see the next step.
4. To fix or provide a workaround for the issue, you're probably going to need to run one or more commands (for example, `sed` to patch a file, upgrade a dependency, or something like that). Once you've got the chain of those commands nailed down so that `python main.py` works flawlessly, put them into the Dockerfile as so
FROM jukemir/representations_musicnn:latest
RUN first command
RUN second command
RUN third command
...
ENTRYPOINT ["python", "main.py"]
Don't forget to change the last ENTRYPOINT line back, as well.
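If the list of fix commands keeps changing while you experiment, the final Dockerfile from step 4 can also be generated rather than edited by hand. A small sketch, where the function name and the placeholder `RUN` command are hypothetical and only the image name and entry point come from the steps above:

```python
import json


def render_dockerfile(base_image, run_commands, entrypoint):
    """Build Dockerfile text: FROM, one RUN per command, exec-form ENTRYPOINT."""
    lines = [f"FROM {base_image}"]
    lines += [f"RUN {command}" for command in run_commands]
    # json.dumps yields the ["python", "main.py"] exec form Docker expects.
    lines.append("ENTRYPOINT " + json.dumps(entrypoint))
    return "\n".join(lines) + "\n"


text = render_dockerfile(
    "jukemir/representations_musicnn:latest",
    ["echo 'placeholder for a real fix command'"],  # hypothetical command
    ["python", "main.py"],
)
print(text)
```

Writing `text` to a file named Dockerfile and running `docker build` as in step 2 would then produce the patched image.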
At this point, you should be good to go.
Let me know if you have any more issues.
Thanks for the detailed reply! Also, just curious: if we use the pre-trained weights, is it possible to run Jukebox inference on CPU only, as a feature extraction method?
It is probably possible, but it would need a decent amount of engineering work, since the Jukebox codebase itself is written on the assumption that it's always running on GPU (you can see for yourself if you try running the Interacting with Jukebox notebook without a GPU).
Hi, I really appreciate this nice work. I stumbled upon a problem when trying to reproduce the experimental results. After executing 3_extract.sh following the instructions in the README, there is nothing inside the representation output folder, say ~/.jukemir/representations/gtzan_ff/jukebox, and the terminal only generated the following text without indicating any errors. Is it because the hardware does not meet your execution criteria (at least 30GB of RAM and a GPU with at least 12GB)? Thanks in advance for your reply!
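One way to tell a silent crash from a clean exit is the exit status of `docker run`, which mirrors the container's. By shell convention, a status above 128 means the process was ended by signal number (status - 128), and 137 corresponds to SIGKILL, the signal the Linux out-of-memory killer sends. A small sketch for decoding it (the function name is hypothetical, and an out-of-memory kill is only one possible cause here, not a confirmed one):

```python
import signal


def describe_exit_code(code):
    """Turn a shell-style exit status into a short human-readable description."""
    if code == 0:
        return "exited normally"
    if code > 128:
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {number}"
        return f"terminated by {name}"
    return f"failed with status {code}"


# The status is available in the shell as $? right after `docker run` returns.
for status in (0, 1, 137):
    print(status, "->", describe_exit_code(status))
```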