minzwon / sota-music-tagging-models

MIT License

Bug running fcn with mtat #14

Closed Sette closed 2 years ago

Sette commented 2 years ago

result = self.forward(*input, **kwargs)
RuntimeError: builtins: link error: Invalid value
The above operation failed in interpreter, with the following stack trace:

Any idea what the problem is?

minzwon commented 2 years ago

Can you share the entire code that you ran and the entire error message, please? From what you have shared so far, I can't tell which part returned the error.

Sette commented 2 years ago

Namespace(batch_size=16, data_path='/home/bruno/data', dataset='mtat', log_step=20, lr=0.0001, model_load_path='.', model_save_path='./../models', model_type='fcn', n_epochs=200, num_workers=0, use_tensorboard=1)

Traceback (most recent call last):
  File "main.py", line 59, in <module>
    main(config)
  File "main.py", line 37, in main
    solver.train()
  File "/home/bruno/git/sota-music-tagging-models/training/solver.py", line 169, in train
    out = self.model(x)
  File "/home/bruno/anaconda3/envs/sota/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bruno/git/sota-music-tagging-models/training/model.py", line 51, in forward
    x = self.to_db(x)
  File "/home/bruno/anaconda3/envs/sota/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
RuntimeError: builtins: link error: Invalid value
The above operation failed in interpreter, with the following stack trace:

minzwon commented 2 years ago

Can you add these three lines to solver.py, right before out = self.model(x) at line 169?

print(x.shape)
print(type(x))
print(type(x[0][0][0]))

Then please share what they return. It looks like an input error. Also, please double-check that your library versions match requirements.txt.
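
For reference, here is a minimal sketch of where those prints would sit in solver.py's training loop; the surrounding loop structure is assumed for illustration and is not the repository's exact code:

# Hypothetical excerpt of the training loop in solver.py (loop structure assumed).
for x, y in self.data_loader:
    x = x.to(self.device)

    # Debug prints suggested above, just before the forward pass (around line 169).
    print(x.shape)           # e.g. torch.Size([16, 464000]) for raw waveform input
    print(type(x))           # should be <class 'torch.Tensor'>
    print(type(x[0][0][0]))  # only valid if x has at least 3 dimensions

    out = self.model(x)      # line 169: the call that raises the RuntimeError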

Sette commented 2 years ago

Output:

torch.Size([16, 464000])
<class 'torch.Tensor'>
Traceback (most recent call last):
  File "main.py", line 59, in <module>
    main(config)
  File "main.py", line 37, in main
    solver.train()
  File "/home/bruno/git/sota-music-tagging-models/training/solver.py", line 171, in train
    print(type(x[0][0][0]))
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

About requirements: I have a problem with pytorch 1.2:

ERROR: Could not find a version that satisfies the requirement torch==1.2.0 (from -r requirements.txt (line 20)) (from versions: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1)
ERROR: No matching distribution found for torch==1.2.0 (from -r requirements.txt (line 20))

minzwon commented 2 years ago

Okay, then remove those three lines from solver.py and paste them into model.py at line 51, right before x = self.to_db(x). What do they return?

Yeah, maybe 1.2.0 is too old. What is the version of your torchaudio?
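
For reference, a sketch of where those prints would land in model.py; only x = self.to_db(x) at line 51 is known from the traceback, and the attribute names around it are guesses:

# Hypothetical excerpt of the FCN forward pass in model.py.
def forward(self, x):
    x = self.spec(x)         # assumed mel-spectrogram front end

    # Debug prints, placed just before the failing line.
    print(x.shape)           # e.g. torch.Size([16, 96, 1813])
    print(type(x))           # <class 'torch.Tensor'>
    print(type(x[0][0][0]))  # a single element is still a (0-dim) torch.Tensor

    x = self.to_db(x)        # line 51: where the RuntimeError is raised
    # ... convolutional layers follow
    return x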

Sette commented 2 years ago

Output:

torch.Size([16, 96, 1813])
<class 'torch.Tensor'>
<class 'torch.Tensor'>

The torchaudio version is 0.3.0.

minzwon commented 2 years ago

Okay, the input shape looks fine. There are two more possible causes that I suspect.

  1. Please check whether the input includes Inf or NaN values. Remove the previous three lines and paste the following instead (a torch-based variant is sketched after this list):

     print(np.isnan(x).any())
     print(np.isinf(x).any())

  2. Does this happen no matter whether you use the CPU or the GPU? Sometimes an invalid value error is returned because of the CUDA configuration. Please check whether it also happens when you run on the CPU.
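
Since x is a torch.Tensor at that point, the same check can also be written with torch's own functions, which work whether the tensor is on the CPU or the GPU; a minimal equivalent sketch:

# Equivalent NaN/Inf check using torch directly (np.isnan/np.isinf need a CPU tensor).
print(torch.isnan(x).any())  # tensor(True) if any value is NaN
print(torch.isinf(x).any())  # tensor(True) if any value is +/-Inf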

Sette commented 2 years ago

How can I run it with the CPU?

minzwon commented 2 years ago

You can control it in solver.py.

x = x.cpu() will send your input to the CPU, and self.model.cpu() will send your model to the CPU. Try them around line 165.
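
A minimal sketch of that change (note that .cpu() on a tensor returns a copy, so the result has to be reassigned; the surrounding lines are assumed):

# Hypothetical CPU-only forward pass in solver.py, around lines 165-169.
self.model = self.model.cpu()  # move the model parameters to the CPU
x = x.cpu()                    # move the input batch to the CPU (returns a new tensor)
out = self.model(x)            # runs entirely on the CPU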

Sette commented 2 years ago

I ran it with the CPU and it worked. I believe it is some configuration issue with CUDA and cuDNN.

minzwon commented 2 years ago

Yes, then you need to check your CUDA configuration.
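
One quick way to see which CUDA and cuDNN versions PyTorch actually uses (standard torch attributes, run from the same environment):

import torch

print(torch.__version__)               # installed PyTorch version
print(torch.version.cuda)              # CUDA version the wheel was built against
print(torch.backends.cudnn.version())  # cuDNN version, e.g. 7602
print(torch.cuda.is_available())       # True if a usable GPU is detected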