Hey @ScottishFold007, in your data collator, `is_longer_features` is a list; it should be a torch tensor instead. Note that you've commented out the following line:

```python
#is_longer_batch = self.processor.tokenizer.pad(is_longer_features, return_tensors="pt")
```

This should probably not be commented out! I hope it helps!
When this line of code is not commented out, this is the error that appears:

```python
#is_longer_batch = self.processor.tokenizer.pad(is_longer_features, return_tensors="pt")
```

```
/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     52         else:
     53             data = self.dataset[possibly_batched_index]
---> 54         return self.collate_fn(data)

/tmp/ipykernel_1338324/128232537.py in __call__(self, features)
     15
     16         is_longer_features = [feature["is_longer"] for feature in features]
---> 17         is_longer_batch = self.processor.tokenizer.pad(is_longer_features, return_tensors="pt")
     18
     19         # get the tokenized label sequences

/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py in pad(self, encoded_inputs, padding, max_length, pad_to_multiple_of, return_attention_mask, return_tensors, verbose)
   3288             raise ValueError(
   3289                 "You should supply an encoding or a list of encodings to this method "
-> 3290                 f"that includes {self.model_input_names[0]}, but you provided {list(encoded_inputs.keys())}"
   3291             )
   3292

AttributeError: 'list' object has no attribute 'keys'
```
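(For context: `tokenizer.pad` expects an encoding, or a list of encodings, keyed by the tokenizer's model input names, so a bare list of booleans can't work. A hypothetical example of the shape it accepts:)

```python
# tokenizer.pad wants dicts keyed by model input names (e.g. "input_ids"),
# not a raw list of booleans -- the token ids below are made up for illustration
batch = processor.tokenizer.pad(
    [{"input_ids": [0, 42, 2]}, {"input_ids": [0, 7, 9, 2]}],
    return_tensors="pt",
)
```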
According to the CLAP docs, `is_longer` should be a tensor of shape `(batch_size, 1)`. What you can do is probably something like this:

```python
is_longer_features = [feature["is_longer"] for feature in features]
is_longer_features = torch.tensor(is_longer_features)[..., None]
```

which basically creates a tensor and adds an extra dimension!
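Putting it together, here is a minimal sketch of what the collator's `__call__` could look like. The class name and the feature keys (`"input_features"`, `"is_longer"`, `"input_ids"`) are assumptions pieced together from the snippet in your traceback, not your exact code:

```python
import torch

class ClapDataCollator:
    """Sketch of a collator for CLAP fine-tuning; the feature keys are
    assumptions based on the snippet in the traceback above."""

    def __init__(self, processor):
        self.processor = processor

    def __call__(self, features):
        # batch the precomputed log-mel features on the audio side
        audio_features = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(audio_features, return_tensors="pt")

        # is_longer arrives as a plain Python list of bools; the audio encoder
        # calls .to(device) on it, so it must be a tensor of shape (batch_size, 1)
        is_longer = [f["is_longer"] for f in features]
        batch["is_longer"] = torch.tensor(is_longer).reshape(-1, 1)

        # pad the tokenized text side as usual
        text_features = [{"input_ids": f["input_ids"]} for f in features]
        text_batch = self.processor.tokenizer.pad(text_features, return_tensors="pt")
        batch["input_ids"] = text_batch["input_ids"]
        batch["attention_mask"] = text_batch["attention_mask"]

        return batch
```

The `reshape(-1, 1)` keeps the shape at `(batch_size, 1)` whether each example stores `is_longer` as a bare bool or as a one-element list.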
When I made the changes suggested here, the following error was reported:
```
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clap/modeling_clap.py", line 2102, in forward
    text_outputs = self.text_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clap/modeling_clap.py", line 1892, in forward
    encoder_outputs = self.encoder(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clap/modeling_clap.py", line 1602, in forward
    layer_outputs = layer_module(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clap/modeling_clap.py", line 1491, in forward
    self_attention_outputs = self.attention(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clap/modeling_clap.py", line 1418, in forward
    self_outputs = self.self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clap/modeling_clap.py", line 1356, in forward
    context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
Hey @ScottishFold007, could you use `CUDA_LAUNCH_BLOCKING=1` in order for us to have more details on why your training didn't work?
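For anyone hitting the same thing, there are two common ways to set it (the `train.py` filename below is just a placeholder):

```python
# Option 1: set it from the shell when launching the script, e.g.
#   CUDA_LAUNCH_BLOCKING=1 python train.py
#
# Option 2: set it at the very top of the script, before anything touches
# the GPU, so the CUDA error surfaces at the actual failing call
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # import torch only after the variable is set
```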
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hello, I've tried a lot of data and solutions, but I still haven't managed to fine-tune the CLAP model successfully. It would be great if the official team could provide a simple fine-tuning example. I hope CLAP can be fine-tuned successfully, just like the CLIP model.
Hey @ScottishFold007, thanks for your response, could you provide the full trace when using `CUDA_LAUNCH_BLOCKING=1`?
Also, do you think you could copy the whole script into a gist and share it there? It would be of tremendous help! I'll take a look as soon as it's done! Thanks
System Info

`transformers` version: 4.39.3

Who can help?

@sanchit-gandhi @ylacombe @younesbelkada

Information

Tasks

An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
I'm trying to fine-tune CLAP, but I'm having some problems with it. I've previously referenced the solution in https://github.com/huggingface/transformers/issues/26864
Here is my code:
load data
load model
process data:
Then the following error occurs:
```
AttributeError: Caught AttributeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clap/modeling_clap.py", line 2094, in forward
    audio_outputs = self.audio_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clap/modeling_clap.py", line 1742, in forward
    return self.audio_encoder(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/clap/modeling_clap.py", line 913, in forward
    is_longer_list = is_longer.to(input_features.device)
AttributeError: 'list' object has no attribute 'to'
```
I'm having a lot of problems with the `enable_fusion=True` mode, and I don't seem to have a good grasp of how the `is_longer` input should be handled, so I'd appreciate your pointers on this, thanks!
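For reference, here is a minimal end-to-end sketch of the `enable_fusion=True` path, where the processor builds `is_longer` itself (the `laion/clap-htsat-fused` checkpoint and the dummy waveform are assumptions for illustration, not my actual data):

```python
import numpy as np
import torch
from transformers import ClapModel, ClapProcessor

model_id = "laion/clap-htsat-fused"  # fused checkpoint, i.e. enable_fusion=True
processor = ClapProcessor.from_pretrained(model_id)
model = ClapModel.from_pretrained(model_id)

# dummy 10-second waveform at CLAP's 48 kHz sampling rate
audio = np.random.randn(10 * 48_000).astype(np.float32)

inputs = processor(
    text=["a dog barking"],
    audios=[audio],
    sampling_rate=48_000,
    return_tensors="pt",
    padding=True,
)
# the feature extractor already returns is_longer as a tensor here
print(inputs["is_longer"].shape)  # expected: torch.Size([1, 1])

with torch.no_grad():
    outputs = model(**inputs, return_loss=True)  # contrastive loss
print(outputs.loss)
```

When the processor is used end-to-end like this, `is_longer` already comes out with shape `(batch_size, 1)`, so a custom collator only needs to preserve it rather than rebuild it.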
Expected behavior
The model should train normally.