Open zhangyuqi-1 opened 1 month ago
```text
File "/data1/zhangyq/change-records-analysis/my_model.py", line 63, in forward
    all_loss_list = [-crf(lo, la, reduction='mean') for crf, lo, la in
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
After switching to a CPU environment, the error became `IndexError: index -100 is out of bounds for dimension 0 with size 3`, so the -100 labels are evidently the cause.

I changed the code so that every position holding -100 is masked, but the error persisted. I finally replaced -100 with 0 and the code ran, but 0 corresponds to a real label, and I am not sure how the masked positions are treated after this substitution, i.e. whether they are excluded from the computation. I used -100 in the first place because PyTorch ignores that value and skips gradient computation for it. I wonder whether this could be improved at the source-code level.
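For reference, here is a minimal torch-only sketch (not pytorch-crf's actual code) of why -100 fails even when masked: in a gather-then-mask pattern, the tag value is used as an index into the emission scores *before* the mask can zero the result out, and -100 is out of bounds for a dimension of `num_tags` entries.

```python
import torch

# Sketch of the gather-then-mask pattern a CRF score computation uses.
seq_length, batch_size, num_tags = 3, 2, 5
emissions = torch.randn(seq_length, batch_size, num_tags)
tags = torch.tensor([[0, 1], [2, 4], [3, -100]])

i = 2  # last timestep contains the -100 "ignore" label
try:
    # Gather the emission score of each gold tag at timestep i.
    # The indexing itself fails; a mask applied afterwards cannot help.
    emissions[i, torch.arange(batch_size), tags[i]]
except IndexError as e:
    print(e)  # e.g. "index -100 is out of bounds ..."
```

On CUDA the same out-of-bounds access surfaces as the opaque device-side assertion instead of a clean `IndexError`, which is why the CPU run was more informative.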
Hi, can you please post in English? I don't know Chinese. Also, please post minimal code to reproduce the error.
Sorry, sorry, this is my mistake.
Do you still need help on this issue? If so, please post in English. Otherwise, I'll close the issue.
Situation description: when the tags contain the label -100, an error occurs even if the -100 positions are masked out. The purpose of the -100 label is to make PyTorch ignore the corresponding token; see the following example code from the transformers library: Link to the code. When I changed the label from -100 to 0, it worked and the run proceeded smoothly.
My questions are as follows:

1. If I change the label from -100 to 0 and mask all the positions that originally held -100, will this have no impact on training in the end, given that the 0 label has a specific meaning of its own?
2. Is there any plan to improve this aspect so that the CRF module can handle the -100 label gracefully?

Thank you for your reply, very much appreciated.
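On question 1, a toy torch-only check suggests the answer is yes for the emission term: in a gather-then-mask score (a sketch of the pattern, not pytorch-crf's actual code), the label at a masked position cannot change the result, so the placeholder value chosen for it should not matter.

```python
import torch

torch.manual_seed(0)
seq_length, batch_size, num_tags = 3, 2, 5
emissions = torch.randn(seq_length, batch_size, num_tags)
mask = torch.tensor([[1, 1], [1, 1], [1, 0]], dtype=torch.float)

def emission_score(tags):
    # Sum of gold-tag emission scores, with masked steps zeroed out.
    idx = torch.arange(batch_size)
    per_step = torch.stack(
        [emissions[i, idx, tags[i]] for i in range(seq_length)]
    )
    return (per_step * mask).sum()

tags_a = torch.tensor([[0, 1], [2, 4], [3, 0]])  # masked slot holds 0
tags_b = torch.tensor([[0, 1], [2, 4], [3, 2]])  # masked slot holds 2
# Both give the same score: the masked position's label is irrelevant.
```

To the best of my understanding, pytorch-crf also multiplies the transition score into each step by the same mask, so the conclusion should carry over to the full CRF score; treat this as a sketch, not a guarantee.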
```python
import torch
from torchcrf import CRF

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

num_tags = 5
model = CRF(num_tags).to(device)
seq_length = 3
batch_size = 2
emissions = torch.randn(seq_length, batch_size, num_tags).to(device)

# -100 is intended as an "ignore" label, but it is an invalid tag index.
tags = torch.tensor([[0, 1], [2, 4], [3, -100]], dtype=torch.long).to(device)
model(emissions, tags)  # fails

# Masking the -100 position does not help; this call fails too.
mask = torch.tensor([[1, 1], [1, 1], [1, 0]], dtype=torch.uint8).to(device)
model(emissions, tags, mask=mask)
```
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Thank you for your clear question. Yes, this is expected behaviour with -100.
```text
self.all_crf_list: [CRF(num_tags=3), CRF(num_tags=13)]

all_logits_list:
[tensor([[[-3.9874e+36,  1.4790e+13, -3.9874e+36],
          [-3.9874e+36, -7.3833e+12, -3.9874e+36],
          [-3.9874e+36, -2.0520e+12, -3.9874e+36],
          ...,
          [-3.9874e+36, -2.2289e+12, -3.9874e+36],
          [-3.9874e+36,  3.4035e+12, -3.9874e+36],
          [-3.9874e+36, -4.3337e+12, -3.9874e+36]],

self.labels_split = [tensor([[-100, 1, 2, ..., -100, -100, -100],
        [-100, 1, 2, ..., -100, -100, -100],
        [-100, 1, 2, ..., -100, -100, -100],
        ...,
        [-100, 1, 2, ..., -100, -100, -100],
        [-100, 0, 0, ..., -100, -100, -100],
        [-100, 1, 2, ..., -100, -100, -100]], device='cuda:0'),
 tensor([[-100, 6, 12, ..., -100, -100, -100],
        [-100, 6, 12, ..., -100, -100, -100],
        [-100, 0, 0, ..., -100, -100, -100],
        ...,
        [-100, 0, 0, ..., -100, -100, -100],
        [-100, 0, 0, ..., -100, -100, -100],
        [-100, 6, 12, ..., -100, -100, -100]], device='cuda:0')]
```
With this data, `[-crf(lo, la, reduction='mean') for crf, lo, la in zip(self.all_crf_list, all_logits_list, labels_split)]` fails to run. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
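A possible user-side workaround for the snippet above, as a hedged sketch: `prepare` is a hypothetical helper (not part of pytorch-crf), and `0` is an arbitrary valid tag id; the idea is to replace -100 with a valid label and derive the mask from the original -100 positions before calling each CRF.

```python
import torch

def prepare(labels, pad_value=0):
    """Replace -100 'ignore' labels with a valid tag id and return a
    mask that excludes those positions from the CRF score."""
    mask = labels.ne(-100)
    return labels.masked_fill(~mask, pad_value), mask

labels = torch.tensor([[-100, 1, 2, -100],
                       [-100, 0, 0, -100]])
safe_labels, mask = prepare(labels)
# safe_labels: [[0, 1, 2, 0], [0, 0, 0, 0]]
# mask:        [[False, True, True, False], [False, True, True, False]]
```

One caveat, if I remember pytorch-crf's validation correctly: it requires the mask of the first timestep to be all on, so when -100 marks a leading special token (as in the tensors above, which all start with -100) you may need to trim or shift the sequences rather than rely on the mask alone.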