ninjakx closed this issue 3 years ago.
Hello, I noticed that you closed this issue. Does that mean you have solved it?
Actually, no. I'm currently stuck on the LayoutLM part. Can you help? I resolved it just yesterday, but the error has started appearing again. Same error: https://github.com/huggingface/transformers/issues/5611
Can you help with the config part (LayoutLM and bert-large)?
I also downloaded https://www.kaggle.com/jpmiller/layoutlm?select=layoutlm-large-uncased and tried model_path = "layoutlm-large-uncased".
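```python
import torch
from torch.nn import CrossEntropyLoss
from transformers import BertTokenizerFast
# Assumed source of the LayoutLM classes: the `layoutlm` package from microsoft/unilm
from layoutlm import LayoutlmConfig, LayoutlmForTokenClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_path = 'bert-large-uncased'
num_labels = len(labels)  # `labels`, `train`, `val` and `CordDataset` are defined elsewhere in the notebook
config_class, model_class, tokenizer_class = LayoutlmConfig, LayoutlmForTokenClassification, BertTokenizerFast
config = config_class.from_pretrained(model_path, num_labels=num_labels+1)
tokenizer = tokenizer_class.from_pretrained(model_path, do_lower_case=True)
model = model_class.from_pretrained(model_path, from_tf=bool(".ckpt" in model_path), config=config)
model = model.to(device)
max_seq_length = 150
pad_token_label_id = CrossEntropyLoss().ignore_index
train_dataset = CordDataset(train, tokenizer, labels, pad_token_label_id)
validation_dataset = CordDataset(val, tokenizer, labels, pad_token_label_id)
model_type = 'layoutlm'
```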
This is what I am loading. Sorry if this is a silly question; I just started with NLP.
Here is the code: https://www.kaggle.com/ninjakx01/notebook636ede94ea
I'm not able to run your code. Can you help? I am getting many errors; a few elements give me errors like:
- `CUDA error: device-side assert triggered`
- `RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)` (this one can be solved by upgrading PyTorch)
- `IndexError: index out of range in self`
- `KeyError: 24` in `pred_list[i].append(label......)` of the result function
A few data samples, such as the 79th element of the train loader, produce these errors.
Hello, for errors involving the GPU (CUDA), can you run the code on the CPU? The CPU gives more explicit error messages.
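A minimal sketch of that advice, assuming PyTorch: keep everything on the CPU while debugging. On the GPU an out-of-range index typically surfaces as an asynchronous `device-side assert triggered`, while on the CPU the same lookup raises an `IndexError` at the offending call. The `DEBUG_ON_CPU` flag, `try_lookup` helper, and toy table size are illustrative only. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` is a complementary option that stays on the GPU but reports CUDA errors at the call that caused them.

```python
import torch
import torch.nn as nn

# Hypothetical switch: True while debugging so errors are raised on the CPU
DEBUG_ON_CPU = True
device = torch.device("cpu" if DEBUG_ON_CPU or not torch.cuda.is_available() else "cuda")

# Toy stand-in for an embedding table with 1024 rows (valid indices 0..1023)
embedding = nn.Embedding(num_embeddings=1024, embedding_dim=8).to(device)

def try_lookup(indices):
    """Look up indices; return None on success, or the error message on failure."""
    x = torch.tensor(indices, dtype=torch.long, device=device)
    try:
        embedding(x)
        return None
    except IndexError as err:
        return str(err)

print(try_lookup([0, 500, 1023]))  # valid indices
print(try_lookup([0, -3, 500]))    # a negative index fails the lookup
```

On the CPU the second call reports the same `index out of range in self` message seen in this thread, which points directly at the input tensor rather than at a later CUDA kernel.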
Concerning the model you are using: I took the pretrained model from the official repository (https://github.com/microsoft/unilm/tree/master/layoutlm). You can find the links to the models (OneDrive and Google Drive) in the "Pre-trained Model" section of their README. Can you try using that one? It contains everything you need (the config JSON file, the tokenizer file, the pretrained weights, etc.). Direct Google Drive link: https://drive.google.com/drive/folders/1tatUuWVuNUxsP02smZCbB5NspyGo7g2g. It's my fault; I should have specified where to find the files needed to run the notebook.
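Before pointing `model_path` at a downloaded folder, it can help to confirm the folder is complete. A small sketch, assuming the download follows the usual BERT-style layout that `from_pretrained()` reads from a local directory (`config.json`, `vocab.txt`, `pytorch_model.bin`); the exact file names in the released archive are an assumption, and `missing_model_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumed file names: the usual BERT-style layout loaded by from_pretrained()
EXPECTED_FILES = ("config.json", "vocab.txt", "pytorch_model.bin")

def missing_model_files(model_dir):
    """Return the expected files absent from model_dir (an empty list means complete)."""
    folder = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]

# Example: check the extracted folder before using it as model_path
print(missing_model_files("layoutlm-large-uncased"))
```

If the list is empty, the folder path itself can be passed as `model_path` to both the config/model and tokenizer `from_pretrained()` calls.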
Let me try. I tried uploading the same layoutlm-large-uncased via Kaggle but still got the same error, this time `KeyError: 24` in `pred_list[i].append(label......)` of the result function.
I am still getting the same error: `IndexError: index out of range in self`.
Do you have the working directory of the code on Google Drive? If you could share a working demo, it would help. I don't know why I am getting these errors.
I reran the notebook and got the same errors. First, try using an older version of transformers (i.e. `!pip install transformers==2.9.0`). Second, modify the code as below:
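```python
import torch
from transformers import BertTokenizerFast
# Assumed source of the LayoutLM classes: the `layoutlm` package from microsoft/unilm
from layoutlm import LayoutlmConfig, LayoutlmForTokenClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_path = 'bert-large-uncased'
num_labels = len(labels)  # `labels` is defined elsewhere in the notebook
config_class, model_class, tokenizer_class = LayoutlmConfig, LayoutlmForTokenClassification, BertTokenizerFast
config = config_class.from_pretrained(model_path, num_labels=num_labels)  # remove the "+1"
tokenizer = tokenizer_class.from_pretrained(model_path, do_lower_case=True)
model = model_class.from_pretrained(model_path, from_tf=bool(".ckpt" in model_path), config=config)
model = model.to(device)
```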
Sorry, it's my fault: I didn't push the latest version of the notebook.
Please try these modifications. Thank you!
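One possible reading of the `KeyError: 24`, assuming `labels` holds 24 entries and the result function maps predicted ids back to names through a dict built with `enumerate(labels)`: a classifier configured with `num_labels + 1` outputs can predict id 24, which has no entry in that dict. The label names, `label_map`, `decode`, and `argmax` below are hypothetical stand-ins, not the notebook's actual code.

```python
# Hypothetical label set with 24 entries (ids 0..23), standing in for `labels`
labels = [f"LABEL_{i}" for i in range(24)]
label_map = {i: label for i, label in enumerate(labels)}

def decode(pred_ids):
    """Map predicted class ids to label names, as a result function would."""
    return [label_map[i] for i in pred_ids]

def argmax(row):
    """Index of the largest score in one row of logits."""
    return max(range(len(row)), key=row.__getitem__)

# With num_labels + 1 outputs, the extra 25th score can win the argmax...
logits_plus_one = [0.0] * 24 + [5.0]   # 25 scores, highest at id 24
try:
    decode([argmax(logits_plus_one)])
except KeyError as err:
    print("KeyError:", err)            # KeyError: 24

# ...whereas with num_labels outputs, every predicted id has a name
logits_exact = [0.0] * 23 + [5.0]      # 24 scores, highest at id 23
print(decode([argmax(logits_exact)]))  # ['LABEL_23']
```

Sizing the classifier to exactly `len(labels)` keeps every possible prediction inside the label map.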
Still getting the same error:
```
-> 1852 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

IndexError: index out of range in self
```
:'(
https://colab.research.google.com/drive/10am_HcutuiqMZ5xsAO100EWIvdCO5h2K?usp=sharing
Is that working for you?
Yes, I can see the error.
As you can see in the image above, some coordinates in the dataset are negative, so when the embedding lookup runs you get the error "index out of range in self". The thing is that the index of the sample with these negative coordinates is not the same in your dataset as in mine (for me it's index 669, and for you it's 770 in train). You should then modify two cells as shown in the image below:
Thanks, I didn't think the index could be different. That was exactly the problem. Thanks for your time :)
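Because the offending index differs between copies of the dataset, searching for invalid boxes may be safer than hard-coding an index. A minimal sketch, assuming each sample stores its boxes as `[x0, y0, x1, y1]` lists already scaled to LayoutLM's 0 to 1000 range; the `"bboxes"` key, `find_bad_samples`, and `clamp_boxes` are hypothetical names, and clamping is only one possible fix (dropping the sample is another).

```python
# Assumed: LayoutLM box coordinates are normalized to 0..1000; values outside
# that range (negatives included) cannot be looked up in the 2-D position embeddings.
MIN_COORD, MAX_COORD = 0, 1000

def find_bad_samples(samples):
    """Return indices of samples with any coordinate outside [MIN_COORD, MAX_COORD].

    Each sample is assumed to be a dict with a "bboxes" list of [x0, y0, x1, y1].
    """
    bad = []
    for idx, sample in enumerate(samples):
        if any(c < MIN_COORD or c > MAX_COORD for box in sample["bboxes"] for c in box):
            bad.append(idx)
    return bad

def clamp_boxes(sample):
    """Return a copy of the sample with every coordinate clipped into range."""
    clipped = [[min(max(c, MIN_COORD), MAX_COORD) for c in box] for box in sample["bboxes"]]
    return {**sample, "bboxes": clipped}

# Toy data: the second sample has a negative x0, mimicking the problem above
samples = [
    {"bboxes": [[10, 20, 110, 40]]},
    {"bboxes": [[-4, 30, 90, 55], [100, 30, 200, 55]]},
]
print(find_bad_samples(samples))          # [1]
print(clamp_boxes(samples[1])["bboxes"])  # [[0, 30, 90, 55], [100, 30, 200, 55]]
```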
I am having trouble training the model; the problem is only with the config.json part.