Closed R4ZZ3 closed 1 year ago
Is it ok to use roberta-large models also?
Hi, due to the limitation of computing resources, we haven't trained the large LiLT model yet. It is considered for future work.
+1 for training LiLT-Large 👍🏻
Will this ever be done? I am trying to use LiLT, but some of our forms go over the 512-token limit, and we can't just truncate because the relevant data could be anywhere in the sequence.
@AnQueth it seems this project is mostly dead in terms of development.
My solution for that was to just tokenize longer documents in chunks with some overlap.
The `stride` parameter in the Hugging Face tokenizer (exposed as `doc_stride` in the QA example scripts) is very useful for this.
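For anyone landing here: a minimal sketch of the overlapping-window idea behind `stride` in plain Python. This is not the Hugging Face API itself (in practice you would pass `stride=...` together with `return_overflowing_tokens=True` to the tokenizer), just the logic it implements:

```python
def chunk_with_overlap(token_ids, max_len=512, stride=128):
    """Split token_ids into windows of at most max_len tokens,
    where consecutive windows share `stride` tokens of overlap."""
    step = max_len - stride
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
    return chunks

# A 1000-token document becomes 3 overlapping 512-token windows.
windows = chunk_with_overlap(list(range(1000)))
print(len(windows))                   # 3
print(len(windows[0]), len(windows[-1]))  # 512 232
```

The overlap matters because an answer span cut in half at a hard window boundary would otherwise be unrecoverable; with overlap it appears whole in at least one window.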
I ran across the `stride` parameter 30 minutes ago, but I haven't figured out how to get around the error it causes. Do you have working code for it to save me some time?
```
Couldn't cast array of type
list
```
Hi and thanks for creating this,
I am trying to use https://huggingface.co/Finnish-NLP/roberta-large-finnish-v2 with this repo. I have successfully run the weight generation:
```shell
python gen_weight_roberta_like.py --lilt lilt-only-base/pytorch_model.bin --text roberta-large-finnish-v2/roberta-large-finnish-v2/pytorch_model.bin --config roberta-large-finnish-v2/roberta-large-finnish-v2/config.json --out lilt-roberta-large-finnish-v2
```
But when I try to load the model, I get the following error:

Do you have any idea what might cause this and how it could be fixed?