voidful / BDG

Code for "A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies."
https://voidful.github.io/DG-Showcase/

transformers import cached_path #13

Open ghost opened 2 years ago

ghost commented 2 years ago

Whenever I try to run the cell, it shows this error:

ImportError                               Traceback (most recent call last)
Cell In [5], line 6
      4 from torch.distributions import Categorical
      5 import itertools as it
----> 6 import nlp2go
      8 tokenizer = RobertaTokenizer.from_pretrained("LIAMF-USP/roberta-large-finetuned-race")
      9 model = RobertaForMultipleChoice.from_pretrained("LIAMF-USP/roberta-large-finetuned-race")

File ~/distractor/venv/lib/python3.10/site-packages/nlp2go/__init__.py:1
----> 1 from .model import Model
      2 from .main import parse_args

File ~/distractor/venv/lib/python3.10/site-packages/nlp2go/model.py:5
      3 import nlp2
      4 import tfkit
----> 5 from transformers import pipeline, pipelines, BertTokenizer, cached_path, AutoTokenizer
      7 from nlp2go.modelhub import MODELMAP
      8 from nlp2go.parser import Parser

ImportError: cannot import name 'cached_path' from 'transformers' (/home/amiya/distractor/venv/lib/python3.10/site-packages/transformers/__init__.py)

voidful commented 2 years ago

The latest version of transformers removed cached_path; you may need to roll back to transformers==2.5.1.
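Before pinning a version, it can help to confirm whether the installed transformers still exports the name. The following is a minimal sketch, not part of nlp2go or transformers; the helper name `exposes_name` is made up for illustration.

```python
import importlib


def exposes_name(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and has a top-level `attr`.

    Hypothetical helper for illustration; it is not part of any library.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is missing or broken, so the name is not usable either.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    if exposes_name("transformers", "cached_path"):
        print("transformers exports cached_path")
    else:
        print("cached_path is unavailable (transformers is missing or too new)")
```

If the check prints the second message, `import nlp2go` from an older nlp2go release will fail with the ImportError shown in the traceback.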

ghost commented 2 years ago

Whenever I try to install transformers 2.5.1, this error is shown:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tfkit 0.8.1 requires transformers>=3.3.0, but you have transformers 2.5.1 which is incompatible.

And if I try to install the tfkit version listed in the requirements.txt file, I get this error:

nlp2go 0.3.5 requires tfkit>=0.5.6, but you have tfkit 0.3.94.dev1 which is incompatible.
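To see which constraints are in play, the installed versions and the requirements each package declares can be listed with the standard library's `importlib.metadata`. This is only a diagnostic sketch; the helper names are made up, and the prefix match on requirement strings is deliberately loose.

```python
from importlib import metadata

# The three packages named in the pip errors above.
PACKAGES = ("transformers", "tfkit", "nlp2go")


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def declared_requirements(package, names=PACKAGES):
    """Return the requirement strings of `package` that mention any of `names`.

    Loose prefix match: "transformers>=3.3.0" matches "transformers".
    """
    try:
        requirements = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []
    return [r for r in requirements if r.lower().startswith(tuple(names))]


if __name__ == "__main__":
    for name in PACKAGES:
        print(name, installed_version(name), declared_requirements(name))
```

Reading the output side by side shows which pin pip is objecting to, for example a tfkit release that declares transformers>=3.3.0 while transformers 2.5.1 is installed.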

voidful commented 2 years ago

I just released nlp2go 0.4.15 to fix this issue. Here is the usage:

https://colab.research.google.com/drive/1yA3Rex9JHKJmc52E3YdsBQ4eQ_R6kEZB?usp=sharing

ghost commented 2 years ago

Thank you very much for your help. This works, but the one issue I am facing now is that the maximum input length is 512 characters, while my paragraphs can be of any length. Could you please help me with this? In the notebook you shared, if I pass a json_text longer than 512, it returns an empty result. This is the output: {'result': ['']}

ayan2427 commented 2 years ago

I am also facing the same issue with the length of the paragraph. In my case the paragraph is around 3000 characters long. It would also be helpful to have a way to train your model incrementally. My corpus contains around a thousand paragraphs on different subjects (e.g. history paragraphs, geography paragraphs). If there is a way to train your model on my corpus, please guide me on how to do it. Thank you.

voidful commented 2 years ago

The original model accepts at most 512 tokens of input. After all these years, we can simply retrain a model with a longer input. I have a BART version that takes up to 1024 tokens as input: https://huggingface.co/voidful/bart-distractor-generation
I will also work on a longer model for distractor generation.
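Until a longer model is available, one possible workaround (not something the model or nlp2go provides) is to split a long paragraph into smaller pieces and run each piece separately. The sketch below is a rough illustration under stated assumptions: the function name `split_into_chunks` is made up, and it counts whitespace-separated words as a stand-in for tokens, so `max_words` should be set well below the model's token limit.

```python
import re


def split_into_chunks(text: str, max_words: int = 300) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `max_words` words.

    Words are only a rough proxy for model tokens. A single sentence longer
    than `max_words` is kept whole and becomes its own oversized chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if current and count + n_words > max_words:
            # The current chunk is full; start a new one.
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    paragraph = "One two three. Four five six. Seven eight nine."
    for chunk in split_into_chunks(paragraph, max_words=6):
        print(chunk)
```

Each chunk can then be passed to the model on its own. The question and answer still need to relate to the chunk that is passed in, so this only suits cases where the relevant context fits inside one chunk.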