cmgchess opened 1 year ago
Look into https://github.com/thunlp/OpenPrompt/blob/main/openprompt/plms/__init__.py#L87. 'xlm-roberta-base' is not the same as "roberta": it uses XLMRobertaConfig rather than RobertaConfig, and XLMRobertaTokenizer instead of RobertaTokenizer.
One option is to add an "xlm-roberta" entry to _MODEL_CLASSES there. Alternatively, copy the code from load_plm into your Jupyter notebook and change the config, tokenizer, etc. to their XLM-RoBERTa counterparts.
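For anyone attempting the first route, a hypothetical sketch of such an entry, mirroring the existing "roberta" entry in `_MODEL_CLASSES` (untested; it assumes `XLMRobertaConfig`, `XLMRobertaTokenizer`, and `XLMRobertaForMaskedLM` are imported from transformers, alongside OpenPrompt's `ModelClass` and `MLMTokenizerWrapper`):

```python
# Hypothetical entry inside _MODEL_CLASSES in openprompt/plms/__init__.py,
# mirroring the existing 'roberta' entry. Requires XLMRobertaConfig,
# XLMRobertaTokenizer, and XLMRobertaForMaskedLM from transformers.
'xlm-roberta': ModelClass(**{
    'config': XLMRobertaConfig,
    'tokenizer': XLMRobertaTokenizer,
    'model': XLMRobertaForMaskedLM,
    'wrapper': MLMTokenizerWrapper,
}),
```

With that entry in place, `load_plm("xlm-roberta", "xlm-roberta-base")` should resolve the right classes, though I haven't verified this against the current codebase.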
@Achazwl thank you! Any future plans on extending the framework to XLM-R as well?
Dear all, I have a question about modifying `__init__.py`; please guide me. I want to use the SciBERT model from Hugging Face, so I tried adding the model and tokenizer to `__init__.py` in Colab, but I don't know what the config or wrapper should be. After that, I closed `__init__.py` and ran again, but SciBERT is not recognized. How can I test other Hugging Face models?
After you modify the code, you should reload it in your Python workspace, e.g.:

```python
from importlib import reload
openprompt = reload(openprompt)
load_plm = openprompt.plms.load_plm
```

And to import your modified checkout instead of the installed package:

```python
import sys
sys.path.insert(0, '/location_path/OpenPrompt')
```
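The same reload pattern can be seen end to end with a throwaway module standing in for the edited `__init__.py` (the `fake_plms` module and temp directory below are stand-ins for illustration, not part of OpenPrompt):

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc files interfering with the demo

# A throwaway module stands in for the edited openprompt/plms/__init__.py.
tmpdir = tempfile.mkdtemp()
mod_path = os.path.join(tmpdir, "fake_plms.py")
with open(mod_path, "w") as f:
    f.write("SUPPORTED = ['bert', 'roberta']\n")

sys.path.insert(0, tmpdir)  # make the "checkout" importable
import fake_plms
print('scibert' in fake_plms.SUPPORTED)  # prints False: the old code is loaded

# Edit the file on disk, as you would edit __init__.py in Colab...
with open(mod_path, "w") as f:
    f.write("SUPPORTED = ['bert', 'roberta', 'scibert']\n")

# ...the running interpreter keeps the old module until you reload it.
importlib.invalidate_caches()
importlib.reload(fake_plms)
print('scibert' in fake_plms.SUPPORTED)  # prints True
```

Without the `reload` call, the interpreter keeps serving the version of the module it imported first, which is exactly the "my edit is not recognized" symptom described above.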
Thank you for your reply. I changed the code in Colab as below:
1- Add this model to `__init__.py`:
```python
'PubMedBERT': ModelClass(**{
    'config': BertConfig,
    'tokenizer': AutoTokenizer.from_pretrained('microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext'),
    'model': AutoModel.from_pretrained('microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext'),
    'wrapper': MLMTokenizerWrapper,
}),
```
2- Reload the module:
```python
import sys
import importlib
sys.path.insert(0, '/content/OpenPrompt')
importlib.reload(sys)
```
3- Run the cell:
```python
plm, tokenizer, model_config, WrapperClass = load_plm("PubMedBERT", 'microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext')
```
4- I get this error:
```
KeyError                                  Traceback (most recent call last)
1 frames
/content/OpenPrompt/OpenPrompt/openprompt/plms/__init__.py in get_model_class(plm_type)
     89         "tokenizer": GPT2Tokenizer,
     90         "model": GPTJForCausalLM,
---> 91         "wrapper": LMTokenizerWrapper
     92     }),
     93 }

KeyError: 'PubMedBERT'
```
Adding a model results in an error. I probably didn't reload the module correctly. Your guidance in this regard would be very valuable.
Hi,
Here is one thing you can try: a fix that goes around OpenPrompt's load_plm function entirely. You can load each component separately and then piece them together. For instance, the SciBERT model should still work with OpenPrompt's MLMTokenizerWrapper.
```python
# Pick the wrapper that matches your model's objective:
from openprompt.plms.seq2seq import T5TokenizerWrapper, T5LMTokenizerWrapper  # seq2seq models
from openprompt.plms.lm import LMTokenizerWrapper                             # causal LMs
from openprompt.plms.mlm import MLMTokenizerWrapper                           # masked LMs
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoModelForMaskedLM, AutoTokenizer

model_name = "your_model_name_here"  # e.g. "allenai/scibert_scivocab_uncased"
plm = AutoModelForMaskedLM.from_pretrained(model_name)
WrapperClass = MLMTokenizerWrapper
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
```
Then you pass these to the prompt dataloader as you normally would. I do not have time right now to test this for the models outlined in this issue, but it has worked for me when using custom models, and SciBERT under the hood should work directly with OpenPrompt's MLMTokenizerWrapper.
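For completeness, a sketch of wiring those manually loaded components (`plm`, `tokenizer`, `WrapperClass` from the snippet above) into the usual OpenPrompt classification pipeline. The dataset, template text, and label words below are illustrative placeholders, and I haven't run this exact snippet:

```python
# Sketch: using the manually loaded components in place of load_plm's output.
# The example data, template, and verbalizer below are placeholders.
from openprompt.data_utils import InputExample
from openprompt.prompts import ManualTemplate, ManualVerbalizer
from openprompt import PromptDataLoader, PromptForClassification

dataset = [InputExample(guid=0, text_a="glioblastoma is an aggressive tumor", label=1)]
template = ManualTemplate(tokenizer=tokenizer,
                          text='{"placeholder":"text_a"} It was {"mask"}.')
verbalizer = ManualVerbalizer(tokenizer=tokenizer, num_classes=2,
                              label_words=[["benign"], ["malignant"]])

loader = PromptDataLoader(dataset=dataset, template=template,
                          tokenizer=tokenizer,
                          tokenizer_wrapper_class=WrapperClass,
                          batch_size=1)
model = PromptForClassification(plm=plm, template=template, verbalizer=verbalizer)
```

This is the same shape as the standard OpenPrompt tutorial flow; the only difference is that `plm`, `tokenizer`, and `WrapperClass` come from the manual loading above rather than from `load_plm`.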
Hello, I have received your email and will handle it as soon as possible. Thank you!
Hi
Thank you very much for your time and explanation.
This is what I get when trying to load `xlm-roberta-base`.
Help is much appreciated, thanks!