xu1868 / Mixture-of-Domain-Adapters

Codebase for ACL 2023 paper "Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models' Memories"
MIT License

Can not train domain-adapter with unstructured knowledge #6

Closed nguyentuc closed 7 months ago

nguyentuc commented 11 months ago

Thank you for open-sourcing the implementation. I am trying to train new domain adapters on different unstructured knowledge, but I cannot run Stage 1 following your instructions. I am running pytorch_lightning==1.9.0 on Python 3.10. The error is:

encodings = self._tokenizer.encode_batch(
TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

[Screenshot from 2023-09-15 14-03-48]

Can you please provide the library versions?
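A note on this error, with a hedged sketch (none of this code is from the repo): `tokenizers` raises this `TypeError` when an entry in the batch passed to `encode_batch` is not a string (or a string pair), most commonly a `None` produced by an empty or malformed line in the unstructured-knowledge corpus. One way to rule that out is to sanitize the batch before encoding:

```python
# Hedged sketch (not the repo's code): encode_batch requires every entry to be
# a str (or a (str, str) pair). None or non-string entries trigger
# "TypeError: TextEncodeInput must be Union[...]". Coercing bad entries to ""
# preserves batch positions while avoiding the error.
def clean_batch(texts):
    return [t if isinstance(t, str) else "" for t in texts]

batch = ["domain text", None, "more text"]
print(clean_batch(batch))  # ['domain text', '', 'more text']
```

If the sanitized batch encodes fine, the fix is in the data-loading step rather than the library version.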

xu1868 commented 11 months ago

Hi @nguyentuc ,

Thanks for pointing this out. For some unknown reason, the current tokenizers version appears to prevent the program from running. We have added a requirements.txt; installing the versions listed there should resolve the issue.

nguyentuc commented 11 months ago

Hi @Amano-Aki , I tried multiple times with the libraries you provided, but they led to new problems. I fixed them and settled on the libraries and versions below. I am posting them here in the hope that they help others who want to run the code. Thank you so much for your help.

pytorch-lightning==1.9.0
pytorch==1.12.1
torchvision==0.13.1 
torchaudio==0.12.1
apache-beam==2.42.0
chardet
cchardet
transformers==4.33.1
adapter-transformers==3.1.0
opendelta==0.3.1
scipy==1.9.3
scikit-learn==1.1.2
spacy==3.4.2
huggingface-hub==0.17.1
accelerate==0.13.2
accelerator==2022.8.4.dev1
datasets==2.14.0 
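For anyone reproducing this environment, a small helper can confirm that the installed versions actually match the pins above before starting training. This is a hedged sketch, not part of the repo; the `PINS` subset shown is illustrative:

```python
# Hypothetical helper (assumption: not part of the repo): compare installed
# package versions against the pins from the comment above. Returns a dict of
# mismatches mapping package name -> installed version (None if missing).
from importlib.metadata import version, PackageNotFoundError

PINS = {
    "pytorch-lightning": "1.9.0",
    "transformers": "4.33.1",
    "adapter-transformers": "3.1.0",
    "scipy": "1.9.3",
}

def check_pins(pins):
    mismatches = {}
    for pkg, want in pins.items():
        try:
            got = version(pkg)
        except PackageNotFoundError:
            got = None
        if got != want:
            mismatches[pkg] = got
    return mismatches

print(check_pins(PINS))  # empty dict means every pin matches
```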
nguyentuc commented 11 months ago

Hi @Amano-Aki , when I run Stage 2 training, it fails with "cannot import name 'HoulsbyConfig' from 'transformers'". Can you check it again? Thank you

[Screenshot 2023-09-16 at 15 47 34]
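A likely cause, offered as an assumption rather than a confirmed diagnosis: `HoulsbyConfig` is provided by adapter-transformers, which installs itself as a drop-in replacement for the `transformers` package. If stock `transformers` (e.g. the 4.33.1 pin in the list above) is installed after adapter-transformers, it overwrites the fork and the import fails. This probe reports which variant is currently active:

```python
# Hedged diagnostic (not from the repo): check whether the installed
# `transformers` module is the adapter-transformers fork (which exposes
# HoulsbyConfig) or the stock library (which does not).
import importlib.util

spec = importlib.util.find_spec("transformers")
if spec is None:
    print("transformers is not installed at all")
else:
    import transformers
    if hasattr(transformers, "HoulsbyConfig"):
        print("adapter-transformers fork is active; the import should work")
    else:
        print("stock transformers is shadowing adapter-transformers; "
              "try reinstalling adapter-transformers==3.1.0 last")
```

If the probe reports shadowing, reinstalling adapter-transformers after the stock library is one plausible fix, since whichever package is installed last owns the `transformers` module path.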
txchen-USTC commented 9 months ago


Hi, have you changed the group_texts function in mixda_stage_one_mlm.py? The length of self.sample_dataset becomes 2 after going through this function. Have you met this issue, and how did you fix it? Thanks
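For context, a hedged sketch of the standard Hugging Face `group_texts` pattern, which mixda_stage_one_mlm.py presumably follows (the repo's exact implementation may differ): tokenized examples are concatenated end to end and re-split into fixed-size blocks, so the number of output examples depends only on the total token count, not on the number of input lines. A small corpus therefore collapses to just a few blocks, and a length of 2 can be expected behavior rather than a bug:

```python
# Hedged sketch of the common group_texts pattern (assumption: the repo's
# version is similar). All token lists are concatenated, then cut into
# block_size chunks; the trailing remainder is dropped.
def group_texts(examples, block_size=128):
    # Concatenate each field (input_ids, attention_mask, ...) end to end.
    concatenated = {k: sum(examples[k], []) for k in examples}
    total_length = len(concatenated[list(examples.keys())[0]])
    # Keep only whole blocks; the remainder is discarded.
    total_length = (total_length // block_size) * block_size
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }

# Three 100-token documents give 300 tokens total; 300 // 128 = 2 blocks,
# so the dataset shrinks from 3 examples to 2.
out = group_texts({"input_ids": [list(range(100))] * 3})
print(len(out["input_ids"]))  # 2
```

If a length of 2 is too small, likely remedies are feeding more text or lowering the block size, rather than changing the function itself.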