kevinscaria / InstructABSA

Instructional learning for Aspect Based Sentiment Analysis [NAACL-2024]
https://aclanthology.org/2024.naacl-short.63/
MIT License

Dataset used during pretraining #9

Closed twotwoiscute closed 1 year ago

twotwoiscute commented 1 year ago

Thanks for the great work. I wonder, did you use any non-English datasets during training?

SupritYoung commented 1 year ago

Obviously not.

twotwoiscute commented 1 year ago

Thanks for the reply. Could you describe the pipeline for training the model on a Chinese dataset (assuming the data is already labeled)?

SupritYoung commented 1 year ago

I think just modifying the tokenizer and the instructions should be OK, but my concern is that I can't reproduce the results. Did you run into this issue?
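
For illustration, here is a minimal sketch of what "modify the tokenizer and instructions" could look like with a multilingual checkpoint. This is not the authors' pipeline: `google/mt5-base` is a real Hub model id, but the prompt wording and example sentence below are made up, and an untuned model will not produce meaningful aspect terms; the snippet only demonstrates the input format.

```python
# Hedged sketch: wrapping a Chinese sentence in an InstructABSA-style
# instruction prompt using a multilingual seq2seq checkpoint.
# The prompt text and example sentence are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/mt5-base"  # any multilingual seq2seq model could be swapped in
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# InstructABSA prepends a task definition plus a few in-context examples;
# for a Chinese dataset those would be rewritten in Chinese as well.
instruction = (
    "定义：从下面的句子中抽取方面词。\n"
    "正例：句子：服务很好。输出：服务\n"
    "句子：{sentence} 输出："
)
sentence = "这家餐厅的菜味道不错，但是价格有点贵。"

inputs = tokenizer(instruction.format(sentence=sentence), return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The key point is that the task definition and the in-context examples are part of the model input, so they need to be rewritten in the target language along with the training sentences.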

twotwoiscute commented 1 year ago

I think the reason you can't reproduce the results is that the pretrained model you used was trained on English only; there's another pretrained model trained on multiple languages.

SupritYoung commented 1 year ago

But I didn't notice that in the paper 😂. I'm also interested in this model; would you mind leaving me your WeChat? I look forward to further communication with you. ❤️ My WeChat: SupritYoung, email: suprit@foxmail.com

twotwoiscute commented 1 year ago

Sure. Just so you know, I used to study computer vision and only recently started working on NLP.

kevinscaria commented 1 year ago

Tk-Instruct is fine-tuned on instructions on top of the T5 model, which is English-only. However, you can use the one @SupritYoung posted. There are two variants of mTk-Instruct, 3B and 11B, but both would require significant compute. You can also try fine-tuning from the mT5 base model, but that would require additional instruction tuning.
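
For context, a hedged sketch of what that additional instruction tuning on labeled Chinese data could look like with the standard Hugging Face `Seq2SeqTrainer`. The toy dataset, prompt format, and hyperparameters below are placeholders, not the repo's training configuration:

```python
# Hedged sketch: instruction-tuning mT5 on instruction-formatted Chinese
# ABSA pairs. The in-memory toy dataset and hyperparameters are
# placeholders; a real run would load the full labeled dataset.
from datasets import Dataset
from transformers import (
    AutoTokenizer, AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq, Seq2SeqTrainer, Seq2SeqTrainingArguments,
)

model_name = "google/mt5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Each example: instruction-wrapped input text -> aspect-term string.
train_data = Dataset.from_dict({
    "text": ["定义：抽取方面词。句子：服务很好。输出："],
    "labels_text": ["服务"],
})

def preprocess(batch):
    enc = tokenizer(batch["text"], truncation=True, max_length=512)
    enc["labels"] = tokenizer(
        batch["labels_text"], truncation=True, max_length=64
    )["input_ids"]
    return enc

train_data = train_data.map(
    preprocess, batched=True, remove_columns=train_data.column_names
)

args = Seq2SeqTrainingArguments(
    output_dir="mt5-absa-zh",       # placeholder output path
    per_device_train_batch_size=8,
    learning_rate=3e-4,             # a common T5-style LR; tune as needed
    num_train_epochs=4,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_data,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

The same script should work with the mTk-Instruct checkpoints as a starting point instead of `google/mt5-base`, which would skip the from-scratch instruction tuning at the cost of the larger 3B/11B memory footprint.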