viiviilxx / TextClassification

PyTorch implementation of CNN for Sentence Classification, using BERT instead of word2vec to embed words.
MIT License

bert fine-tuned #1

Open yassmine-lam opened 3 years ago

yassmine-lam commented 3 years ago

Hi,

Thanks for sharing your code.

Are the BERT embeddings you used fine-tuned or not?

Thank you

viiviilxx commented 3 years ago

Thank you for looking at my code.

I don't use fine-tuning. BERT is part of the network, but I discard the gradient information of the BERT embeddings, so BERT doesn't learn from the dataset.
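The frozen setup described above can be sketched as follows. This is a minimal illustration, not the repo's actual code: a small `nn.Embedding` stands in for the BERT encoder (the real project would load a pretrained BERT model), but the freezing mechanics are the same.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained encoder such as BERT.
encoder = nn.Embedding(num_embeddings=100, embedding_dim=16)
classifier = nn.Linear(16, 2)  # task head that does get trained

# Freeze the encoder: its parameters receive no gradients, so it does
# not learn from the dataset (equivalent to detaching its outputs).
for p in encoder.parameters():
    p.requires_grad_(False)

ids = torch.randint(0, 100, (4, 7))        # (batch, seq_len) token ids
feats = encoder(ids).mean(dim=1)           # pooled "embedding" features
loss = classifier(feats).sum()
loss.backward()

print(encoder.weight.grad is None)         # True: frozen encoder got no gradient
print(classifier.weight.grad is not None)  # True: the head still trains
```

With this setup BERT acts purely as a feature extractor; only the layers on top learn the task.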

Thank you.

yassmine-lam commented 3 years ago

Hi,

Thank you for the quick answer.

Do you think fine-tuning BERT on the datasets would increase the performance of your code?

I ask because I've read many blogs saying that fine-tuning BERT is better than extracting features without fine-tuning; they also said that fine-tuning requires less labeled data than a model built from scratch.

What do you think?

Thank you

viiviilxx commented 3 years ago

Sorry, I can't understand "fine-tuning requires less labeled data than a model built from scratch". I think fine-tuning BERT has two patterns:

  1. Use BERT alone first. This is pre-training: BERT learns a representation of the dataset. After that, train the model using the BERT that was already trained.

  2. Use BERT together with other layers, as in my code (my code turns fine-tuning off). This is not pre-training: BERT is part of the model, and BERT and the other layers learn from the dataset through gradient information at the same time. But if the model has too many layers, the BERT parameters get broken ("broken" meaning BERT can no longer learn well).

If I turned on fine-tuning, I would choose 1, and I think that would increase the performance.
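For pattern 2 with fine-tuning turned on, a common trick to avoid "breaking" the pretrained part is to give it a much smaller learning rate than the fresh layers, via optimizer parameter groups. A minimal sketch, again using toy modules as hypothetical stand-ins for BERT and the task head:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: `encoder` for the pretrained BERT part,
# `head` for the freshly initialized task layers.
encoder = nn.Linear(16, 16)
head = nn.Linear(16, 2)

# Fine-tuning: everything trains, but the pretrained part gets a much
# smaller learning rate so its weights are not destroyed early on.
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 2e-5},  # typical BERT fine-tuning lr
    {"params": head.parameters(), "lr": 1e-3},     # fresh head can move faster
])

x = torch.randn(4, 16)
loss = head(encoder(x)).sum()
loss.backward()
optimizer.step()

print(encoder.weight.grad is not None)  # True: the encoder now learns too
```

Freezing (pattern 2 with fine-tuning off) is just the special case where the encoder's learning rate is effectively zero.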

I am a beginner, so my thinking may be wrong. And I'm not good at English, sorry.

Thank you.

yassmine-lam commented 3 years ago

Ok, I understand, and thank you very much for taking the time to answer.

Btw: I am a beginner and my English is not that good either :) I don't think this should be a problem; what matters most is that we are trying to learn and share ideas with others :)

About the sentence you did not understand, here is a blog: https://pysnacks.com/machine-learning/bert-text-classification-with-fine-tuning/

It discusses different ways of using BERT, and the author recommends fine-tuning, giving several reasons, including that it needs less labeled data to learn parameters than a model built from scratch. I am trying to learn NLP with deep learning and I have a small dataset, which is why I am looking for the best way to overcome this problem and achieve good results.

Thanks again