Any tutorials on applying BERT on some task such as WebQA data?

kpe / bert-for-tf2

A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.

https://github.com/kpe/bert-for-tf2

MIT License


Any tutorials on applying BERT on some task such as WebQA data? #10

Open lucasjinreal opened 4 years ago

lucasjinreal commented 4 years ago

Hi, is it possible to apply the BERT model to data such as WebQA, which is in Chinese?

kpe commented 4 years ago

I haven't looked at WebQA yet (do you have a download link for the dataset?), but I'm planning to commit a notebook on how to use bert-for-tf2 on SQuAD soon.

lucasjinreal commented 4 years ago

WebQA was released by Baidu. It is in Chinese and contains quite a large amount of knowledge with accurate answers, similar to SQuAD.

I think the official download link is down. I can offer a backup link for you if you wanna play with it.

https://drive.google.com/open?id=1r5lNotgWvTw58Y9s3Xodisiw8TuKecRn

lucasjinreal commented 4 years ago

I'd like to see an example of a kind of knowledge-answering machine, which I believe could be very fun, such as:

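Q: 中国的首都是哪里？ (Where is the capital of China?)
A: 北京 (Beijing)

Q: 中国的首都是在哪个地方？ (In which place is the capital of China?)
A: 北京 (Beijing)

Q: 中国的首都是哪个城市？ (Which city is the capital of China?)
A: 北京 (Beijing)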

kpe commented 4 years ago

Thank you @jinfagang for the link! I'll have to take a deeper look into it. For now I would say it seems quite similar to SQuAD, i.e. given a question and a context (called evidence in the WebQA paper), predict a sequence tagging/labeling that marks where the answer occurs in the context sequence. So if I understand correctly, for your example above you would need to feed both a question and an evidence/context such as 北京是中国的首都 ("Beijing is the capital of China") to predict the answer 北京 (Beijing).
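To make the SQuAD-style setup concrete, here is a rough sketch of how one question/evidence pair could be packed into a single BERT-style input, together with the answer span the model would be trained to predict. It is only an illustration: `build_qa_example` is a made-up helper (not part of bert-for-tf2), and splitting the text into single characters is a stand-in for a real BERT tokenizer.

```python
def build_qa_example(question, context, answer):
    """Pack a question and its evidence into one BERT-style input and
    locate the answer span inside it.

    Hypothetical helper for illustration; not part of bert-for-tf2.
    """
    # Assumption: character-level split as a stand-in for a real BERT tokenizer.
    q_tokens = list(question)
    c_tokens = list(context)

    # Layout: [CLS] question [SEP] context [SEP]
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + c_tokens + ["[SEP]"]
    # Segment 0 covers [CLS] + question + [SEP]; segment 1 covers context + [SEP].
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(c_tokens) + 1)

    # The training target is the position of the answer inside the context.
    char_start = context.find(answer)
    if char_start < 0:
        raise ValueError("answer does not occur in the context")
    offset = len(q_tokens) + 2           # index of the first context token
    start = offset + char_start
    end = start + len(answer) - 1        # inclusive end index
    return tokens, segment_ids, start, end


tokens, segment_ids, start, end = build_qa_example(
    "中国的首都是哪里？", "北京是中国的首都", "北京"
)
print(start, end, "".join(tokens[start:end + 1]))  # 11 12 北京
```

A span-prediction head on top of BERT would then output one start score and one end score per token, and be trained against `start` and `end`.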

lucasjinreal commented 4 years ago

Yes to the conclusion, but no on the input: I think the input can only be the question. Just use the pretrained BERT model to turn the input sentence into a dense matrix, then feed that as the input to another model (this model learns the knowledge from the data). Correct me if I am wrong. The main insight is to use BERT to learn that similar phrasings of a question target the same answer.

However, this may need considerable additional work to build such a model that learns from knowledge data.

I believe BERT is able to do this quite well.
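As a very rough sketch of the question-only idea, the snippet below encodes each question as a vector and answers a new question by finding the most similar known question. Everything here is an assumption for illustration: `encode` is a bag-of-characters stand-in for a pretrained BERT sentence embedding, the nearest-neighbour lookup stands in for the model that would learn the knowledge, and the two-entry `knowledge` list is a toy example, not WebQA data.

```python
import math
from collections import Counter


def encode(question):
    # Stand-in for a BERT sentence embedding (assumption): a sparse
    # bag-of-characters count vector instead of a dense BERT output.
    return Counter(question)


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def answer(question, knowledge):
    # Return the answer attached to the most similar known question.
    query = encode(question)
    _, best_answer = max(knowledge, key=lambda qa: cosine(query, encode(qa[0])))
    return best_answer


# Toy knowledge base, made up for this example.
knowledge = [
    ("中国的首都是哪里？", "北京"),
    ("日本的首都是哪里？", "东京"),
]

print(answer("中国的首都是哪个城市？", knowledge))  # 北京
```

With real BERT embeddings in place of `encode`, differently worded questions that mean the same thing should land close together, which is the property this approach relies on.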