Use a pretrained model released by a paper in our lit review

mattesko / COMP550-Project

Fake news text classification project for the McGill COMP 550 Natural Language Processing course.


I had a look at the author's repo. Their data set does not store each corpus as raw text, i.e. it is not of the form

words words words...[punctuation] [symbols] other words

Instead, they built a hash table mapping index -> word, so every corpus has the form

[integer index] [integer index for another word] etc etc

To reuse their code (even just the LSTM definition in torch), we would need to convert all of our data into that same form. I don't think it's worth investing that much time and effort.
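For reference, here is a rough sketch of what that conversion would involve on our side. It is not the author's preprocessing: the whitespace tokenizer, the lowercasing, the reserved `<pad>`/`<unk>` indices, and the names `build_vocab` and `encode` are all my own assumptions for illustration.

```python
from collections import Counter

# Reserved indices (an assumed convention, not taken from the author's repo)
PAD, UNK = 0, 1


def build_vocab(texts, min_count=1):
    """Map each token to an integer index, most frequent tokens first."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for tok, count in counts.most_common():
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab):
    """Replace every token with its index; unseen tokens become UNK."""
    return [vocab.get(tok, UNK) for tok in text.lower().split()]


texts = ["fake news spreads fast", "real news spreads slowly"]
vocab = build_vocab(texts)
index_to_word = {i: w for w, i in vocab.items()}  # the index -> word table
encoded = [encode(t, vocab) for t in texts]
```

Building our own table like this is the easy part. To load their pretrained weights, our indices would have to match the exact index -> word table they trained with, and our tokenization would have to match theirs too, which is where most of the effort would go.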


Use a pretrained model released by a paper in our lit review #18