Problem in predict.ipynb

jkwieser / personality-prediction-from-text

Predicting big five personality traits from a given text.

MIT License

150 stars 48 forks

Problem in predict.ipynb #3

Open soumitra9 opened 3 years ago

soumitra9 commented 3 years ago

Wouldn't your text (the test set) need to go through the same preprocessing as your training set? I don't understand why you split the text with a regex and return only the prediction for the first split sentence, i.e. EXT[0].

hariravi commented 3 years ago

Hi all, following up on this many months later. I agree with this assessment: you have to apply the same preprocessing (starting with the emotional words, etc., and adjusting the preprocessing depending on whether you use BERT or bag-of-words). You are doing the vectorization, but I think it has to be done on the text in aggregate, since the bag-of-words models were trained on whole essays.
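To make the point concrete, here is a minimal sketch of what "same preprocessing, vectorized on the whole text" could look like. It is not the repository's code: `preprocess`, `fit_vocabulary`, and `vectorize` are hypothetical helpers, the cleaning step is a stand-in for whatever the training pipeline really does, and the two essays are made-up examples.

```python
import re
from collections import Counter


def preprocess(text):
    """Hypothetical shared cleaning step: lowercase and keep word-like tokens.

    The same function is used for training essays and for new text.
    """
    return re.findall(r"[a-z']+", text.lower())


def fit_vocabulary(essays):
    """Build a word -> column index map from whole training essays."""
    words = sorted({tok for essay in essays for tok in preprocess(essay)})
    return {word: i for i, word in enumerate(words)}


def vectorize(text, vocabulary):
    """Bag-of-words counts for the entire text, using the training vocabulary."""
    counts = Counter(preprocess(text))
    vec = [0] * len(vocabulary)
    for word, n in counts.items():
        if word in vocabulary:  # words unseen in training are ignored
            vec[vocabulary[word]] = n
    return vec


# Made-up training essays, vectorized as whole documents.
train_essays = [
    "I love meeting new people. Parties give me energy!",
    "I prefer quiet evenings. Reading alone is relaxing.",
]
vocabulary = fit_vocabulary(train_essays)

new_text = "Parties are great. I love people and energy."

# Vectorizing the text in aggregate keeps every sentence's evidence.
whole_vec = vectorize(new_text, vocabulary)

# Splitting on sentence boundaries and keeping only the first piece
# throws most of that evidence away.
first_sentence = re.split(r"(?<=[.!?])\s+", new_text)[0]
first_vec = vectorize(first_sentence, vocabulary)

print("known words in whole text:     ", sum(whole_vec))
print("known words in first sentence: ", sum(first_vec))
```

The feature vector built from the first sentence alone contains far fewer known words than the one built from the full text, so a model trained on whole essays would be predicting from a much sparser input than it saw during training.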