Open soumitra9 opened 3 years ago
Hi all, following up on this many months later. I agree with this assessment: new text has to go through the same preprocessing as the training data (starting with the emotional words, etc., and adjusting the preprocessing depending on whether the model is BERT or bag-of-words). You are doing the vectorization, but I think it has to be done on the text in aggregate, because the bag-of-words models were trained on whole essays.
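To make the point concrete, here is a minimal sketch in plain Python. It is not this repository's code: `preprocess`, `fit_vocabulary`, `vectorize`, and the toy essays are all made up for illustration, and the hand-rolled count vectorizer only stands in for whatever vectorizer the bag-of-words models were actually trained with. It shows the same preprocessing applied to training and new text, and the difference between vectorizing the whole text and only the first split sentence.

```python
import re
from collections import Counter


def preprocess(text):
    """Hypothetical shared preprocessing: lowercase, keep word tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def fit_vocabulary(train_essays):
    """Map each word seen in the whole training essays to a column index."""
    words = sorted({tok for essay in train_essays for tok in preprocess(essay)})
    return {tok: i for i, tok in enumerate(words)}


def vectorize(text, vocab):
    """Count vector for one whole document, restricted to the training vocabulary."""
    counts = Counter(preprocess(text))
    return [counts.get(tok, 0) for tok in vocab]


# Toy training data (assumption: real essays are much longer).
train_essays = [
    "I love meeting new people. Parties make me happy.",
    "I prefer quiet evenings. Crowds make me anxious.",
]
vocab = fit_vocabulary(train_essays)

new_text = "I love parties. Crowds make me happy, not anxious."

# Whole text: same unit (a full document) that the model saw during training.
whole_vec = vectorize(new_text, vocab)

# First split sentence only: most of the word counts are thrown away.
first_sentence = re.split(r"(?<=[.!?])\s+", new_text)[0]
first_vec = vectorize(first_sentence, vocab)

print(sum(whole_vec), sum(first_vec))
```

Under these assumptions the whole-text vector carries more in-vocabulary word counts than the first-sentence vector, which is why a prediction made from the first sentence alone may not be comparable to one made on a full essay.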
Wouldn't your text (the test set) need to go through the same preprocessing as your training set? I don't understand why you are splitting the text with a regex and returning only the prediction for the first split sentence, i.e. EXT[0].