Now that our preprocessing function (lemmatization, punctuation removal, etc.) is complete, we need to apply it to all of our input data. The code for this is simply data['reviewText'] = data['reviewText'].apply(<preprocessor_function>). However, this takes a lot of memory and a long time, so it cannot be safely done in Colab, which might time out before it completes. This will probably need to be done locally, and the results can then be saved and uploaded (possibly with data.to_pickle).
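As a rough sketch of the apply-then-save workflow described above, the snippet below runs a preprocessor over the reviewText column, writes the result with DataFrame.to_pickle, and reads it back with pandas.read_pickle. The preprocess_text function, the sample rows, and the file name are all hypothetical placeholders; the real preprocessor and dataset would take their place.

```python
import os
import tempfile

import pandas as pd


def preprocess_text(text: str) -> str:
    # Hypothetical stand-in for the real preprocessor (lemmatization etc.):
    # lowercase, drop punctuation, and collapse repeated whitespace.
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


# Tiny made-up sample; the real data would be the full review DataFrame.
data = pd.DataFrame({"reviewText": ["Great product!!", "Would NOT buy again..."]})

# Apply the preprocessor to every review (the slow step on the full dataset).
data["reviewText"] = data["reviewText"].apply(preprocess_text)

# Save the processed frame locally; the path and file name are illustrative.
pickle_path = os.path.join(tempfile.mkdtemp(), "preprocessed_reviews.pkl")
data.to_pickle(pickle_path)

# Later, for example after uploading the file to Colab, load it back.
restored = pd.read_pickle(pickle_path)
print(restored)
```

One caveat with this approach: pickle files should only be loaded from sources we trust, and they may not load cleanly across very different pandas versions, so it is worth keeping the local and Colab environments reasonably close.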