mattfredericksen / CSCE-4205-ML-Project

Create a completely preprocessed document file #4

Open mattfredericksen opened 3 years ago

mattfredericksen commented 3 years ago

Now that our preprocessing code (lemmatization, punctuation removal, etc.) is complete, we need to run it on all of our input data. The code for this is simply data['reviewText'] = data['reviewText'].apply(<preprocessor_function>). However, this takes a lot of memory and a long time, so it cannot be done safely in Colab, which might time out before it finishes. This will probably need to be done locally, and the results can then be saved and uploaded (data.to_pickle?).
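For reference, a minimal sketch of the local run-and-save workflow described above. The `preprocess` function, the sample rows, and the file name `preprocessed.pkl` are hypothetical placeholders rather than the project's actual code or data; only `DataFrame.apply`, `DataFrame.to_pickle`, and `pandas.read_pickle` are real pandas calls.

```python
import os
import tempfile

import pandas as pd


def preprocess(text: str) -> str:
    """Hypothetical placeholder for the project's real preprocessor."""
    # Lowercase and keep only letters, digits, and whitespace.
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    # Collapse any runs of whitespace into single spaces.
    return " ".join(kept.split())


def preprocess_and_save(data: pd.DataFrame, path: str) -> pd.DataFrame:
    """Apply the preprocessor to 'reviewText' and pickle the result."""
    out = data.copy()
    out["reviewText"] = out["reviewText"].apply(preprocess)
    out.to_pickle(path)
    return out


# Tiny made-up sample standing in for the real dataset.
sample = pd.DataFrame({"reviewText": ["Great product!!", "Would NOT buy again."]})
pickle_path = os.path.join(tempfile.mkdtemp(), "preprocessed.pkl")
saved = preprocess_and_save(sample, pickle_path)

# Later (for example in Colab, after uploading the file) reload it directly.
restored = pd.read_pickle(pickle_path)
```

Reloading the pickle skips the expensive preprocessing step entirely, so the Colab session only pays the cost of reading the file.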

mattfredericksen commented 3 years ago

This might work. I don't know how long it will take, but the segmentation should solve the memory issue.
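One way segmented preprocessing could look, as a sketch only and not necessarily the approach referred to in this comment. The helper names (`clean`, `preprocess_in_chunks`, `load_chunks`), the chunk size, and the file naming are all assumptions made for illustration. Each chunk is processed and written to disk before the next one starts, so only one processed chunk is held in memory at a time and an interrupted run keeps the chunks already written.

```python
import os
import tempfile

import pandas as pd


def clean(text: str) -> str:
    """Hypothetical stand-in for the real preprocessor function."""
    return text.lower().strip()


def preprocess_in_chunks(data, column, func, out_dir, chunk_size=10_000):
    """Preprocess `column` one chunk at a time, pickling each chunk to out_dir."""
    paths = []
    for i, start in enumerate(range(0, len(data), chunk_size)):
        # Copy the slice so the original DataFrame is left unmodified.
        chunk = data.iloc[start:start + chunk_size].copy()
        chunk[column] = chunk[column].apply(func)
        path = os.path.join(out_dir, f"chunk_{i:05d}.pkl")
        chunk.to_pickle(path)
        paths.append(path)
    return paths


def load_chunks(paths):
    """Reassemble the processed chunks in their original order."""
    return pd.concat([pd.read_pickle(p) for p in paths])


# Made-up rows; a chunk size of 3 forces two chunks for demonstration.
reviews = pd.DataFrame({"reviewText": [" Good ", "BAD", " Okay", "Fine "]})
chunk_paths = preprocess_in_chunks(
    reviews, "reviewText", clean, tempfile.mkdtemp(), chunk_size=3
)
combined = load_chunks(chunk_paths)
```

Note that this sketch still assumes the raw DataFrame fits in memory. If it does not, the input itself would also have to be read in pieces.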