Custom preprocessing in Live Test

sergioburdisso / pyss3

A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI :octocat:)

https://pyss3.readthedocs.io

MIT License

333 stars 44 forks

Custom preprocessing in Live Test #3

Closed. saurabhsbora closed this issue 4 years ago

saurabhsbora commented 4 years ago

@sergioburdisso It would be a great feature to have custom preprocessing in the Live Test. This will enable us to visually understand the words, sentences, and paragraphs that helped the model to classify a particular document after custom preprocessing.

sergioburdisso commented 4 years ago

Hi @enthussb! Sure, I overlooked this option when first coding the Live Test tool. Thank you for your suggestion :)

I've added this feature in the new version and also took the opportunity to incorporate some other pending changes. What's new in this version:

The Live Test Tool now supports custom (user-defined) preprocessing methods (b50cfaf, resolved #3).
The tokenization process was improved (26fff88, 4af8e80).
The process for recognizing word n-grams during classification was improved (2ceb148).

Update your package to the new version (0.5.8) using pip install -U pyss3. To make things easier for you, I've created a new Jupyter Notebook in the examples folder that shows how to incorporate user-defined preprocessing functions into the Live Test tool visualizations, for you to follow if you want: using_custom_preprocessing.ipynb
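For readers who want a concrete picture of what a "user-defined preprocessing function" could look like, here is a minimal sketch. It is only an illustration: the function name my_preprocessing, the stop-word list, and the cleaning steps are hypothetical and are not taken from the notebook. How the function is handed to the Live Test tool is shown in using_custom_preprocessing.ipynb and is deliberately not reproduced here.

```python
import re

# Hypothetical patterns, chosen only for illustration.
URL_RE = re.compile(r"https?://\S+")
STOP_WORDS_RE = re.compile(r"\b(?:a|an|the|of|to|and)\b")


def my_preprocessing(text):
    """Toy custom preprocessing: lowercase, drop URLs and a few stop words.

    Newlines and punctuation are left in place, on the assumption that
    sentence and paragraph boundaries are still wanted afterwards.
    """
    text = text.lower()
    text = URL_RE.sub(" ", text)          # remove URLs
    text = STOP_WORDS_RE.sub(" ", text)   # remove the listed stop words
    # Collapse runs of spaces/tabs left behind by the removals.
    return re.sub(r"[ \t]+", " ", text).strip()


# A function like this would then be passed to PySS3 as described in the
# notebook; the exact call is an assumption and is omitted here.
cleaned = my_preprocessing("Visit https://pyss3.readthedocs.io to read THE docs")
```

Keeping the function a plain str-to-str mapping makes it easy to apply the same cleaning to training documents, test documents, and any text typed into the tool.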

Let me know if everything worked OK :coffee:

sergioburdisso commented 4 years ago

@all-contributors would you add @enthussb for ideas to the README file? They helped make this project better by suggesting this cool feature :+1:

allcontributors[bot] commented 4 years ago

@sergioburdisso

I've put up a pull request to add @enthussb! :tada:

saurabhsbora commented 4 years ago

@sergioburdisso I updated the package and ran the code. Everything is working fine, although my accuracy has dropped quite a bit. I guess it might be due to the latest n-gram and tokenization changes. Could you please have a look at that?

sergioburdisso commented 4 years ago

I was about to tell you to perform a hyperparameter optimization using the Evaluation.grid_search() function, but then I realized that I didn't include a "prep" argument to disable the default preprocessing. As a consequence, users can't perform hyperparameter optimization using only their custom preprocessing method. I'll work on that and add the "prep" parameter to the grid_search(), test(), and kfold_cross_validation() functions of the Evaluation class. I'm sorry for not adding this in the first place :cry:. I'll notify you as soon as the new version is released.

saurabhsbora commented 4 years ago

Okay no problem 👍, till then I can work on the previous version where I had achieved great accuracy!

sergioburdisso commented 4 years ago

I've just finished making those changes and released the new version (0.5.9). I've also updated the notebook, adding a "Hyperparameter Optimization" section. Try performing a hyperparameter optimization similar to what I did in that notebook and let me know. If you are still getting bad accuracy, please share some more details, such as part of the actual code, the accuracy before and after the changes, etc. It would be much easier to help that way. I hope you achieve great accuracy again :cry: :crossed_fingers: :four_leaf_clover:
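As a rough sketch of the idea (not a copy of the notebook), hyperparameter optimization with only custom preprocessing might look like the following. The helper names preprocess_docs and grid_search_with_custom_prep are hypothetical. The sketch assumes that clf was already trained on documents cleaned with the same function and with the default preprocessing disabled, and that the hyperparameter lists are passed to Evaluation.grid_search() as keyword arguments; only the "prep" argument itself is confirmed by this thread.

```python
def preprocess_docs(docs, prep_func):
    """Apply a user-defined preprocessing function to every document."""
    return [prep_func(doc) for doc in docs]


def grid_search_with_custom_prep(clf, x_test, y_test, prep_func, **grid):
    """Run a grid search on documents cleaned by prep_func only.

    `grid` holds the hyperparameter value lists to explore (assumed to be
    accepted as keyword arguments by Evaluation.grid_search).
    """
    # Imported lazily so the helper above works without PySS3 installed.
    from pyss3.util import Evaluation

    x_prep = preprocess_docs(x_test, prep_func)
    # prep=False disables the default preprocessing, so the model sees
    # exactly what prep_func produced.
    return Evaluation.grid_search(clf, x_prep, y_test, prep=False, **grid)
```

A call could then look like grid_search_with_custom_prep(clf, x_test, y_test, my_prep, s=[0.2, 0.4], l=[0.5, 1.0], p=[0.5, 1.0]), where my_prep and the value lists are placeholders to be replaced with your own function and ranges.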

saurabhsbora commented 4 years ago

Sure, I will check the updated package and get back to you.