sergioburdisso / pyss3

A Python package implementing a new interpretable machine learning model for text classification (with visualization tools for Explainable AI :octocat:)
https://pyss3.readthedocs.io
MIT License

Division by 0 #10

Closed demcbs closed 4 years ago

demcbs commented 4 years ago

I am eager to use the SS3 classifier for a text classification task in my master's thesis. Unfortunately, when I run it I get a division-by-zero error (see images). My text seems fairly clean to me (although not yet cleaned in exactly the right way), so I am not sure what is causing this.

Is there anything you suspect might be going wrong that I could try? Or is there somewhere the data requirements are listed (I've looked, but I may have overlooked it)?

I included the data structure (pandas series), some of what my data looks like and the error.

Many thanks!

[Three attached images: the data structure, a sample of the data, and the error message]

IveJ commented 4 years ago

Hi Demcbs,

What is your n-gram setting? I see a zero in the image of the result.

You can check the documentation of the SS3 class for its parameters.

Nice.


demcbs commented 4 years ago

I didn't create any n-grams. I passed in the unstructured text, so I suppose it is using the default n-gram size.

sergioburdisso commented 4 years ago

First of all, thanks for creating this issue and reporting this bug, @demcbs. Also, thanks, @IveJ, for your comments :+1:

I've found what the problem was. First, I created a small script to replicate the error. The smallest script that I could come up with was:

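    from pyss3 import SS3

    # Two tiny documents; the first one carries the integer label 0
    test_x = ["this is the first document", "this is the second document"]
    test_y = [0, 1]
    clf = SS3()
    clf.train(test_x, test_y)  # this call reproduced the division by zero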

I found out that the problem was caused by the integer label 0: it made a condition evaluate to True when it shouldn't have (more details are given in the commit message of 236a942). I've already released a new version (0.6.2) with a patch fixing this issue, so updating the package (pip install -U pyss3) should solve the problem :blush:
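For anyone wondering how a 0 label can trip a condition like that, here is a minimal sketch of the general Python pitfall. It is not the actual pyss3 code (see the commit for that); the function names `is_missing_buggy` and `is_missing_fixed` are hypothetical and only illustrate a truthiness check versus an explicit `None` check.

```python
def is_missing_buggy(label):
    # Truthiness check: 0, "" and None are all falsy,
    # so the valid integer label 0 is wrongly reported as missing.
    return not label


def is_missing_fixed(label):
    # Explicit identity check: only None counts as missing.
    return label is None


labels = [0, 1]

buggy = [is_missing_buggy(y) for y in labels]
fixed = [is_missing_fixed(y) for y in labels]

print(buggy)  # [True, False]
print(fixed)  # [False, False]

# String labels are truthy even for "0", so they avoid this kind of check.
str_labels = [str(y) for y in labels]
print([is_missing_buggy(y) for y in str_labels])  # [False, False]
```

If upgrading is not possible, converting the labels to strings before training might sidestep the problem, but that is an assumption based on the sketch above; updating to 0.6.2 is the fix described here.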

demcbs commented 4 years ago

Thanks for the quick fix Sergio!