demcbs closed this issue 4 years ago.
Hi Demcbs,
What n-gram setting are you using? That might explain the zero in your result image.
You can check the documentation of the SS3 class for the available parameters.
Nice.
On Fri, Jun 19, 2020, 11:31 PM demcbs notifications@github.com wrote:
I am eager to use the SS3 classifier for a text classification task in my master's thesis. Unfortunately, when I run it I get a division by zero error message (see image). My text seems fairly clean to me (although not yet cleaned exactly the right way), so I am not sure what is causing this.
Is there anything you suspect might be going wrong that I could try? Or is there anywhere the data requirements are listed (I've looked, but I may have overlooked it)?
I have included the data structure (a pandas Series), a sample of what my data looks like, and the error.
Many thanks! [image: image] https://user-images.githubusercontent.com/46752691/85156259-1bb6ff80-b25a-11ea-86b1-08902c21bfd6.png
[image: image] https://user-images.githubusercontent.com/46752691/85156275-22457700-b25a-11ea-8ace-fdf03beee759.png
[image: image] https://user-images.githubusercontent.com/46752691/85156305-2e313900-b25a-11ea-8deb-a971317c7786.png
I didn't create any n-grams; I passed in the unstructured text, so it should be using the default n-gram size, I suppose.
First of all, thanks for creating this issue and reporting this bug, @demcbs. Also, thanks, @IveJ, for your comments :+1:
I've found what the problem was. First, I created a small script to replicate the error. The smallest script that I could come up with was:
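from pyss3 import SS3

# Two tiny documents, one per class; note the integer label 0
test_x = ["this is the first document", "this is the second document"]
test_y = [0, 1]
clf = SS3()
clf.train(test_x, test_y)  # this call reproduced the error before version 0.6.2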
I found out that the problem was caused by the integer label 0: it was triggering a condition as True when it shouldn't (more details are given in the commit message 236a942). I've already released a new version (0.6.2) with the patch fixing this issue, so updating the package (pip install -U pyss3) should solve the problem :blush:
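For anyone landing here later, this is a minimal sketch of the general Python pitfall that an integer 0 label can trigger. It is not the actual pyss3 code (see commit 236a942 for that); the helper names `is_missing_buggy` and `is_missing_fixed` are hypothetical and exist only for illustration.

```python
# Hypothetical illustration of the pitfall; NOT pyss3's actual code.

def is_missing_buggy(label):
    # Truthiness check: 0 is falsy, so it is treated the same as None.
    return not label


def is_missing_fixed(label):
    # Identity check: only None counts as "no label given".
    return label is None


labels = [0, 1]

# The buggy check wrongly flags the integer label 0 as missing.
buggy_flags = [is_missing_buggy(y) for y in labels]
# The fixed check accepts both labels.
fixed_flags = [is_missing_fixed(y) for y in labels]

print(buggy_flags)  # [True, False]
print(fixed_flags)  # [False, False]

# Assumption: on an unpatched version, string labels would sidestep a
# truthiness check of this kind, because "0" is a non-empty string.
str_labels = [str(y) for y in labels]
```

Upgrading to 0.6.2 or later remains the proper fix; converting labels to strings is only a guess at a stopgap if upgrading is not possible.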
Thanks for the quick fix Sergio!