notAI-tech / deepsegment

A sentence segmenter that actually works!
http://bpraneeth.com/projects
GNU General Public License v3.0
302 stars 56 forks

Issue with other statement #15

Closed: ankitavyas closed this issue 5 years ago

ankitavyas commented 5 years ago

I am trying to test this on my use case, where I have lots of raw data with or without punctuation. A couple of examples are below. The first example has no punctuation at all, and the second separates its sentences with commas and contains a spelling mistake.

When I run these statements through the example code, I get no split at all.

Your code probably doesn't expect raw statements like mine. I have no control over the incoming raw data, and I receive thousands of statements like these, so there is no way to modify each one manually. Is there anything I can do to make this work?

DRIVE WITH EXCESS BLOOD ALCOHOL SPEED-EXCEED BY 15 KM/HR OR LESS FAIL TO SIGNAL DRIVE UNDER DISQUALIFICATION

Breach re 17/12/06 DRIVE WHILST AUTHORISATION SUSPENDED (2 CHARGES), EX PRESC CONC 3HRS-BREATH-DRIVER VECHICLE (3 CHARGES), DRIVE WHILST DISQUALIFIED
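For records that already separate charges with commas, like the second example, a simple rule can act as a stopgap. The sketch below is a plain heuristic, not part of deepsegment, and the function name `split_on_commas` is hypothetical. It splits only on commas outside parentheses, so that counts such as `(2 CHARGES)` stay attached to their charge.

```python
from typing import List


def split_on_commas(record: str) -> List[str]:
    """Heuristic (not deepsegment): split on commas that are not inside parentheses."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0  # current parenthesis nesting level
    for ch in record:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate unbalanced closing parens
        if ch == "," and depth == 0:
            # Top-level comma: close off the current piece
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # Drop empty pieces produced by trailing or doubled commas
    return [p for p in parts if p]


record = ("Breach re 17/12/06 DRIVE WHILST AUTHORISATION SUSPENDED (2 CHARGES), "
          "EX PRESC CONC 3HRS-BREATH-DRIVER VECHICLE (3 CHARGES), DRIVE WHILST DISQUALIFIED")
for piece in split_on_commas(record):
    print(piece)
```

This does nothing for the first example, which has no delimiters at all; that case needs a model that has seen this kind of text.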


bedapudi6788 commented 5 years ago

The model is trained to detect the ends of sentences. The training data consists mostly of grammatically correct sentences with at least 3 or 4 words each. The text you are passing doesn't look like general English sentences; it looks more like domain-specific shorthand. You will need to retrain or fine-tune the model on your data.
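One possible way to build fine-tuning data, sketched here under assumptions: hand-split a sample of records into individual charges, then generate synthetic examples by concatenating randomly chosen charges with trailing punctuation stripped, tagging the first token of each charge as a sentence start. The helper `make_examples` and the `B-sent`/`O` tag names are hypothetical; check deepsegment's own training code for the exact input format it expects before using this.

```python
import random
from typing import List, Tuple


def make_examples(
    sentences: List[str], n_examples: int, max_sents: int = 4, seed: int = 0
) -> List[Tuple[List[str], List[str]]]:
    """Hypothetical helper: build (tokens, tags) pairs for a sequence tagger.

    Each example joins 1..max_sents randomly chosen sentences with no
    separators; the first token of every sentence is tagged "B-sent"
    (assumed tag name) and all other tokens "O".
    """
    rng = random.Random(seed)  # seeded so the generated data is reproducible
    examples = []
    for _ in range(n_examples):
        k = rng.randint(1, min(max_sents, len(sentences)))
        chosen = rng.sample(sentences, k)
        tokens: List[str] = []
        tags: List[str] = []
        for sent in chosen:
            # Strip trailing punctuation so the model cannot rely on it
            words = [w.rstrip(",.;") for w in sent.split()]
            words = [w for w in words if w]
            for i, word in enumerate(words):
                tokens.append(word)
                tags.append("B-sent" if i == 0 else "O")
        examples.append((tokens, tags))
    return examples


charges = [
    "DRIVE WITH EXCESS BLOOD ALCOHOL",
    "FAIL TO SIGNAL",
    "DRIVE UNDER DISQUALIFICATION",
]
for tokens, tags in make_examples(charges, n_examples=2):
    print(list(zip(tokens, tags)))
```

Seeding the generator keeps the dataset reproducible between training runs; the quality of the result depends mostly on how many hand-split records you start from.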