Closed: ankitavyas closed this issue 5 years ago
The model is trained to detect sentence endings. The training data consists mostly of grammatically correct sentences of at least three or four words. The text you are passing doesn't look like general English sentences; it looks more like domain-specific shorthand. You will need to retrain or fine-tune the model on your own data.
I am trying to test this on my own data, which is a large volume of raw text with or without punctuation. A couple of examples are below. The first has no punctuation at all; the second has sentences separated by commas and contains a spelling mistake.
When I run these statements through the example code, I get no split at all.
Your code probably does not expect raw statements like mine. I have no control over the incoming data, which arrives in raw form. I also receive these statements in the thousands, so there is no way to manually fix each one. Is there anything I can do to make this work?
DRIVE WITH EXCESS BLOOD ALCOHOL SPEED-EXCEED BY 15 KM/HR OR LESS FAIL TO SIGNAL DRIVE UNDER DISQUALIFICATION
Breach re 17/12/06 DRIVE WHILST AUTHORISATION SUSPENDED (2 CHARGES), EX PRESC CONC 3HRS-BREATH-DRIVER VECHICLE (3 CHARGES), DRIVE WHILST DISQUALIFIED
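For input like the second example, where the statements are already separated by commas, a rule-based pre-split may be enough without retraining anything. The following is only a minimal sketch under that assumption: `split_on_commas` is a hypothetical helper, not part of the model's library, and it splits on commas that are not inside parentheses so that markers like "(2 CHARGES)" stay attached. It does nothing for the first example, which has no delimiters at all and would still need a model fine-tuned on this kind of shorthand.

```python
import re
from typing import List

# Hypothetical helper, not part of any library discussed in this thread.
# Splits on commas that are NOT inside a parenthesised group.
def split_on_commas(text: str) -> List[str]:
    # The negative lookahead rejects a comma when a ")" follows it
    # before any other parenthesis, i.e. when the comma sits inside "(...)".
    parts = re.split(r",\s*(?![^()]*\))", text)
    return [p.strip() for p in parts if p.strip()]

raw = (
    "Breach re 17/12/06 DRIVE WHILST AUTHORISATION SUSPENDED (2 CHARGES), "
    "EX PRESC CONC 3HRS-BREATH-DRIVER VECHICLE (3 CHARGES), "
    "DRIVE WHILST DISQUALIFIED"
)

for part in split_on_commas(raw):
    print(part)
```

The spelling mistake in the raw text is left untouched on purpose; the sketch only decides where to cut, not how to correct the content.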