nipunsadvilkar / pySBD

🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out-of-the-box.
MIT License

Carriage return fix #54

Closed by dakinggg 4 years ago

dakinggg commented 5 years ago

Had a go at fixing the offsets; take a look and see what you think. I built on your WIP branch, so this PR includes those commits as well. All the tests pass, although I am wondering why you xfailed the Alice in Wonderland test in your second commit. Is something broken there?

nipunsadvilkar commented 5 years ago

Thanks, looks good so far.

Yes, something broke in the Alice in Wonderland test. It seems there are issues with the newline character offsets for the 2nd index in the expected list.
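For readers following along, here is a minimal sketch of how newline handling can shift character offsets. It does not use pySBD's code: the `offsets_after_cr_strip` helper and the "strip every `\r`" cleaning step are assumptions made purely for illustration, and the actual cause of the failing test may differ.

```python
from typing import Tuple


def offsets_after_cr_strip(original: str, sentence: str) -> Tuple[int, int]:
    """Return (offset in original, offset in a CR-stripped copy) for a sentence.

    Hypothetical helper: the cleaning step below is an assumed stand-in,
    not the library's real cleaner.
    """
    cleaned = original.replace("\r", "")  # assumed cleaning step
    return original.index(sentence), cleaned.index(sentence)


original = "Hello world.\r\nSecond line here.\r\nThird."
for sent in ("Hello world.", "Second line here.", "Third."):
    in_original, in_cleaned = offsets_after_cr_strip(original, sent)
    # The gap grows by one for every carriage return removed before the sentence.
    print(repr(sent), in_original, in_cleaned, in_original - in_cleaned)
```

Offsets computed on the cleaned copy fall behind the original by one character per removed carriage return, so spans computed after cleaning no longer index into the input text.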

dakinggg commented 5 years ago

I haven't looked at the cleaner code, but it seems like something about how ❦ is used is wrong when it follows another punctuation mark. I think something might also still be broken for char_span=True when there are double newlines, but I'll leave it to you whether those things need fixing before or after merging this PR.
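One way to check for this kind of breakage is to assert that every reported span slices back to its sentence in the original text. The sketch below is library-independent: `locate_spans` and `bad_spans` are hypothetical helpers, and the `(sentence, start, end)` tuples are an assumed stand-in for whatever span objects the segmenter returns.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (sentence, start, end), end exclusive


def locate_spans(text: str, sentences: List[str]) -> List[Span]:
    """Find each sentence in order in the original text and record its offsets."""
    spans = []
    cursor = 0
    for sent in sentences:
        start = text.find(sent, cursor)  # search only past the previous sentence
        if start == -1:
            raise ValueError(f"sentence not found in original text: {sent!r}")
        end = start + len(sent)
        spans.append((sent, start, end))
        cursor = end
    return spans


def bad_spans(text: str, spans: List[Span]) -> List[Span]:
    """Return the spans whose offsets do not slice back to their sentence."""
    return [span for span in spans if text[span[1]:span[2]] != span[0]]


# Input mixing a double newline and a carriage return.
text = "First one.\n\nSecond one.\r\nThird."
spans = locate_spans(text, ["First one.", "Second one.", "Third."])
print(spans)
print(bad_spans(text, spans))  # empty when all offsets are consistent
```

A test built this way fails as soon as an offset drifts, regardless of which whitespace sequence caused the drift.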

nipunsadvilkar commented 4 years ago

@danielkingai2 Thanks for your contribution😃

I opted for another approach. You can have a look at PR #63.