nipunsadvilkar / pySBD

🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box.
MIT License
782 stars 82 forks source link

periods replacing characters #30

Closed dakinggg closed 4 years ago

dakinggg commented 4 years ago

A few characters and whitespaces have been replaced with ., e.g. time when -> ti.e.when

>>> segmenter.segment("Thus, we first compute EMC 3 's response time-i.e., the duration from the initial of a call (from/to a participant in the target region) to the time when the decision of task assignment is made; and then, based on the computed response time, we estimate EMC 3 maximum throughput [28]-i.e., the maximum number of mobile users allowed in the MCS system. EMC 3 algorithm is implemented with the Java SE platform and is running on a Java HotSpot(TM) 64-Bit Server VM; and the implementation details are given in Appendix, available in the online supplemental material.")
["Thus, we first compute EMC 3 's response ti.e.i.e., the duration from the initial of a call (from/to a participant in the target region) to the ti.e.when the decision of task assignment is made; and then, based on the computed response ti.e. we estimate EMC 3 maximum throughput [28]-i.e., the maximum number of mobi.e.users allowed in the MCS system.", 'EMC 3 algorithm is implemented with the Java SE platform and is running on a Java HotSpot(TM) 64-Bit Server VM; and the implementation details are gi.e. in Appendix, available in the onli.e.supplemental material.']
dakinggg commented 4 years ago

Note this is most problematic for me when a period replaces a space, because it becomes very difficult to detect that it happened/align realign the text to the original document