wenestam closed this issue 3 years ago
Hi, dunno, sorry. Our thinking about this is that "It is legendary. Supreme. Machine." is three separate sentences, so we're going to keep it that way.
IIRC NLTK's tokenizers might be able to do what you want to do: https://www.nltk.org/api/nltk.tokenize.html#module-nltk.tokenize.treebank, but I might be wrong.
Thank you so much for the quick response :) Have a nice weekend
Hi!
I'm currently working with news article data and I'm using your amazing sentence splitter. I have an issue: I don't want to split sentences that are inside quotation marks. E.g.
splitter.split(""" "I hereby present you the new machine. 'It is legendary. Supreme. Machine.' the daily news said." """)
becomes: '"I hereby present you the new machine.', "'It is legendary.", 'Supreme.', 'Machine.\' the daily news said.".'
and I need it to become: "I hereby present you the new machine. 'It is legendary. Supreme. Machine.' the daily news said."
That is, everything inside the quotation should stay intact. I have tried different regex patterns with no success. I have also tried tinkering with the source code of sentence-splitter.
Do you have any tips or tricks? Is there anything I can change in the package code to make it ignore splitting sentences inside quotations?
Thanks in advance
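One possible workaround that leaves the package untouched is to post-process the splitter's output and re-join segments until the double quotes balance. The sketch below is only an illustration: `merge_quoted` is a hypothetical helper, not part of sentence-splitter. It assumes straight double quotes that are properly paired, and it deliberately ignores single quotes because apostrophes would confuse the count.

```python
def merge_quoted(sentences, quote_char='"'):
    """Re-join consecutive segments while a double quote is still open.

    Hypothetical helper, not part of sentence-splitter. Assumes quotes
    are straight double quotes and are balanced across the whole text.
    """
    merged = []
    buffer = []
    open_quote = False
    for segment in sentences:
        buffer.append(segment)
        # An odd number of quote marks in a segment flips the open/closed state.
        if segment.count(quote_char) % 2 == 1:
            open_quote = not open_quote
        if not open_quote:
            merged.append(" ".join(buffer))
            buffer = []
    if buffer:
        # Unbalanced quote: emit the remainder rather than drop it.
        merged.append(" ".join(buffer))
    return merged


# Segments similar to the splitter output quoted in the question.
segments = [
    '"I hereby present you the new machine.',
    "'It is legendary.",
    "Supreme.",
    "Machine.' the daily news said.\"",
]
print(merge_quoted(segments))
```

Text outside quotation marks passes through unchanged, so this can be applied to the result of every `splitter.split(...)` call. Nested or unpaired double quotes would need a more careful state machine than this parity check.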