summanlp / textrank

TextRank implementation for Python 3.
https://pypi.org/project/summa/
MIT License

Sentence tokenizer not working on Full stop #76

Open shivambatra76 opened 4 years ago

shivambatra76 commented 4 years ago

I passed the following input to clean_text_by_sentences:

from summa.preprocessing.textcleaner import clean_text_by_sentences as _clean_text_by_sentences

text='''Ad sales boost Time Warner profit
Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) for the three months to December, from $639m year-earlier.The firm, which is now one of the biggest investors in Google, benefited from sales of high-speed internet connections and higher advert sales. TimeWarner said fourth quarter sales rose 2% to $11.1bn from $10.9bn. Its profits were buoyed by one-off gains which offset a profit dip at Warner Bros, and less users for AOL. '''

print(_clean_text_by_sentences(text))

This is the output I received after preprocessing. As you can see, the second unit contains two sentences ("...from $639m year-earlier." and "The firm, which is now...") that should have been separated at the full stop. Instead, the tokenizer only seems to split where a new line starts, i.e. where the Enter key was pressed.

[Original unit: 'Ad sales boost Time Warner profit' --- Processed unit: 'ad sale boost time warner profit',
 Original unit: 'Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) for the three months to December, from $639m year-earlier.The firm, which is now one of the biggest investors in Google, benefited from sales of high-speed internet connections and higher advert sales.' --- Processed unit: 'quarter profit media giant timewarn jump bn £m month decemb m year earlier firm biggest investor googl benefit sale high speed internet connect higher advert sale',
 Original unit: 'TimeWarner said fourth quarter sales rose 2% to $11.1bn from $10.9bn.' --- Processed unit: 'timewarn said fourth quarter sale rose bn bn',
 Original unit: 'Its profits were buoyed by one-off gains which offset a profit dip at Warner Bros, and less users for AOL.' --- Processed unit: 'profit buoy gain offset profit dip warner bros user aol']
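A likely cause, not confirmed in this thread: the splitter appears to end a sentence at '.', '!' or '?' only when whitespace follows. Because "year-earlier.The firm" has no space after the full stop, it stays in one unit, while a line break always ends a unit. The sketch reproduces this with a regular expression modeled on the sentence pattern in gensim's summarization textcleaner. That pattern is an assumption and is not imported from summa. The sketch also adds a hypothetical helper, add_space_after_full_stop, which inserts the missing space before the text is tokenized.

```python
import re

# ASSUMPTION: this pattern is modeled on the sentence regex in gensim's
# summarization textcleaner, which we believe summa's textcleaner shares.
# It is NOT imported from summa. A unit ends at '.', '!' or '?' only when
# whitespace or the end of the text follows; a line with no terminator
# ends at the newline.
SENTENCE_RE = re.compile(r'(\S.+?[.!?])(?=\s+|$)|(\S.+?)(?=[\n]|$)', re.UNICODE)


def split_sentences(text):
    """Return the units that SENTENCE_RE finds in text."""
    return [match.group() for match in SENTENCE_RE.finditer(text)]


def add_space_after_full_stop(text):
    """Hypothetical workaround: put a space between a terminator and a
    capital letter that follows it directly, so 'earlier.The' becomes
    'earlier. The'. Digits are left alone, so '$1.13bn' is unchanged."""
    return re.sub(r'([.!?])(?=[A-Z])', r'\1 ', text)


# Excerpt of the input from this issue; the title sits on its own line.
text = (
    "Ad sales boost Time Warner profit\n"
    "Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn "
    "(£600m) for the three months to December, from $639m year-earlier."
    "The firm, which is now one of the biggest investors in Google, "
    "benefited from sales of high-speed internet connections and higher "
    "advert sales."
)

before = split_sentences(text)
after = split_sentences(add_space_after_full_stop(text))
print(len(before), len(after))  # 2 3
for unit in after:
    print(unit)
```

With summa installed, you would apply the helper before preprocessing: _clean_text_by_sentences(add_space_after_full_stop(text)). The helper matches only a capital letter after the terminator, so decimals such as $1.13bn are left intact. However, dotted abbreviations such as U.S.A would also be split (into U. S. A). Treat it as a workaround rather than a fix to the tokenizer itself.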