santhoshse7en / news-fetch

A Python Package which helps to scrape all news details from any news websites
MIT License

"Special letters" are being converted to regular ones #92

Open jkreuz opened 3 years ago

jkreuz commented 3 years ago

Hello

Is there some way to specify the language of a news article so that it is fetched correctly? I used the library on a news article in Portuguese, but it converted the "special letters" (accented characters) to plain ones. This seriously compromises NLP procedures that deal with syntax, context, etc.

Example: "àáéóíúâôêãõç" is converted to "aaeoiuaoeaoc".
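
To make the effect concrete, here is a minimal standard-library sketch that reproduces the same kind of conversion. It is only an illustration: I am assuming the loss behaves like Unicode decomposition followed by dropping the combining marks, which I have not confirmed is what news-fetch does internally. `strip_accents` is a hypothetical helper name, not part of news-fetch or Newspaper3k.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: remove diacritics, as an ASCII-folding step might."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


original = "àáéóíúâôêãõç"
folded = strip_accents(original)
print(folded)              # aaeoiuaoeaoc
print(folded != original)  # True: the accents are gone
```

Comparing the fetched text against the source page in this way is a quick check for whether accents were lost somewhere in the pipeline.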

from newsfetch.news import newspaper
news = newspaper('https://g1.globo.com/sc/santa-catarina/noticia/2021/01/20/greve-na-comcap-coleta-feita-por-empresa-privada-em-florianopolis-vai-abranger-35percent-do-roteiro-diz-prefeitura.ghtml')

I saw that the class uses the Newspaper3k scraper internally, and if I force the right language there, it returns the correct text:

from newspaper import Article
article = Article(url, language='pt')

thank you