lipoja / URLExtract

URLExtract is a Python class for collecting (extracting) URLs from a given text based on locating TLDs.

MIT License

242 stars, 61 forks

dont url #97

Closed NeilRiver closed 2 years ago

NeilRiver commented 2 years ago

https://sportgyms.ru/

Feedback

is not extracted, pls fix

NeilRiver commented 2 years ago

<a href="/index.php?do=feedback">Feedback</a>

lipoja commented 2 years ago

<a href="/index.php?do=feedback">Feedback</a>

I cannot support this, because the example above is not a URL. It is an HTML anchor (link), which can be extracted from an HTML page with BeautifulSoup or any other XML parser.
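
For anyone landing here with the same problem, a minimal sketch of the parser approach follows. It is not part of URLExtract. To avoid a third-party dependency it uses the standard library's `html.parser` in place of BeautifulSoup, and the names `AnchorCollector` and `extract_links` are hypothetical, chosen only for this illustration. It assumes the page was fetched from `https://sportgyms.ru/`, so relative `href` values are resolved against that base.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects href values of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors are of interest here; other tags are ignored.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin turns a relative href into an absolute URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for every <a href="..."> found in html."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


html = '<a href="/index.php?do=feedback">Feedback</a>'
print(extract_links(html, "https://sportgyms.ru/"))
# ['https://sportgyms.ru/index.php?do=feedback']
```

The anchor only carries a relative path, so a base URL is needed before anything URL-like exists; that is why a text-based extractor that looks for TLDs has nothing to match in the raw `href`.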