persunde / wdg-projects

A list of /wdg/ personal projects https://wdg.one/
GNU General Public License v3.0

Crawler - URLs not being picked up correctly #18

Closed ghost-wdg closed 3 years ago

ghost-wdg commented 3 years ago

URL being chopped off, regex issue?

Example (see Repo)

Developer: KingOfCaves
Repo: https://github.com/KingOfCaves/foun
Tools: node, react, express, icecast, liquidsoap
Link: https://fountainofdreams.net

persunde commented 3 years ago

Thanks for catching this. Apparently, if a line in a post is long enough, it will contain a <wbr> (Word Break Opportunity) element. The scraper/crawler processes that element as if it were a newline when it should ignore it, so the URL gets cut off at that point. The example text below shows it. I will fix it soon.

repo:: https://github.com/KingOfCaves/foun<wbr>tain-of-dreams<br>
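One possible shape for the fix, as a minimal Python sketch rather than the project's actual code: strip every <wbr> before splitting the post on <br>, so a long URL stays on one logical line. The function names `split_post_lines` and `parse_fields` are hypothetical, and the "key: value" line layout is assumed from the example above.

```python
import re
from typing import Dict, List

# Matches <wbr>, <wbr/> and <wbr /> in any letter case.
WBR_RE = re.compile(r"<wbr\s*/?>", re.IGNORECASE)
# Matches <br>, <br/> and <br /> in any letter case (never <wbr>).
BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)


def split_post_lines(post_html: str) -> List[str]:
    """Drop <wbr> hints, then split the post on real <br> line breaks."""
    without_wbr = WBR_RE.sub("", post_html)
    return [part.strip() for part in BR_RE.split(without_wbr) if part.strip()]


def parse_fields(post_html: str) -> Dict[str, str]:
    """Parse 'key: value' lines into a dict keyed by the lowercased key.

    Hypothetical helper: the real crawler may parse fields differently.
    """
    fields: Dict[str, str] = {}
    for line in split_post_lines(post_html):
        key, sep, value = line.partition(":")
        if sep:
            # Tolerate a doubled colon such as "repo:: https://...".
            fields[key.strip().lower()] = value.lstrip(":").strip()
    return fields


if __name__ == "__main__":
    sample = "repo:: https://github.com/KingOfCaves/foun<wbr>tain-of-dreams<br>"
    print(parse_fields(sample)["repo"])
```

Removing <wbr> before any line splitting keeps the rest of the parsing unchanged, since the element only marks where a browser may wrap the text and carries no content of its own.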
persunde commented 3 years ago

See this commit for more info on where the issue needs to be fixed: a9cd3e6857e66c5570243002d38dc3446cc85ea6