vezaynk / Sitemap-Generator-Crawler

PHP script to recursively crawl websites and generate a sitemap. Zero dependencies.
https://www.bbss.dev
MIT License

Allow validating non-ASCII URLs (fixes #57) #58

Open francisek opened 6 years ago

francisek commented 6 years ago

Adds a function, url_to_ascii, that converts a UTF-8 URL to a plain ASCII one.

vezaynk commented 6 years ago

This covers domains but not URL paths.

Halfway there.
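
To illustrate the missing half: non-ASCII characters in a path are normally percent-encoded byte by byte rather than converted with Punycode. A minimal sketch, assuming the input is UTF-8 and that only the path and query portion is passed in; encode_non_ascii is a hypothetical helper name, not something from this PR:

```php
<?php
// Hypothetical helper (not part of the PR): percent-encode every byte
// outside printable ASCII. Reserved characters such as "/" and "?" and
// any existing %XX escapes are ASCII already, so they are left untouched.
function encode_non_ascii(string $pathAndQuery): string
{
    return preg_replace_callback(
        '/[^\x21-\x7E]/', // byte-wise match: no /u flag on purpose
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $pathAndQuery
    );
}

echo encode_non_ascii("/caf\xC3\xA9/menu"), "\n"; // prints /caf%C3%A9/menu
```

The host part still needs the IDNA conversion discussed in this thread; this sketch deliberately does not touch it.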

vezaynk commented 6 years ago

I'm trying to test this, but idn_to_ascii isn't available on my system. I have a fairly standard install, and the official docs list the function without mentioning any required extension, so this is strange.

This cannot be merged without a shim.

francisek commented 6 years ago

idn_to_ascii is provided by the intl extension. We could use a pure-PHP shim based on an external library such as https://github.com/phlylabs/idna-convert, as the conversion rules are a bit complicated.
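
Until such a shim exists, the call could at least be guarded so the script does not fatal on installs without intl. A rough sketch under that assumption; host_to_ascii is a hypothetical wrapper name, and the fallback branch only marks where a pure-PHP converter would be plugged in:

```php
<?php
// Hypothetical wrapper (not part of the PR): prefer intl's idn_to_ascii()
// when the extension is loaded, and degrade gracefully when it is not.
function host_to_ascii(string $host)
{
    if (function_exists('idn_to_ascii')) {
        // intl is available: let it handle the IDNA conversion.
        return idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    }
    if (!preg_match('/[^\x00-\x7F]/', $host)) {
        // Pure-ASCII hosts need no conversion at all.
        return $host;
    }
    // Non-ASCII host and no intl: a pure-PHP shim would go here.
    return false;
}

var_dump(host_to_ascii('example.com'));
```

Returning false mirrors how idn_to_ascii itself reports failure, so callers can treat both cases the same way.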