samdark / sitemap

Sitemap and sitemap index builder
BSD 3-Clause "New" or "Revised" License

Special Chars in URL #64

Closed: nadar closed this issue 4 years ago

nadar commented 5 years ago

I'm not sure, but this throws an exception because of special characters in the URL. Special characters in URLs seem to be very common now (I just asked myself when that changed...).

The location must be a valid URL. You have specified: https://example.com/künstliche-intelligenz

File: samdark/sitemap/Sitemap.php Line: 243

(The original domain was https://heartbeat.gmbh, which is a valid domain.)

nadar commented 5 years ago

I just tested it. The problem is the umlauts (ö, ä, ü): https://3v4l.org/anvhr
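To see exactly which characters trip the check, here is a small Python sketch (the library itself is PHP). `non_ascii_report` is a hypothetical helper, and the `öäü` URL is an illustrative stand-in, not the contents of the 3v4l snippet. It lists each non-ASCII character alongside its UTF-8 percent-encoding, which is what the character would have to become in a valid sitemap URL.

```python
from urllib.parse import quote

def non_ascii_report(url: str) -> list[tuple[str, str]]:
    """Return each distinct non-ASCII character in `url` with its UTF-8 percent-encoding."""
    # dict.fromkeys de-duplicates while keeping first-appearance order.
    offenders = dict.fromkeys(ch for ch in url if not ch.isascii())
    return [(ch, quote(ch)) for ch in offenders]

# Hypothetical URL containing the three characters mentioned above.
print(non_ascii_report("https://example.com/öäü"))
# [('ö', '%C3%B6'), ('ä', '%C3%A4'), ('ü', '%C3%BC')]
```

Each umlaut is a single character but two UTF-8 bytes, so it becomes two `%XX` escapes.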

samdark commented 5 years ago

According to the specification, URLs should be encoded: https://www.sitemaps.org/protocol.html#escaping
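As a rough illustration of what encoding according to the sitemaps.org escaping rules might involve, here is a Python sketch. It is not the library's implementation, and `percent_encode_url` and `sitemap_loc` are hypothetical names. It percent-encodes non-ASCII characters in the path, query and fragment as UTF-8, converts a non-ASCII host with Python's built-in `idna` codec (IDNA 2003), and then entity-escapes the five characters the spec lists for XML data values.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape

# Characters RFC 3986 allows unescaped in a path, besides letters, digits and "_.-~"
# (which quote() never escapes). "%" is kept so existing escapes are not double-encoded.
PATH_SAFE = "/:@!$&'()*+,;=%"
QUERY_SAFE = PATH_SAFE + "?"

def percent_encode_url(url: str) -> str:
    """Return an ASCII-only form of `url` (hypothetical helper, a sketch only)."""
    parts = urlsplit(url)
    netloc = parts.netloc
    if not netloc.isascii():
        # Non-ASCII hosts need IDNA (punycode), not percent-encoding.
        # Assumption: sitemap URLs carry no user-info, so it is dropped here.
        host = parts.hostname.encode("idna").decode("ascii")
        netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=PATH_SAFE),       # UTF-8 percent-encoding by default
        quote(parts.query, safe=QUERY_SAFE),
        quote(parts.fragment, safe=QUERY_SAFE),
    ))

def sitemap_loc(url: str) -> str:
    """Percent-encode, then entity-escape &, <, >, ' and " for the XML <loc> element."""
    return escape(percent_encode_url(url), {"'": "&apos;", '"': "&quot;"})

print(sitemap_loc("https://example.com/künstliche-intelligenz"))
# https://example.com/k%C3%BCnstliche-intelligenz
print(sitemap_loc("https://example.com/ümlat.php?q=name&x=1"))
# https://example.com/%C3%BCmlat.php?q=name&amp;x=1
```

Keeping `%` in the safe sets means input that is already encoded is not encoded twice; the trade-off is that a literal, unescaped `%` passes through unchanged. The order also matters: percent-encoding happens first, and XML entity escaping is applied last, only when writing the value into `<loc>`.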

samdark commented 5 years ago

We can either add URL encoding or improve the error message.
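The second option, a clearer error message, might look roughly like this Python sketch. `InvalidSitemapLocation` and `check_location` are hypothetical names; the message prefix mirrors the one quoted in this issue. The check only diagnoses non-ASCII characters and does not try to catch every way a URL can be malformed.

```python
class InvalidSitemapLocation(ValueError):
    """Hypothetical exception for a rejected <loc> value."""

def check_location(url: str) -> None:
    """Raise with a specific hint if `url` contains characters that must be encoded."""
    bad = list(dict.fromkeys(ch for ch in url if not ch.isascii()))
    if bad:
        raise InvalidSitemapLocation(
            f"The location must be a valid URL. You have specified: {url}. "
            f"It contains non-ASCII characters ({', '.join(bad)}); percent-encode them "
            "first, see https://www.sitemaps.org/protocol.html#escaping"
        )

try:
    check_location("https://example.com/künstliche-intelligenz")
except InvalidSitemapLocation as err:
    print(err)  # names the offending character and points to the spec
```

Compared with encoding silently, this keeps the caller in control of how URLs are built, at the cost of every caller having to encode.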

nadar commented 5 years ago

Maybe encoding the URL would make sense. I assume even URLs like http://test.com/jp/新 would fail.

samdark commented 5 years ago

Yes. Any non-ASCII URL would fail the check, and leaving it unencoded in a sitemap is against the spec.
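This can be checked directly on the URL from the earlier comment. The minimal Python sketch below uses `encode_path`, a hypothetical helper that only percent-encodes the path component. It shows that the raw URL is not ASCII while its encoded form is.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_path(url: str) -> str:
    """Percent-encode only the path of `url` as UTF-8, keeping "/" and existing "%" escapes."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))

original = "http://test.com/jp/新"
encoded = encode_path(original)
print(original.isascii())  # False
print(encoded)             # http://test.com/jp/%E6%96%B0
print(encoded.isascii())   # True
```

新 is U+65B0, which is three bytes in UTF-8 (E6 96 B0), hence three escapes.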