jekyll / jekyll-import

:inbox_tray: The "jekyll import" command for importing from various blogs to Jekyll format.
https://import.jekyllrb.com
MIT License

bad URI on Tumblr URLs with Umlauts #216

Closed stefanw closed 8 years ago

stefanw commented 9 years ago

Tumblr puts umlauts in URLs, e.g. like this: http://blog.fragdenstaat.de/post/127162614837/nrw-gefängnis-muss-ehemaligem-strafgefangenem

Importing URLs with umlauts results in this exception:

ruby/2.1.0/uri/common.rb:176:in `split': bad URI(is not URI?): http://blog.fragdenstaat.de/post/125835141112/baden-württemberg-informationsfreiheitsgesetz (URI::InvalidURIError)
    from ruby/2.1.0/uri/common.rb:211:in `parse'
    from ruby/2.1.0/uri/common.rb:747:in `parse'
    from jekyll-import-0.7.1/lib/jekyll-import/importers/tumblr.rb:208:in `block in rewrite_urls_and_redirects'
    from jekyll-import-0.7.1/lib/jekyll-import/importers/tumblr.rb:204:in `map'
    from jekyll-import-0.7.1/lib/jekyll-import/importers/tumblr.rb:204:in `rewrite_urls_and_redirects'
    from jekyll-import-0.7.1/lib/jekyll-import/importers/tumblr.rb:61:in `process'
    from jekyll-import-0.7.1/lib/jekyll-import/importer.rb:23:in `run'
    from -e:2:in `<main>'

This SO thread advises encoding the URL before parsing, so maybe that would work here as well.
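A minimal sketch of that idea, not the importer's actual code: percent-encode every non-ASCII character as its UTF-8 bytes before handing the string to `URI.parse`. The helper name `escape_non_ascii` is made up for illustration, and it assumes the URL string is UTF-8 and otherwise well-formed (it does not touch spaces or other reserved characters).

```ruby
require "uri"

# Hypothetical helper (not part of jekyll-import): replace each non-ASCII
# character with the percent-encoded form of its UTF-8 bytes, leaving all
# ASCII characters untouched.
def escape_non_ascii(url)
  url.gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

url = "http://blog.fragdenstaat.de/post/125835141112/baden-württemberg-informationsfreiheitsgesetz"

# Parsing the escaped string avoids the URI::InvalidURIError from the trace above.
uri = URI.parse(escape_non_ascii(url))
puts uri.path
```

Only the non-ASCII characters are rewritten, so URLs that were already valid pass through unchanged.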

parkr commented 9 years ago

Sounds good to me! Would you submit a PR?