rennat / pynliner

Python CSS-to-inline-styles conversion tool for HTML using BeautifulSoup and cssutils
http://pythonhosted.org/pynliner/
180 stars 93 forks source link

Pynliner Determines Absolute URLs by checking for http:// at the start, ignores https and protocol agnostic //server.com #26

Closed kevinastone closed 10 years ago

kevinastone commented 10 years ago

Described it mostly in the title. The call to _get_external_styles determines if a url is absolute by checking for http:// at the start. It really should use urlparse and fill in the missing components (scheme, host, path, etc).

https://github.com/rennat/pynliner/blob/master/pynliner/__init__.py#L163

rennat commented 10 years ago

Nice catch! I'll happily accept a pull request with this kind of improvement if you want to be immortalized with a commit here or I can implement it.

We need tests to confirm the urls are interpreted correctly. Test case one can just confirm that the different types of href values are all transformed to absolute URLs correctly.

This has me thinking about is that full support for HTTPS remote stylesheets may require credentials in some cases. That is something that hasn't been thought of so far and will require changes in the _get_url method. Since the time that pynliner was originally developed the requests package has become popular for good reason. I think a good approach to address full HTTPS support would be to configure pynliner at initialization with credentials for any required domains and switch the _get_url method to use the requests package to fulfill the requests using the configured credentials if necessary.

This is a bigger feature that will require more testing than I can promise to have time for in a few days so a first step would be to support anonymous HTTPS requests for now and include full support in a larger release.

On Mon, Sep 30, 2013 at 9:39 PM, Kevin Stone notifications@github.comwrote:

Described it mostly in the title. The call to _get_external_stylesdetermines if a url is absolute by checking for http:// at the start. It really should use urlparse and fill in the missing components (scheme, host, path, etc).

https://github.com/rennat/pynliner/blob/master/pynliner/__init__.py#L163

— Reply to this email directly or view it on GitHubhttps://github.com/rennat/pynliner/issues/26 .

kevinastone commented 10 years ago

Yeah, to be honest, I think you can get all the way with just a urlparse.urljoin by constructing a base url using root_url and relative_url. I'll send you a pull-request with the update and some tests to validate different url patterns.

I think whether to validate SSL credentials is a bit complimentary. Some users may want that validation, in my case, it's my own endpoint, so it's implicitly trusted within my network.

kevinastone commented 10 years ago

Btw, unrelated, but test_unknown_pseudo_selector fails in my test environment. I sent a separate PR for some duplicate test names I discovered.

kevinastone commented 10 years ago

Status?

rennat commented 10 years ago

Looks good I'll be merging it in shortly