nikulk opened this issue 8 years ago
I fixed this for my site using the following simplistic middleware:
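```python
import re

from django.urls import reverse  # was django.core.urlresolvers, removed in Django 2.0

# src=/href= followed by an optional quote and a single leading "/".
# The (?!/) lookahead leaves protocol-relative URLs such as //cdn.example.com alone.
ROOT_RELATIVE_ATTR = re.compile(r'''((?:src|href)=['"]?)/(?!/)''', re.IGNORECASE)


class RebaseLinks(object):
    def process_response(self, proxy, request, upstream_response, response):
        # change only HTML responses
        if 'text/html' not in response.get('Content-Type', ''):
            return response
        # URL the proxy is mounted at, e.g. '/prefix/' (assumes the pattern has a `url` group)
        prefix = reverse(request.resolver_match.view_name, kwargs={'url': ''})
        html = response.content.decode(response.charset)
        # the prefix ends with '/', so it replaces the matched leading slash
        html = ROOT_RELATIVE_ATTR.sub(lambda m: m.group(1) + prefix, html)
        response.content = html.encode(response.charset)
        return response
```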
This is fairly generic and could be dropped into any proxy that is mounted under a prefix.
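To check the rewrite rule outside Django, here is a minimal sketch of the same kind of substitution as a plain function. `rebase_root_relative` and the `/prefix/` mount point are hypothetical; it only touches `src`/`href` values that start with a single `/`, and it assumes the prefix ends with a slash.

```python
import re

# Hypothetical helper: matches src=/href=, an optional quote, then a single
# leading "/" that is not the start of a protocol-relative "//" URL.
ROOT_RELATIVE_ATTR = re.compile(r'''((?:src|href)=['"]?)/(?!/)''', re.IGNORECASE)


def rebase_root_relative(html: str, prefix: str) -> str:
    """Prepend the proxy prefix (e.g. '/prefix/') to root-relative src/href values."""
    # The prefix already ends with '/', so it stands in for the matched leading slash.
    return ROOT_RELATIVE_ATTR.sub(lambda m: m.group(1) + prefix, html)


print(rebase_root_relative('<img src="/image/puppy.png">', "/prefix/"))
# <img src="/prefix/image/puppy.png">
```

Relative values such as `src="images/a.png"` and protocol-relative ones such as `//cdn.example.com/x.js` pass through unchanged. It will not catch attributes with whitespace around `=`, `srcset`, CSS `url(...)`, or URLs assembled in JavaScript.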
Sorry, I missed this when you posted it the other day!
Rewriting content is not intended to be the immediate responsibility of djproxy (or mod_proxy, for that matter), but you're using exactly the functionality I added to help consumers achieve that goal. :+1: Content rewriting can be very tricky and is more often than not application-dependent (consider JSON or XML API responses, not just HTML), so it makes sense to leave that sort of detail up to consumers.
That said, I'm not particularly opposed to adding to the set of proxy middleware that djproxy ships with. I'll leave this open for visibility and I'll continue to consider generic URL rewriting solutions.
I'm having this issue too. Most websites load some of their content from relative paths, whether through AJAX calls or otherwise.
I'll try the solution posted by the original poster, but I'd also like to know whether djproxy has added this functionality. :)
Actually, my problem is a bit trickier: some of the relative paths are generated by JavaScript, so I can't simply use replaceAll to rewrite them. Is it possible to make this work?
Rewriting content really isn't djproxy's responsibility. The proxy middleware functionality discussed here is a mechanism that lets you inject your own rewriting logic as needed for your specific purposes.
Here are some docs that might be useful:
https://github.com/thomasw/djproxy#proxy-middleware
https://github.com/thomasw/djproxy/blob/master/djproxy/proxy_middleware.py#L32
The example middleware posted above will replace src and href attributes in HTML, but it won't help with paths in JS assets that don't match that regex. It sounds like you've figured all that out on your own, but I wanted to make sure we're on the same page.
You said that your problem is with "relative paths". Is that really what you mean? Relative paths are really the ideal situation when you're proxying something. You should just need to make sure all of the paths for target assets are also proxied. That might mean using multiple proxies or even cleverly constructed URL patterns.
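To see why relative paths are the easy case, Python's `urllib.parse.urljoin` resolves references much the way a browser does. This sketch assumes a proxy mounted at `/prefix/` on www.example.com that forwards to somedomain.tld (both hypothetical here): a relative reference on the proxied page stays under `/prefix/`, so it maps cleanly onto the matching upstream path.

```python
from urllib.parse import urljoin

# Hypothetical setup: www.example.com/prefix/... is proxied to somedomain.tld/...
PROXIED_PAGE = "http://www.example.com/prefix/foo/bar"
UPSTREAM_PAGE = "http://somedomain.tld/foo/bar"

for ref in ["baz.css", "../img/logo.png"]:
    # Resolve the same relative reference against the proxied and upstream pages
    print(ref, "->", urljoin(PROXIED_PAGE, ref), "|", urljoin(UPSTREAM_PAGE, ref))
```

For each reference, dropping `/prefix` from the proxied result and swapping the host gives exactly the upstream URL, which is why these references need no rewriting as long as the proxy covers those paths.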
There are situations where things get trickier. For example, say you're proxying somedomain.tld/foo/bar and that page has an asset reference like `src='../../assets/global.css'`. If your proxy URL is at the same depth as the target path, then you'd just need to proxy the referenced assets directory at a depth that matches the reference. Obviously, though, that'd be a pretty brittle solution. It'd be better to proxy the top-level directory of the target instead.
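The `../../assets/global.css` case can be made concrete with `urllib.parse.urljoin`. Assuming (hypothetically) that the proxy is mounted one level deeper than the target, at `/prefix/`, the reference climbs out of the proxy mount entirely:

```python
from urllib.parse import urljoin

REF = "../../assets/global.css"

# On the target site, the reference resolves against /foo/bar
upstream = urljoin("http://somedomain.tld/foo/bar", REF)

# Through a proxy mounted at /prefix/ (hypothetical), the first "../" goes from
# /prefix/foo/ to /prefix/ and the second consumes "prefix/", leaving the proxy
proxied = urljoin("http://www.example.com/prefix/foo/bar", REF)

print(upstream)  # http://somedomain.tld/assets/global.css
print(proxied)   # http://www.example.com/assets/global.css
```

On the target, the excess `..` is clamped at the root, as RFC 3986 specifies (Python's `urljoin` does this from 3.5 on).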
FWIW, you can also inject an HTML `<base>` tag to change the base URL that relative asset paths resolve against, but I've never experimented with that approach: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base
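A minimal sketch of the `<base>` idea, assuming the upstream HTML has an ordinary `<head>` tag; `inject_base` is a hypothetical helper, not part of djproxy, and the `base_href` you pass would be the proxied equivalent of the page's directory (e.g. `/prefix/foo/`). One caveat: `<base href>` only changes how relative references resolve. Root-relative ones like `/image/puppy.png` still resolve against the site root, so this does not replace `src`/`href` rewriting for those.

```python
import html
import re

# Matches the opening <head> tag, with or without attributes (not <header>)
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def inject_base(document: str, base_href: str) -> str:
    """Insert <base href=...> right after <head>, unless the page already has a <base>."""
    if re.search(r"<base\b", document, re.IGNORECASE):
        return document  # browsers use the first <base href>, so leave an existing one alone
    tag = '<base href="%s">' % html.escape(base_href, quote=True)
    # count=1: only the first <head> tag is touched
    return HEAD_OPEN.sub(lambda m: m.group(0) + tag, document, count=1)


print(inject_base("<html><head><title>x</title></head></html>", "/prefix/foo/"))
# <html><head><base href="/prefix/foo/"><title>x</title></head></html>
```

If the page has no `<head>` tag, the document comes back unchanged.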
If you can post what your URL patterns are, what your proxy configuration looks like, and example asset references that are failing, I might be able to help more.
Links in proxied documents that begin with '/' always resolve against the root of my own website rather than the proxy prefix. For example, if my website is www.example.com and the proxy for an external site www.external.com is mounted at '/prefix/', then any href or src in the proxied HTML that starts with '/', such as '/image/puppy.png', resolves to 'www.example.com/image/puppy.png'. That is incorrect because it ignores the prefix set up for the proxy; the link should point to 'www.example.com/prefix/image/puppy.png'.
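The problem can be reproduced outside Django with `urllib.parse.urljoin`, which follows the same RFC 3986 resolution rules browsers use. This sketch uses the hostnames and `/prefix/` mount from the example above; the page path `/prefix/some/page` is made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical proxied page: www.example.com/prefix/... forwards to www.external.com/...
PAGE = "http://www.example.com/prefix/some/page"

# A root-relative link ignores the page's path and resolves against the host root
escaped = urljoin(PAGE, "/image/puppy.png")
print(escaped)  # http://www.example.com/image/puppy.png

# The URL that would actually reach the proxy keeps the prefix in front of the path
wanted = urljoin(PAGE, "/prefix/image/puppy.png")
print(wanted)  # http://www.example.com/prefix/image/puppy.png
```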