nfriedly / node-unblocker

Web proxy for evading internet censorship, and general-purpose Node.js library for proxying and rewriting remote webpages
https://www.npmjs.com/package/unblocker
GNU Affero General Public License v3.0
450 stars 866 forks

Disable relative paths #111

Open gldanoob opened 5 years ago

gldanoob commented 5 years ago

The proxy works really fast and well, but there's a bug caused by the relative path system. Sometimes pages don't load when I click on links on the unblocked site. Also, if I go back to the previous page, it shows a not found error as well. Is there an option to disable the relative path system and use absolute paths all the time?

nfriedly commented 5 years ago

I'm sure it's possible, although it'd probably be better to fix the bug. Can you give me an example of where it breaks?

gldanoob commented 5 years ago

Unblocking sites like YouTube will sometimes break, and I think it's somehow because of the relative paths.

nfriedly commented 5 years ago

YouTube in particular is tricky. They do all kinds of weird stuff in JavaScript, and by the time that code runs it's too late for the proxy to correct the URLs.

If you want to try adding support for rewriting relative paths to see if it helps, though, you'll want to edit https://github.com/nfriedly/node-unblocker/blob/master/lib/url-prefixer.js and https://github.com/nfriedly/node-unblocker/blob/master/test/urlprefixer_spec.js

You can base your addition off of the re_rel_root regex and just add in the directory portion of the current path. Actually, I forgot about url.resolve - that's a better option than blind string concatenation: https://nodejs.org/api/url.html#url_url_resolve_from_to
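To illustrate why url.resolve is safer than string concatenation, here is a minimal sketch of how it turns relative links into absolute URLs. The page URL and the `resolveLink` helper are made up for illustration and are not part of unblocker.

```javascript
const url = require('url');

// Hypothetical page the user is currently viewing through the proxy.
const currentPage = 'https://example.com/videos/list/index.html';

// Hypothetical helper: resolve a link found on pageUrl to an absolute URL.
function resolveLink(pageUrl, link) {
  return url.resolve(pageUrl, link);
}

// Same-directory link: keeps the directory portion of the current path.
console.log(resolveLink(currentPage, 'watch.html'));
// → https://example.com/videos/list/watch.html

// Parent-directory link: "../" is collapsed correctly.
console.log(resolveLink(currentPage, '../about.html'));
// → https://example.com/videos/about.html

// Root-relative link: only the origin is kept.
console.log(resolveLink(currentPage, '/home'));
// → https://example.com/home
```

Node's documentation marks url.resolve as a legacy API; `new URL(link, pageUrl).href` should give the same results for cases like these.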

gldanoob commented 5 years ago

Can you please give me the modified code for those two JS files? I'm not really clear on what I should modify.

nfriedly commented 5 years ago

Not really, at least not very soon. Mainly because I have a newborn baby and very little time, but also because I don't feel like I really understand what the issue is, how to reproduce it, or how this could possibly fix it.

If you want me to work on it, then at a minimum I'd need clear steps to reproduce the issue. Something like "go to this URL, click link X, observe that the address bar is Y when it should be Z".

That doesn't guarantee that I can fix it, but it at least gets me on the same page as you in understanding the issue.

However, that said, this is a good ticket to dip your toes in: small, focused, and you already understand the problem you're trying to solve ;)

noahcoetsee commented 5 years ago

I would recommend just disabling relative paths altogether. Most proxies do that and it seems to work better/more efficiently.

Perhaps something like AJAX and PHP to pass the proper url (even allow for encoding).

nfriedly commented 5 years ago

"Disable" isn't really the right word, because there's literally no code that touches relative paths. Additional code could be written that would re-write relative paths to force them to go through the proxy, which is what I suggested above.

The current implementation is more efficient (doing nothing usually is ;) but it's feasible that additional work here could improve its compatibility. I don't see exactly how, but I'm not going to rule it out either.
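As a rough idea of what rewriting relative paths to go through the proxy might look like, here is a sketch under stated assumptions. It is not the code in url-prefixer.js: `rewriteRelativeLinks` is a hypothetical helper, the `/proxy/` prefix is assumed, and only double-quoted `href` and `src` attributes are handled.

```javascript
const url = require('url');

// Assumption for illustration: the proxy is mounted at /proxy/.
const PROXY_PREFIX = '/proxy/';

// Hypothetical helper, not part of unblocker: rewrite href/src values in an
// HTML string so that every link points back through the proxy.
function rewriteRelativeLinks(html, pageUrl) {
  return html.replace(/\b(href|src)="([^"]*)"/g, (match, attr, value) => {
    // Leave fragments and non-http schemes (mailto:, data:, javascript:) alone.
    if (value.startsWith('#') || /^(?!https?:)[a-z][a-z0-9+.-]*:/i.test(value)) {
      return match;
    }
    // Resolve against the page's real URL, then prepend the proxy prefix.
    const absolute = url.resolve(pageUrl, value);
    return `${attr}="${PROXY_PREFIX}${absolute}"`;
  });
}

const page = 'https://example.com/videos/list.html';
console.log(rewriteRelativeLinks('<a href="watch?v=1">Watch</a>', page));
// → <a href="/proxy/https://example.com/videos/watch?v=1">Watch</a>
```

A rewrite like this only sees URLs present in the HTML as it passes through the proxy. Links that a page builds later in JavaScript, as YouTube does, would still be missed.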

If you want to write up an AJAX and PHP system to check every URL, that's awesome! I'll try and give you some pointers for getting started. But I don't think I have the time to do much more than that right now.

This project is something that I built years ago and put out there for people to use for free. I'm glad folks are still using it and seeing benefits, but I can't commit to any significant development work on it right now.

I have offered paid support to businesses that were using this in the past, but right now I don't even have the time available to support that.