matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

Missing port for local testing: <localhost:port> #251

Closed: fartbagxp closed this issue 7 years ago.

fartbagxp commented 7 years ago

Subject of the issue

The a@href value returned does not include the port in the link.

I ran a scraper against a locally hosted website on a specific port (5838) and scraped a@href for the links. Each link came back as localhost/<page>.html instead of localhost:5838/<page>.html.
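Node's legacy `url.parse` shows where the port goes missing: `hostname` never includes the port, whereas `host` does. Below is a minimal sketch, assuming the page URL from the report (`page.html` stands in for `<page>.html`) and a current Node.js runtime; recent Node versions may print a deprecation warning for `url.parse`.

```javascript
// Sketch: how Node's legacy url.parse() splits a URL that carries an explicit port.
// 'page.html' is a placeholder for the <page>.html in the report.
const url = require('url')

const parts = url.parse('http://localhost:5838/page.html')

// hostname omits the port; host keeps it.
console.log(parts.protocol) // http:
console.log(parts.hostname) // localhost
console.log(parts.port)     // 5838
console.log(parts.host)     // localhost:5838

// Rebuilding the origin from protocol + hostname (the pattern in the
// diff's removed line) drops the port:
const remote = parts.protocol + '//' + parts.hostname
console.log(remote + '/page.html') // http://localhost/page.html
```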

Proposed Code Changes

I'm not sure whether adding the port is justified or what it might break, but while experimenting with the code I found that appending the port in the absolutes.js file fixes the problem for me:

```diff
 function absolute (path, $) {
   var parts = url.parse(path)
-  var remote = parts.protocol + '//' + parts.hostname
+  var remote = parts.protocol + '//' + parts.hostname + ':' + parts.port;
```

This works for me, but I'd rather not assume it won't break anything else.
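The concern about breakage is justified in at least one case: when the page URL has no explicit port, legacy `url.parse` sets `port` to `null`, so the proposed concatenation yields `http://example.com:null`. The sketch below compares that change with two alternatives: reading `parts.host` (which includes the port only when one is present) or using the WHATWG `URL` API. The helper names and the `example.com` / `other.html` URLs are hypothetical, and this is not necessarily how the upstream fix works.

```javascript
// Sketch comparing the concatenation proposed in the diff with two alternatives.
// remoteProposed / remoteFromHost / remoteFromOrigin are hypothetical helpers
// for illustration; they are not part of x-ray.
const url = require('url')

// The proposed change: always appends ':' + parts.port, even when port is null.
function remoteProposed (path) {
  const parts = url.parse(path)
  return parts.protocol + '//' + parts.hostname + ':' + parts.port
}

// parts.host is "hostname:port" when a port is present, otherwise just the hostname.
function remoteFromHost (path) {
  const parts = url.parse(path)
  return parts.protocol + '//' + parts.host
}

// The WHATWG URL API exposes scheme + host (+ non-default port) as .origin.
function remoteFromOrigin (path) {
  return new URL(path).origin
}

for (const page of ['http://localhost:5838/page.html', 'http://example.com/page.html']) {
  console.log(remoteProposed(page), remoteFromHost(page), remoteFromOrigin(page))
}
// http://localhost:5838 http://localhost:5838 http://localhost:5838
// http://example.com:null http://example.com http://example.com

// Resolving a root-relative href against the page URL also keeps the port:
console.log(new URL('/other.html', 'http://localhost:5838/page.html').href)
// http://localhost:5838/other.html
```

Using `host` or `origin` avoids special-casing a missing port, which is the failure mode the plain `':' + parts.port` concatenation runs into.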

fartbagxp commented 7 years ago

Apparently this is already covered by https://github.com/lapwinglabs/x-ray/pull/221. Closing as duplicate.