jhs-s closed this issue 7 years ago
How did you fix this kind of issue? I'm also experiencing this when my URL is HTTPS.
Will look into it in the next few days. The port could be set to 443 by default if HTTPS is part of the provided URL.
The first case should work. If a redirect to HTTPS is set up, it should follow the redirect and proceed crawling. But I will look into this as well.
@ayenzky This works:
var generator = new SitemapGenerator("https://example.com", {port: 443});
I had a quick look at the code, and it seems the robots parser is not given port 443 by default when the protocol is HTTPS. I didn't look any deeper, as providing the port explicitly worked.
Fixed in 97a4622. The port will now be set accordingly if HTTPS is provided, while a user-provided port is still respected. Please check whether this solves your problem.
@lgraubner, @jhs-s
Thanks for the immediate response.
I tested the new update, and this is the output I get:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://lanarkshirechamber.co.uk:443/</loc>
  </url>
</urlset>
You are right. From what I can see, the crawler appends any port except port 80 to the URL. As this is only a problem for the robots parser, I changed it for that part only. I published version 4.1.1, please test again. Thanks!
@lgraubner
It's still the same, but sometimes it returns "null".
Are you sure you have the latest version? sitemap-generator -V should return 4.1.1. The port problem should be fixed. On some HTTPS sites it returns null because the crawler can't find the site; see #4. I'm not quite sure what the problem is, as it works fine on other sites. Maybe the underlying package simplecrawler has problems with some certificates.
@lgraubner
Yes, I have the latest version. OK, so maybe the problem is in the simplecrawler package. Anyway, thanks for the help! :)
This doesn't work (it adds :80 to the request host):
var generator = new SitemapGenerator("https://forum.colemak.com");
This works:
var generator = new SitemapGenerator("https://forum.colemak.com", {port: 443});
I guess in my case it doesn't work since I'm forcing SSL, with HTTP redirected to HTTPS.
There were some problems with specifying correct ports. Version 5 fixes this. Make sure to specify the port directly in the URL, not as an option anymore.
The robots.txt parser doesn't work in these cases:
It works if you provide https://example.com and port 443. I think this should be added to the docs or fixed otherwise.
Thanks!