janreges / siteone-crawler

SiteOne Crawler is a cross-platform website crawler and analyzer for SEO, security, accessibility, and performance optimization—ideal for developers, DevOps, QA engineers, and consultants. Supports Windows, macOS, and Linux (x64 and arm64).
https://crawler.siteone.io/
MIT License
255 stars, 17 forks

-1:CON status #10

Closed Azmodeszer closed 3 months ago

Azmodeszer commented 4 months ago

What exactly is the significance of the response "-1:CON" from the crawl? I am trying to crawl a local NetBox instance, but no matter what I try, I immediately get this error and not much else useful log output (--debug-log-file="debug.log" doesn't save anything in the crawler directory, even though the user has all the needed permissions; --debug does not create any additional info in the shell that is not already there without it). I could not find anything in the official documentation about this either.

This seems to be a problem particular to the NetBox installation, as other sites on the same network are being crawled just fine (and it's not a 404, since that would actually result in a 404 in the output). Just trying to get an idea of what the problem even is here.

janreges commented 4 months ago

Hi @Azmodeszer,

this status code "-1:CON" means that the connection to the destination server could not be established.

What exactly does the URL look like that you are giving to the crawler? Are you specifying "http" or "https" in it?

Do you include an IP address or a domain name? If a domain name, can it be resolved via DNS on your machine, for example with ping your.localdomain.com?
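
The questions above (scheme, IP versus domain name, DNS resolution) correspond to the stages a connection goes through before any HTTP status can be returned. A minimal Python sketch for checking them one at a time from the crawling machine follows. It is not part of SiteOne Crawler and does not reproduce how the crawler decides to report -1:CON; the function name diagnose and the stage labels "dns", "tcp", and "tls" are assumptions made for illustration.

```python
import socket
import ssl


def diagnose(host: str, port: int = 443, use_tls: bool = True,
             timeout: float = 5.0) -> str:
    """Return "ok", or the first stage that failed: "dns", "tcp", or "tls".

    Hypothetical helper for manual troubleshooting; the stage names are
    illustrative and are not crawler status codes.
    """
    # Stage 1: can the name be resolved at all?
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"

    # Stage 2: does the server accept a TCP connection on that port?
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp"

    with sock:
        if not use_tls:
            return "ok"
        # Stage 3: does the TLS handshake (including certificate and
        # hostname verification) succeed?
        context = ssl.create_default_context()
        try:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
        except (ssl.SSLError, OSError):
            return "tls"


if __name__ == "__main__":
    # Replace with the host given to --url; this name is a placeholder.
    print(diagnose("your.localdomain.com", 443))
```

A result of "tls" on an internal host often points at a self-signed or mismatched certificate, while "tcp" suggests a firewall or a service that is not listening on that port. Whether either of these is what the crawler reports as -1:CON is not confirmed in this thread.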

Azmodeszer commented 4 months ago

The line is --url=https://<domain name>/ (which is resolvable and reachable on the machine I'm attempting the crawl on). The name is actually an ALIAS record, but the crawl processes the resolving tree just fine and is pointed to the main record/IP, according to the log output.

I suspect it might have something to do with the Apache configuration on the target machine, which is a bit convoluted and needs some cleanup, e.g. of its HTTPS redirects. But yeah, it helps to at least know what the error response means.
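
If the redirects are the suspect, one way to inspect them independently of the crawler is to follow the chain one hop at a time and record each status. The sketch below uses only the Python standard library. The function name trace_redirects and the example URL are hypothetical, and the code makes no claim about how the crawler itself handles redirects.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(url: str, max_hops: int = 10,
                    timeout: float = 5.0) -> list:
    """Follow redirects manually and return a list of (url, status) pairs.

    Hypothetical helper: a connection or TLS failure on any hop raises an
    exception, which itself shows where the chain breaks.
    """
    hops = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
        else:
            conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            # HEAD is enough to see the status and the Location header.
            conn.request("HEAD", path)
            response = conn.getresponse()
            status = response.status
            location = response.getheader("Location")
        finally:
            conn.close()
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops


if __name__ == "__main__":
    # Placeholder URL; substitute the one passed to --url.
    for hop_url, hop_status in trace_redirects("http://your.localdomain.com/"):
        print(hop_status, hop_url)
```

A chain that bounces between http and https, or that ends on a host or port the crawling machine cannot reach, would be worth fixing in the Apache configuration before retrying the crawl.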

janreges commented 3 months ago

Hi @Azmodeszer,

did you manage to solve the problem with -1:CON? In general, I am interested in the local-network scenario and whether it could also be handled on the crawler side.

Azmodeszer commented 3 months ago

I did not, no, it's not been a priority (for now we've implemented a temporary compromise solution with httrack). Depends on when I will get around to having a closer look at that system's webserver configuration. I'll keep you posted when/if something new comes up. Thanks.