liip / TheA11yMachine

The A11y Machine is an automated accessibility testing tool which crawls and tests pages of any web application to produce detailed reports.

Issues on Windows #63

Closed krtek4 closed 7 years ago

krtek4 commented 8 years ago

Hey there,

I have multiple issues on Windows.

First, a non-Windows-specific issue: Git is required, as some packages are linked to GitHub repositories.
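
For context, the Git-hosted dependencies look roughly like this in package.json (a sketch reconstructed from the npm list posted later in this thread, not the project's exact manifest); npm shells out to the git binary to fetch them, so a missing Git installation breaks npm install:

{
  "dependencies": {
    "HTML_CodeSniffer": "git+https://github.com/liip-forks/HTML_CodeSniffer.git#5cee16fe68f76ffd96caee41c6b2754fc00d4f47",
    "pa11y": "git+https://github.com/liip-forks/pa11y.git#a4ab830d30bbee4064d1794a32e457d85be90f24",
    "simplecrawler": "git+https://github.com/cgiffard/node-simplecrawler.git#bdafeb7acb55cb38655ce44d522ce06873db621e"
  }
}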

Then, a11ym behaves strangely on Windows. When I try to crawl a URL, I get one of three possible outcomes:

  1. a11ym fetches the first URL and stops immediately afterwards. No report is produced.
  2. a11ym fetches the first URL, runs it, then fetches a bunch of URLs and stops. No report is produced.
  3. a11ym fetches a bunch of URLs, runs the first X (X being the number of workers), then keeps fetching other URLs until stopped manually. No report is produced.

In each case, I get no debug output whatsoever.

I tried multiple combinations of global and non-global packages:

The result is the same in each case. The computer is a fresh install of Windows 10 with the latest 64-bit Node.js.

> node.exe --version
v5.10.1
> npm --version
3.8.3
> [System.Environment]::OSVersion.Version

Major  Minor  Build  Revision
-----  -----  -----  --------
10     0      10586  0
> phantomjs.cmd -v
2.1.1
callmemagnus commented 8 years ago

I'm trying to run a11ym on Linux.

I encounter the same issue as described in this ticket: "a11ym fetches the first URL and stops immediately afterwards. No report is produced."

I only get

$ ./node_modules/.bin/a11ym https://todomvc.com                                    
Initializing with https://todomvc.com.
$

And that's all.

Versions:

$ node -v
v4.2.2
$ npm -v
2.14.7
$ npm list
/src/test
└─┬ the-a11y-machine@0.8.1
  ├── async@1.5.2
  ├─┬ chalk@1.1.3
  │ ├── ansi-styles@2.2.1
  │ ├── escape-string-regexp@1.0.5
  │ ├─┬ has-ansi@2.0.0
  │ │ └── ansi-regex@2.0.0
  │ ├─┬ strip-ansi@3.0.1
  │ │ └── ansi-regex@2.0.0
  │ └── supports-color@2.0.0
  ├─┬ commander@2.9.0
  │ └── graceful-readlink@1.0.1
  ├── crypto@0.0.3
  ├─┬ glob@6.0.4
  │ ├─┬ inflight@1.0.4
  │ │ └── wrappy@1.0.1
  │ ├── inherits@2.0.1
  │ ├─┬ minimatch@3.0.0
  │ │ └─┬ brace-expansion@1.1.4
  │ │   ├── balanced-match@0.4.1
  │ │   └── concat-map@0.0.1
  │ ├─┬ once@1.3.3
  │ │ └── wrappy@1.0.1
  │ └── path-is-absolute@1.0.0
  ├── HTML_CodeSniffer@2.0.1 (git+https://github.com/liip-forks/HTML_CodeSniffer.git#5cee16fe68f76ffd96caee41c6b2754fc00d4f47)
  ├─┬ mkdirp@0.5.1
  │ └── minimist@0.0.8
  ├─┬ pa11y@3.2.1 (git+https://github.com/liip-forks/pa11y.git#a4ab830d30bbee4064d1794a32e457d85be90f24)
  │ ├── async@1.4.2
  │ ├─┬ bfj@1.2.2
  │ │ └── check-types@3.2.0
  │ ├─┬ commander@2.8.1
  │ │ └── graceful-readlink@1.0.1
  │ ├── lower-case@1.1.3
  │ ├─┬ node.extend@1.1.5
  │ │ └── is@3.1.0
  │ ├─┬ once@1.3.3
  │ │ └── wrappy@1.0.1
  │ └─┬ truffler@2.1.1
  │   ├── freeport@1.0.5
  │   ├─┬ hasbin@1.1.3
  │   │ └── async@1.5.2
  │   └── node-phantom-simple@2.0.6
  ├── process@0.11.3
  ├─┬ simplecrawler@0.7.0 (git+https://github.com/cgiffard/node-simplecrawler.git#bdafeb7acb55cb38655ce44d522ce06873db621e)
  │ ├── iconv-lite@0.4.13
  │ └── urijs@1.18.0
  └── underscore@1.8.3
Hywan commented 8 years ago

@callmemagnus Sorry for the late reply… There is a TLS issue with https://todomvc.com. Maybe this is why the crawler does not scan it. Did you try the --http-tls-disable option?

Hywan commented 7 years ago

@callmemagnus There is a certificate issue with https://todomvc.com. See the following command:

$ curl -D - -o /dev/null https://todomvc.com -s
curl: (60) SSL certificate problem: Invalid certificate chain
More details here: https://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

With the --insecure flag, we are able to GET the page:

$ curl -D - -o /dev/null --insecure https://todomvc.com -s
HTTP/1.1 200 OK
Server: GitHub.com
Date: Wed, 21 Dec 2016 15:43:43 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 254821
Last-Modified: Tue, 15 Nov 2016 19:32:33 GMT
Access-Control-Allow-Origin: *
Expires: Wed, 21 Dec 2016 15:53:43 GMT
Cache-Control: max-age=600
Accept-Ranges: bytes
X-GitHub-Request-Id: B2D3F57C:C44D:3D11CD8:585AA32F
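
The same behaviour can be reproduced from Node itself. Here is a minimal sketch, assuming --http-tls-disable relaxes certificate verification in a way comparable to rejectUnauthorized: false; this is an illustration, not a11ym's actual implementation:

var https = require('https');

// Probe https://todomvc.com with and without certificate verification.
function probe(rejectUnauthorized) {
  https
    .get({ host: 'todomvc.com', path: '/', rejectUnauthorized: rejectUnauthorized }, function (res) {
      console.log('rejectUnauthorized=' + rejectUnauthorized + ': HTTP ' + res.statusCode);
      res.resume(); // drain the body so the socket is released
    })
    .on('error', function (err) {
      console.log('rejectUnauthorized=' + rejectUnauthorized + ': ' + err.message);
    });
}

probe(true);  // fails with a certificate verification error on this host
probe(false); // returns HTTP 200, like `curl --insecure`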

So basically, you have to use a11ym with --http-tls-disable:

$ ./a11ym -m 3 --http-tls-disable https://todomvc.com
Initializing with https://todomvc.com.
Fetch complete for https://todomvc.com/.
Waiting to run https://todomvc.com/.
 1/3  Run: https://todomvc.com/.
Fetching https://todomvc.com/site-assets/favicon.ico.
Fetching https://todomvc.com/bower_components/webcomponentsjs/webcomponents-lite.min.js.
Fetching https://todomvc.com/examples/backbone.
Fetching https://todomvc.com/url,baseUri.
Fetching https://todomvc.com/assetPath,module.ownerDocument.baseURI.
Fetching https://todomvc.com/url,root.
Fetching http://todomvc.com/.
Fetch complete for https://todomvc.com/site-assets/favicon.ico; skipped, not text/html.
Fetching https://todomvc.com/site-assets/main.min.css.
Fetch complete for https://todomvc.com/bower_components/webcomponentsjs/webcomponents-lite.min.js; skipped, not text/html.
Fetching https://todomvc.com/bower_components/paper-icon-button/%5B%5Bsrc%5D%5D.
Fetch complete for https://todomvc.com/examples/backbone, and redirect to http://todomvc.com/examples/backbone/.
Fetching https://todomvc.com/examples/angularjs.
https://todomvc.com/url,baseUri responds with a 404.
https://todomvc.com/assetPath,module.ownerDocument.baseURI responds with a 404.
https://todomvc.com/url,root responds with a 404.
https://todomvc.com/bower_components/paper-icon-button/%5B%5Bsrc%5D%5D responds with a 404.
Fetching https://todomvc.com/bower_components/webcomponentsjs/b%22,%22http:/a.
Fetch complete for http://todomvc.com/.
Waiting to run http://todomvc.com/.
 2/3  Run: http://todomvc.com/.
Fetching http://todomvc.com/url,baseUri.
Fetching http://todomvc.com/assetPath,module.ownerDocument.baseURI.
Fetching http://todomvc.com/url,root.
etc.

I have this report:

(screenshot of the generated a11ym report, taken 2016-12-21 at 16:47)

Hywan commented 7 years ago

@krtek4 No other direct dependencies are using Git anymore; only HTML_CodeSniffer is still fetched from https://github.com/….

About the unexpected stop: #78 is probably a similar issue, and it has been fixed. Could you confirm, please? If the problem is still present, please give me the URL you are trying to crawl. I am not sure this is an issue related to Windows.

krtek4 commented 7 years ago

I don't have a Windows machine available anymore, and honestly I don't remember which website I had issues with.

Since there have been multiple changes since I opened the bug and no other reports of the same issue, I think it is fair to say the bug is fixed. As far as I'm concerned, you can close it :)

Thanks !

Hywan commented 7 years ago

Thank you! Feel free to reopen if needed.

xoxo

syndy1989 commented 7 years ago

Hi there, I'm actually using Windows Server 2012. I installed Cygwin on Windows to run bash commands. I've noticed that pa11y-crawl gives the following error when attempting to crawl a URL with a subdomain.

. is not an html document, skipping

Any advice on this would be helpful, or is there another tool to crawl a site on a Windows machine? Thanks in advance.

Hywan commented 7 years ago

@syndy1989 Can you give me the command line you run please?

syndy1989 commented 7 years ago

@Hywan Please find the command line error below

$ pa11y-crawl nature.com
fatal: Not a git repository (or any of the parent directories): .git

using wget to mirror site
<<< found 1 files in 1 directories
beginning the analysis
---------------------------------------
C:\Users\AppData\Roaming\npm/node_modules/pa11y-crawl/pa11y-crawl.sh: line 58: python: command not found
 is not an html document, skipping
C:\Users\AppData\Roaming\npm/node_modules/pa11y-crawl/pa11y-crawl.sh: line 58: python: command not found
 is not an html document, skipping
C:\Users\AppData\Roaming\npm/node_modules/pa11y-crawl/pa11y-crawl.sh: line 58: python: command not found
-> analyzing

jq: error: Could not open file /home/results.json: No such file or directory
jq: error: Could not open file /home/pa11y-crawl/pa11y.json: No such file or directory
jq: error (at /home/pa11y-crawl/pa11y.json:0): Cannot use null (null) as object key
parse error: Invalid numeric literal at line 2, column 8
parse error: Invalid numeric literal at line 2, column 8
parse error: Invalid numeric literal at line 2, column 8
<<< pa11y says: error: | warning: | notice:
cleaning up
rm: cannot remove '/home/pa11y-crawl': Device or resource busy

Hywan commented 7 years ago

@syndy1989 You are running pa11y-crawl, not a11ym.
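
For comparison, the a11ym equivalent is run directly against the URL, along the lines of the usage shown earlier in this thread (a sketch; whether nature.com needs --http-tls-disable or any other option is untested here):

$ a11ym http://nature.com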