fanderegg closed this issue 7 years ago.
Hello,
The DOM is broken. However, the crawler uses regular expressions, so I don't yet understand why it does not work. Still investigating. You might want to fix the DOM anyway ;-).
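Extracting links with a regular expression, rather than walking a parsed DOM, is why broken markup should not, on its own, stop the crawl. Below is a minimal TypeScript sketch of that idea. The `extractLinks` helper and its `href` pattern are hypothetical and are not taken from a11ym's source.

```typescript
// Hypothetical helper, not a11ym's actual code.
// Pulls quoted href values out of raw HTML with a regular expression,
// so it keeps working even when the markup is not well-formed.
function extractLinks(html: string, base: string): string[] {
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  const origin = new URL(base).origin;
  const links = new Set<string>();
  let match: RegExpExecArray | null;

  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      // Resolve relative links against the page they were found on.
      const url = new URL(match[1], base);
      url.hash = ""; // ignore fragment-only differences
      // Only follow links that stay on the same site.
      if (url.origin === origin) {
        links.add(url.href);
      }
    } catch {
      // Skip href values that are not valid URLs.
    }
  }
  return Array.from(links);
}
```

Because nothing here builds a document tree, an unclosed `<div>` does not hide the links that follow it. The trade-off is that this pattern only sees quoted `href` attributes.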
I finally found the problem.
Sometimes we encounter a URL like `/en/log-in`, and sometimes `/en/log-in/`. When crawling `/en/log-in`, the server responds with a 301 redirect to `/en/log-in/`, and the crawler does not handle that redirect correctly.
I am working on a fix.
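One way a crawler could cope with the 301 from `/en/log-in` to `/en/log-in/` is sketched below. This is a minimal, hypothetical TypeScript crawl loop, not the actual a11ym fix. `crawl`, `pageKey` and the injected `fetchPage` callback (a stand-in for real HTTP requests) are illustrative names. Treating a path with and without a trailing slash as the same page is an assumption that holds for this site but not for every server.

```typescript
// Hypothetical sketch, not a11ym's actual fix.
type Page = { status: number; location?: string; links: string[] };
type FetchPage = (url: string) => Page; // stand-in for an HTTP request

// Deduplication key: drop the fragment and a trailing slash on the path,
// so "/en/log-in" and "/en/log-in/" map to the same key (assumption).
function pageKey(raw: string): string {
  const url = new URL(raw);
  url.hash = "";
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.href;
}

// Breadth-first crawl that follows 3xx redirects instead of stopping.
function crawl(start: string, fetchPage: FetchPage, maxRedirects = 5): string[] {
  const queued = new Set<string>([pageKey(start)]);
  const done = new Set<string>();
  const queue: string[] = [start];
  const visited: string[] = [];

  while (queue.length > 0) {
    let url = queue.shift()!;
    let page = fetchPage(url);

    // Follow redirects, resolving relative Location values against the current URL.
    for (let hops = 0; hops < maxRedirects; hops++) {
      const location = page.location;
      if (page.status < 300 || page.status >= 400 || location === undefined) break;
      url = new URL(location, url).href;
      page = fetchPage(url);
    }

    // Skip errors and pages already reached through another URL.
    const finalKey = pageKey(url);
    if (page.status !== 200 || done.has(finalKey)) continue;
    done.add(finalKey);
    visited.push(url);

    // Queue same-page links only once, whatever their trailing slash.
    for (const link of page.links) {
      const absolute = new URL(link, url).href;
      const key = pageKey(absolute);
      if (!queued.has(key) && !done.has(key)) {
        queued.add(key);
        queue.push(absolute);
      }
    }
  }
  return visited;
}
```

Following the redirect before deciding whether a page is new means the redirected page is reported once and its links are still queued, so crawling continues instead of stopping at the 301.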
Much better, isn't it? :-) (run with `./a11ym https://opentransportdata.swiss/en`).
The website https://opentransportdata.swiss/ is not crawled recursively. Only the links in the language switcher are recognized and crawled, but after that it stops. Command used:
a11ym https://opentransportdata.swiss/ -o out
The website is built on two different platforms (CKAN and WordPress), and the crawler seems to fail on the WordPress pages (e.g. https://opentransportdata.swiss/de/cookbook/ or https://opentransportdata.swiss/de/) while correctly crawling the CKAN pages (e.g. https://opentransportdata.swiss/de/dataset/).