janreges / siteone-crawler

SiteOne Crawler is a cross-platform website crawler and analyzer for SEO, security, accessibility, and performance optimization—ideal for developers, DevOps, QA engineers, and consultants. Supports Windows, macOS, and Linux (x64 and arm64).
https://crawler.siteone.io/
MIT License

Output question - multiple reports per /lang-code/ #2

Closed fernstedt closed 8 months ago

fernstedt commented 1 year ago

Hello and thank you for a great tool.

I am crawling a website that has versions for 130 countries, e.g. www.URL.com/en-uk/, each with almost the same pages plus some local content.

I am trying to figure out a way, other than writing a bash script, to derive the output path from part of the URL, e.g.:

output=$country/result.html

I could not find a way to do this from the tool itself (from what I can see), so I am seeking guidance. Otherwise I need to run 130 separate crawls, instead of the tool saving different countries into different folders for me.

I can write a bash script that loops over a file of country codes and substitutes each one into the command.

But if this tool could handle variable output paths itself, that would be great.

janreges commented 1 year ago

Hi,

I'm glad my crawler is helping you :)

The crawler currently crawls the entire website on a given domain, or even on other domains, based on the --allowed-domain* options.

You can allow or deny crawling of URLs using --include-regex or --exclude-regex.
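As a quick illustration of how such a pattern filters URL paths (the paths below are hypothetical, and grep is used here only to stand in for the crawler's regex matching):

```shell
# The pattern ^/en\-US mirrors the kind of expression you would
# pass to --include-regex: anchored at the start of the URL path,
# with the hyphen escaped so it is matched literally.
printf '%s\n' "/en-US/products" "/en-UK/products" "/cs-CZ/home" \
    | grep -E '^/en\-US'
# prints only: /en-US/products
```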

If you want to generate a separate report for each group of subpages starting with a language code, I believe this bash script will do exactly what you want.

Btw, I just deployed a new and very nice version of the HTML report. I hope you will be excited ;)

```bash
#!/bin/bash

# Language/country codes to crawl (extend to your full list)
COUNTRIES=("en-US" "en-UK" "cs-CZ")

for COUNTRY in "${COUNTRIES[@]}"
do
    # Escape hyphens so the code is matched literally in the regex
    COUNTRY_ESCAPED=${COUNTRY//-/\\-}

    # Crawl one language section and write its own HTML report
    ./swoole-cli crawler.php \
        --url='https://your.domain/'"$COUNTRY" \
        --include-regex='/^\/'"$COUNTRY_ESCAPED"'/' \
        --output-html-file='tmp/report.'"$COUNTRY.html"
done
```
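For clarity on the escaping step above: ${COUNTRY//-/\\-} is standard bash parameter expansion that replaces every hyphen with a backslash-escaped hyphen, so the code is treated literally inside the regex rather than as a range operator. A minimal check:

```shell
COUNTRY="en-US"
# Replace each "-" with "\-" via bash parameter expansion
COUNTRY_ESCAPED=${COUNTRY//-/\\-}
echo "$COUNTRY_ESCAPED"
# prints: en\-US
```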