fernstedt closed this issue 8 months ago
Hi,
I'm glad my crawler is helping you :)
The crawler currently crawls the entire website on a given domain, or even on other domains via the --allowed-domain* options.
You can allow or deny crawling of specific URLs using --include-regex or --exclude-regex.
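For illustration, a single run restricted to the en-US section could look like this (your.domain and the /en-US/blog exclusion are just placeholders):

./swoole-cli crawler.php \
    --url='https://your.domain/en-US' \
    --include-regex='/^\/en\-US/' \
    --exclude-regex='/^\/en\-US\/blog/' \
    --output-html-file='tmp/report.en-US.html'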
If you want to generate a bunch of reports for subpages starting with the language code, I believe this bash script will do exactly what you want.
Btw, I just deployed a new and very nice version of the HTML report. I hope you will be excited ;)
#!/bin/bash

# Language/country versions to crawl - one report is generated per entry
COUNTRIES=("en-US" "en-UK" "cs-CZ")

for COUNTRY in "${COUNTRIES[@]}"
do
    # Escape the dash so it can be used safely inside the regex
    COUNTRY_ESCAPED=${COUNTRY//-/\\-}

    # Crawl only URLs whose path starts with the language code
    # and write a separate HTML report for each country
    ./swoole-cli crawler.php \
        --url='https://your.domain/'"$COUNTRY" \
        --include-regex='/^\/'"$COUNTRY_ESCAPED"'/' \
        --output-html-file='tmp/report.'"$COUNTRY.html"
done
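To run it, you can save the loop as e.g. crawl-countries.sh (the file name is just an example) next to swoole-cli and crawler.php, then:

mkdir -p tmp                  # make sure the output folder exists
chmod +x crawl-countries.sh
./crawl-countries.sh

It should leave tmp/report.en-US.html, tmp/report.en-UK.html and tmp/report.cs-CZ.html behind.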
Hello and thank you for a great tool.
I am crawling a website that has versions for 130 countries, e.g. www.URL.com/en-uk/, where each version contains almost the same pages plus some local content.
I am trying to figure out a way, without resorting to bash, to make the output path contain part of the URL, e.g.:
output=$country/result.html
I could not find a way to do this with the tool (from what I can see), so I am looking for guidance. Otherwise I need to run 130 separate crawls, instead of the tool saving the different countries into different folders for me.
I can write a bash script that loops over a file of country codes and substitutes them into the command, roughly like the sketch below.
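Something like this rough sketch is what I have in mind (assuming a countries.txt with one code per line, e.g. en-uk, and the --url / --output-html-file options):

#!/bin/bash
# Run one crawl per country code, saving each report into its own folder
while read -r COUNTRY; do
    mkdir -p "tmp/$COUNTRY"
    ./swoole-cli crawler.php \
        --url="https://www.URL.com/$COUNTRY/" \
        --output-html-file="tmp/$COUNTRY/result.html"
done < countries.txt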
But it would be great if the tool itself supported variable output paths.