webrecorder / browsertrix-crawler

Run a high-fidelity browser-based crawler in a single Docker container
https://crawler.docs.browsertrix.com
GNU Affero General Public License v3.0

Video on kiwix.org homepage is not retrieved #439

Open benoit74 opened 8 months ago

benoit74 commented 8 months ago

When running the crawler with the official 0.12.2 Docker image on https://kiwix.org/fr/, the YouTube video embedded on the home page is not captured in the WARCs:

docker run --rm -it -v "${PWD}/output:/output" webrecorder/browsertrix-crawler:0.12.2 crawl --depth 0 --url https://kiwix.org/fr/ --cwd /output/.tmph919m5n3

For more details, see https://github.com/openzim/zimit/issues/247
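
One way to confirm what was (or was not) captured is to list the target URIs recorded in the WARC files and look for video-related hosts. The sketch below is only a rough check, not part of the crawler: it assumes the WARCs end up somewhere under the mounted output directory, uses only the Python standard library, and matches header lines naively instead of parsing records properly. The function names and the host substrings (`youtube`, `googlevideo.com`) are assumptions chosen for illustration.

```python
import gzip
from pathlib import Path


def target_uris(warc_path):
    """Yield each WARC-Target-URI value found in a .warc or .warc.gz file.

    Crude heuristic: scans for header lines instead of parsing records,
    so a payload line that happens to start with the same text would
    also match.
    """
    opener = gzip.open if str(warc_path).endswith(".gz") else open
    with opener(warc_path, "rb") as fh:
        for line in fh:
            if line.lower().startswith(b"warc-target-uri:"):
                value = line.split(b":", 1)[1].strip()
                # Older WARC files may wrap the URI in angle brackets.
                yield value.decode("utf-8", "replace").strip("<>")


def find_video_uris(root, needles=("youtube", "googlevideo.com")):
    """Return captured URIs under `root` containing any of `needles`.

    `needles` is an assumed list of host substrings, not an official one.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.warc*")):
        if not path.is_file():
            continue
        for uri in target_uris(path):
            if any(n in uri for n in needles):
                hits.append(uri)
    return hits
```

Calling `find_video_uris("output")` after the crawl returns a list of matching URIs. An empty list would be consistent with the video requests not having been recorded, although it does not by itself explain why.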

benoit74 commented 8 months ago

Note that some investigation has already been done in https://github.com/openzim/zimit/issues/247; it is worth reading before digging into this issue.