Open rgaudin opened 1 year ago
We must still keep a fallback that parses the seed URL ourselves, since we cannot expect pages.jsonl to always be available: warc2zim must work from a WARC file alone, and pages.jsonl is only present when warc2zim is used in conjunction with browsertrix crawler (e.g. in the zimit scraper).
Since crawler 0.11.0 (https://github.com/webrecorder/browsertrix-crawler/pull/362), the captured favicon is available in pages.jsonl. We could use that when a custom favicon is not provided, instead of parsing the seed URL ourselves.
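
A rough sketch of the lookup order discussed here: custom favicon first, then pages.jsonl, then parsing the seed page. This is not warc2zim's actual code. The function names are hypothetical, and the `favIconUrl` key and the per-line `url` field are assumptions about the pages.jsonl layout that should be checked against the crawler PR linked above.

```python
import json
from html.parser import HTMLParser
from typing import Iterable, Optional
from urllib.parse import urljoin

# Assumed name of the favicon field in pages.jsonl records (to be verified).
FAVICON_KEY = "favIconUrl"


def favicon_from_pages(lines: Iterable[str], seed_url: str) -> Optional[str]:
    """Return the favicon recorded for the seed page, if any."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than failing the whole run
        if not isinstance(record, dict):
            continue
        if record.get("url") == seed_url and record.get(FAVICON_KEY):
            return record[FAVICON_KEY]
    return None


class _IconLinkParser(HTMLParser):
    """Remember the href of the first <link> whose rel contains "icon"."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "icon" in rels and attr.get("href"):
            self.href = attr["href"]


def favicon_from_html(html: str, seed_url: str) -> Optional[str]:
    """Fallback: parse the seed page and resolve its icon link."""
    parser = _IconLinkParser()
    parser.feed(html)
    return urljoin(seed_url, parser.href) if parser.href else None


def resolve_favicon(
    custom: Optional[str],
    pages_lines: Optional[Iterable[str]],
    seed_html: str,
    seed_url: str,
) -> Optional[str]:
    """Custom favicon, then pages.jsonl, then the seed page itself."""
    if custom:
        return custom
    if pages_lines is not None:  # pages.jsonl may be absent (WARC-only input)
        found = favicon_from_pages(pages_lines, seed_url)
        if found:
            return found
    return favicon_from_html(seed_html, seed_url)
```

Passing `pages_lines=None` models the WARC-only case, so the seed-page parsing stays reachable whenever pages.jsonl is missing or has no favicon entry for the seed URL.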