webis-de / scriptor

Plug-and-play reproducible web analysis.
MIT License
5 stars 2 forks source link

Suggested improvements to snapshot script #27

Open johanneskiesel opened 2 years ago

johanneskiesel commented 2 years ago
 for (const [i, u] of url.entries()) {
        const page = await browserContext.newPage({
            userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
        });
        promises.push(page.goto(u, { waitUntil: "domcontentloaded" }).then(async (resp) => {
            try {
                await page.waitForLoadState('load', { timeout: 10000 });
            } catch (ex) {}

            // Adapt viewport height to scroll height
            await page.waitForTimeout(500);
            await pages.adjustViewportToPage(page, optionsViewportAdjust);
            await page.waitForTimeout(1000);
            try {
                await page.waitForLoadState('networkidle', { timeout: 15000 });
            } catch (ex) {}

            // Take snapshot
            const snapName = url.length > 1 ? `snapshot-${i}` : "snapshot";
            await pages.takeSnapshot(page, Object.assign(
                {path: path.join(outputDirectory, snapName)}, scriptOptions[SCRIPT_OPTIONS_SNAPSHOT]
            ));
        }));
    }

Nice idea to use domcontentloaded and then wait for load with timeout.