Closed nickoneill closed 5 years ago
Is there any guidance on how to debug this part of the process so I can figure out what's happening?
Put a console.log call somewhere in here: https://github.com/stereobooster/react-snap/blob/master/src/puppeteer_utils.js#L108-L119
const getLinks = async opt => {
  const { page } = opt;
  const anchors = await page.evaluate(() =>
    Array.from(document.querySelectorAll("a")).map(anchor => {
      if (anchor.href.baseVal) {
        const a = document.createElement("a");
        a.href = anchor.href.baseVal;
        return a.href;
      }
      return anchor.href;
    })
  );
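One way to follow that suggestion is to wrap the same link extraction with logging. This is only a minimal sketch, not react-snap's actual code: `getLinksWithLogging` and its `log` parameter are hypothetical names, and it assumes `opt.page` is a Puppeteer page.

```javascript
// Hypothetical debugging variant of the excerpt above; not part of react-snap.
const getLinksWithLogging = async (opt, log = console.log) => {
  const { page } = opt;
  const anchors = await page.evaluate(() =>
    Array.from(document.querySelectorAll("a")).map(anchor => {
      // SVG <a> elements expose href as an object with a baseVal string.
      if (anchor.href.baseVal) {
        const a = document.createElement("a");
        a.href = anchor.href.baseVal;
        return a.href;
      }
      return anchor.href;
    })
  );
  // Zero anchors on the first page may mean the app never rendered.
  log(`[getLinks] ${page.url()} -> ${anchors.length} anchor(s)`);
  anchors.forEach(href => log(`  ${href}`));
  return anchors;
};
```

If the first crawled page reports zero anchors, the crawl has nothing to follow, which would match the "only a single page" symptom.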
Thanks for the help looking in that direction.
Turns out webpack was setting PUBLIC_URL to the production domain, so new deploys looked on that domain for a JS file named something like main.1234abcd.js (the name includes a hash of the JS file for cache busting). That file didn't exist on the production domain before the deploy, so loading the page failed and no links were detected.
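A failure like this can be surfaced earlier by logging failed requests and page errors. The sketch below uses Puppeteer's `requestfailed`, `response`, and `pageerror` page events; `attachDebugLogging` is a hypothetical helper that is not part of react-snap, and it assumes you can call it wherever a Puppeteer page object is available (for instance, next to a console.log added in puppeteer_utils.js).

```javascript
// Hypothetical helper; assumes `page` is a Puppeteer Page.
const attachDebugLogging = (page, log = console.log) => {
  // Requests that never received a response (DNS failure, refused connection, ...).
  page.on("requestfailed", request => {
    const failure = request.failure();
    const reason = failure ? failure.errorText : "unknown error";
    log(`[requestfailed] ${request.url()} ${reason}`);
  });
  // Responses that arrived with an error status, e.g. a 404 for a hashed bundle.
  page.on("response", response => {
    if (response.status() >= 400) {
      log(`[http ${response.status()}] ${response.url()}`);
    }
  });
  // Uncaught exceptions thrown inside the page.
  page.on("pageerror", error => log(`[pageerror] ${error.message}`));
};
```

With something like this in place, a missing main.1234abcd.js on the production domain should show up as an error-status or failed-request line for that URL.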
Setting the JS links to root-relative URLs (e.g. /static/js/main.1234abcd.js) loaded the JS correctly from the server that react-snap creates and allowed the site to be crawled correctly.
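To illustrate why the root-relative form helps, here is a small sketch using the standard `URL` class. The origins are assumptions for illustration only: `http://localhost:45678/` stands in for the local server used during the crawl, `https://example.com/` is a placeholder production domain, and `toRootRelative` is a hypothetical helper.

```javascript
// Assumed origins, for illustration only.
const SNAP_ORIGIN = "http://localhost:45678/"; // local crawl server (assumed port)
const PRODUCTION_ORIGIN = "https://example.com/"; // placeholder production domain

// Strip the origin from an absolute URL, leaving a root-relative one.
const toRootRelative = absoluteUrl => {
  const { pathname, search, hash } = new URL(absoluteUrl);
  return `${pathname}${search}${hash}`;
};

const absolute = `${PRODUCTION_ORIGIN}static/js/main.1234abcd.js`;
const rootRelative = toRootRelative(absolute);

// An absolute URL ignores the page's origin; a root-relative one follows it.
const resolvedAbsolute = new URL(absolute, SNAP_ORIGIN).href;
const resolvedRelative = new URL(rootRelative, SNAP_ORIGIN).href;

console.log(rootRelative); // /static/js/main.1234abcd.js
console.log(resolvedAbsolute); // https://example.com/static/js/main.1234abcd.js
console.log(resolvedRelative); // http://localhost:45678/static/js/main.1234abcd.js
```

The absolute URL always points at production, where the new bundle does not exist yet, while the root-relative one resolves against whichever server is serving the page.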
I'm running into intermittent issues where react-snap only crawls a single page, and I can't figure out what is going on. Is there any guidance on how to debug this part of the process so I can figure out what's happening?
I appreciate the focus on Stack Overflow for better visibility into answered questions, so I've posted a detailed account there: https://stackoverflow.com/questions/54961242/react-snap-sometimes-only-crawls-a-single-page