natahouse / react-snap

nata.house's fork of @stereobooster's react-snap:👻 Zero-configuration framework-agnostic static prerendering for SPAs
MIT License

Parse error skip option required to be included #39

Open Dipu711 opened 2 years ago

Dipu711 commented 2 years ago

Hi @philipeatela,

Thank you for this fork. We have a React web app with around 2,000 pages that react-snap crawls daily. Sometimes we hit HTML parse errors, such as a div that is not properly closed or an id attribute set without quotes. Some of the content comes from a backend API as raw HTML, and that is where react-snap reports the parse errors. Our web app has only 2,000 pages right now, but in the near future it will have 10,000.

1) Is there any workaround to turn off HTML parse errors?

2) Is there a feature in react-snap so that, when a single page fails during crawling, that page is skipped (no HTML is generated for it) and crawling continues with the next page, without halting react-snap? If there were a way to achieve that, I think react-snap would see wider use.

I am opening this thread because, if any page fails due to a backend API content issue, our whole react-snap run is terminated. Instead, it should continue by skipping the pages with errors. That would save us time, and the snap would always complete.
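To illustrate the behavior being asked for, here is a minimal sketch. It is not react-snap's actual code: `crawlSkippingErrors` and `renderPage` are hypothetical names, and `renderPage` stands in for whatever step renders a single route and may throw on a parse error.

```typescript
// Hypothetical sketch: `renderPage` is a placeholder for the step that
// renders one route. It is NOT part of react-snap's API.
type RenderPage = (route: string) => Promise<string>;

interface SkippedRoute {
  route: string;
  reason: string;
}

interface CrawlResult {
  rendered: Map<string, string>; // route -> generated HTML
  skipped: SkippedRoute[];       // routes that failed, with the error message
}

async function crawlSkippingErrors(
  routes: string[],
  renderPage: RenderPage
): Promise<CrawlResult> {
  const rendered = new Map<string, string>();
  const skipped: SkippedRoute[] = [];

  for (const route of routes) {
    try {
      rendered.set(route, await renderPage(route));
    } catch (err) {
      // Record the failure and move on instead of aborting the whole run.
      const reason = err instanceof Error ? err.message : String(err);
      skipped.push({ route, reason });
    }
  }

  return { rendered, skipped };
}
```

With something like this, the `skipped` list could be logged at the end of the run so the broken pages can be fixed on the backend side.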

Thank you.

philipeatela commented 2 years ago

Hi @Dipu711, I apologize for taking so long to get back to you on this. I don't have much time to be active on this repo, and I lack some of the knowledge of the creator/original maintainer, but I'll try to help out wherever I can this month.

Regarding your issue, I don't think there's currently a way to just skip pages that have an HTML parse error. I believe you would need to know in advance which pages will cause problems and then exclude them using the "Exclude Routes" option. That is what I have done in the past: for the most complicated pages that had a chance of throwing random errors, we'd just exclude them from being pre-rendered.

Maybe try to figure out a way to validate the generated HTML before feeding it into react-snap? Or maybe we can look into implementing it in the current codebase for this project; it might not be so hard.
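As a rough idea of what such validation could look like, here is a small heuristic sketch that checks an HTML fragment for the two problems mentioned above: tags that are never closed and attribute values without quotes. `findHtmlProblems` is a hypothetical helper, not something react-snap provides. It is deliberately strict (it also flags end tags that HTML allows you to omit, such as `</p>`), it does not special-case comments or script contents, and it is no substitute for a real HTML parser.

```typescript
// Void elements never take a closing tag, so they are not tracked.
const VOID_TAGS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Hypothetical helper: returns a list of human-readable problems,
// or an empty list if the fragment looks well formed.
function findHtmlProblems(html: string): string[] {
  const problems: string[] = [];
  const open: string[] = []; // stack of currently open tag names

  // Group 1: "/" for closing tags. Group 2: tag name. Group 3: raw attributes.
  const tagPattern =
    /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)((?:"[^"]*"|'[^']*'|[^'">])*)>/g;

  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(html)) !== null) {
    const [, slash, rawName, attrs] = match;
    const name = rawName.toLowerCase();

    if (slash === "/") {
      const idx = open.lastIndexOf(name);
      if (idx === -1) {
        problems.push(`unexpected </${name}>`);
      } else {
        // Everything opened after the matching tag was never closed.
        for (const unclosed of open.splice(idx).slice(1)) {
          problems.push(`unclosed <${unclosed}>`);
        }
      }
      continue;
    }

    // Blank out quoted values first so "=" inside them is ignored.
    const withoutQuoted = attrs.replace(/"[^"]*"|'[^']*'/g, '""');
    if (/=\s*[^\s"']/.test(withoutQuoted)) {
      problems.push(`unquoted attribute value in <${name}>`);
    }

    const selfClosing = attrs.trimEnd().endsWith("/");
    if (!VOID_TAGS.has(name) && !selfClosing) {
      open.push(name);
    }
  }

  // Anything still on the stack was never closed.
  for (const unclosed of open) {
    problems.push(`unclosed <${unclosed}>`);
  }
  return problems;
}
```

Backend content that returns a non-empty list could then be fixed or excluded before the crawl starts.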

Dipu711 commented 2 years ago

Hi @philipeatela, I know there is an option to exclude pages, but we have thousands of pages and cannot identify which ones have errors. If you know of any strategy for finding and excluding them before feeding them to react-snap, please share it. Even so, an exclude list with that many URLs is not the right approach in our case. Yes, please help implement the feature for skipping error pages in the current codebase.

I also raised a PR for this: https://github.com/natahouse/react-snap/pull/40. Please review it.

Thank you for your support.