Crawl only html documents

cappslock commented 5 years ago

Feature Request

Is your feature request related to a problem? Please describe.
Puppeteer does not handle PDF files. If a page links to one, react-snap crawls to it and the process crashes. There doesn't seem to be a way to ignore such links.

Describe the solution you'd like
I'd like to be able to specify files, paths, or globs to ignore as an option.

Describe alternatives you've considered
I've looked at other snapshot libraries but prefer this one. I've also looked at whether an argument can be passed to Puppeteer to handle this, but there doesn't seem to be one.

Teachability, Documentation, Adoption, Migration Strategy
I'd imagine an option in package.json, like:

ignore: ["**.pdf"]

cappslock commented 5 years ago

I would be happy to submit a pull request as well.

stereobooster commented 5 years ago

react-snap was never meant to crawl anything except HTML, so I would say we can ignore all non-HTML files by default, with no need for configuration. However, I would not rely on the file name for this task; instead, I would try to use the Content-Type HTTP header.
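
A minimal sketch of the Content-Type check suggested above, not react-snap's actual implementation. The helper names `isHtmlContentType` and `isHtmlResponse` are hypothetical, and treating `application/xhtml+xml` as HTML is an assumption:

```javascript
// Hypothetical helper, not part of react-snap: decides whether a
// Content-Type header value describes an HTML document.
function isHtmlContentType(contentType) {
  if (typeof contentType !== "string") return false;
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  // Assumption: XHTML is accepted alongside plain HTML.
  return mediaType === "text/html" || mediaType === "application/xhtml+xml";
}

// Hypothetical wrapper for any object exposing headers() that returns an
// object keyed by lower-case header names (the shape Puppeteer responses use).
function isHtmlResponse(response) {
  const headers = response.headers();
  return isHtmlContentType(headers["content-type"]);
}
```

In Puppeteer, the object passed to a `page.on('response', ...)` listener exposes `headers()` in this shape. Where such a check would belong in react-snap's crawler is not settled in this thread.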

cappslock commented 5 years ago

Thanks for the reply. That makes sense. Would you welcome a pull request for that? If so, would you be able to point me to the relevant area of code?

stereobooster / react-snap

Crawl only html documents #339