vbauer / manet

Website screenshot service powered by Node.js, SlimerJS and PhantomJS
MIT License
576 stars, 102 forks

Error capturing some urls #88

Open · andreasmcdermott opened this issue 7 years ago

andreasmcdermott commented 7 years ago

I'm trying to capture a screenshot of this URL: https://www.nextgenscience.org/resources/equip-professional-learning-facilitator%E2%80%99s-guide-v20 (unescaped: https://www.nextgenscience.org/resources/equip-professional-learning-facilitator’s-guide-v20)

It fails with the following error:

2017-08-02T21:41:37.851Z - debug: Request query parameters: {"url":"https://www.nextgenscience.org/resources/equip-professional-learning-facilitator%E2%80%99s-guide-v20"}
2017-08-02T21:41:37.851Z - debug: Request body parameters: {}
2017-08-02T21:41:37.861Z - debug: Sending file ("https://www.nextgenscience.org/resources/equip-professional-learning-facilitator’s-guide-v20") in response
2017-08-02T21:41:37.864Z - info: Capture site screenshot: "https://www.nextgenscience.org/resources/equip-professional-learning-facilitator’s-guide-v20"
2017-08-02T21:41:37.864Z - debug: Options for script: {"url":"https://www.nextgenscience.org/resources/equip-professional-learning-facilitator’s-guide-v20"}, base64: eyJ1cmwiOiJodHRwczovL3d3dy5uZXh0Z2Vuc2NpZW5jZS5vcmcvcmVzb3VyY2VzL2VxdWlwLXByb2Zlc3Npb25hbC1sZWFybmluZy1mYWNpbGl0YXRvchlzLWd1aWRlLXYyMCJ9, command: ["phantomjs","--ignore-ssl-errors=true","--web-security=false","/usr/local/lib/node_modules/manet/src/scripts/screenshot.js","eyJ1cmwiOiJodHRwczovL3d3dy5uZXh0Z2Vuc2NpZW5jZS5vcmcvcmVzb3VyY2VzL2VxdWlwLXByb2Zlc3Npb25hbC1sZWFybmluZy1mYWNpbGl0YXRvchlzLWd1aWRlLXYyMCJ9","/var/folders/2m/3h5k0q2j40s2gk2tljb373rr_hf0p8/T/385fe470089a2bc87bfd2a1eb43caf15499d31e8.png"]
2017-08-02T21:41:41.977Z - debug: Process output: Script options: {"url":"https://www.nextgenscience.org/resources/equip-professional-learning-facilitators-guide-v20"}
Error: SyntaxError: JSON Parse error: Unterminated string
Error: TypeError: undefined is not an object (evaluating 'options.clipRect')
2017-08-02T21:41:41.977Z - debug: Execution time: 4.11 sec
2017-08-02T21:41:41.977Z - debug: Process finished work: eyJ1cmwiOiJodHRwczovL3d3dy5uZXh0Z2Vuc2NpZW5jZS5vcmcvcmVzb3VyY2VzL2VxdWlwLXByb2Zlc3Npb25hbC1sZWFybmluZy1mYWNpbGl0YXRvchlzLWd1aWRlLXYyMCJ9
2017-08-02T21:41:41.978Z - error: Error while sending data file: ENOENT: no such file or directory, stat '/var/folders/2m/3h5k0q2j40s2gk2tljb373rr_hf0p8/T/385fe470089a2bc87bfd2a1eb43caf15499d31e8.png'

I'm running manet version 0.4.19.
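One way to check what PhantomJS actually received is to decode the base64 options payload from the log. The sketch below is in Python purely for illustration (manet itself is Node.js) and copies that payload verbatim. It suggests the ’ (U+2019) reached the script as the single byte 0x19, i.e. only the low 8 bits of the character. That would explain both the "facilitators" in the `Script options` line (0x19 is an invisible control character) and the JSON parse error, since JSON does not allow raw control characters inside strings. `low_byte_encode` is a hypothetical model of such a lossy encoding (Node's `latin1`/`binary` Buffer encoding truncates characters this way); this has not been checked against manet's source.

```python
import base64
import json

# Options payload copied verbatim from the PhantomJS command line in the log.
LOGGED_PAYLOAD = (
    "eyJ1cmwiOiJodHRwczovL3d3dy5uZXh0Z2Vuc2NpZW5jZS5vcmcvcmVzb3VyY2VzL2VxdWlw"
    "LXByb2Zlc3Npb25hbC1sZWFybmluZy1mYWNpbGl0YXRvchlzLWd1aWRlLXYyMCJ9"
)

# The unescaped URL from the issue; \u2019 is the right single quotation mark.
URL = ("https://www.nextgenscience.org/resources/"
       "equip-professional-learning-facilitator\u2019s-guide-v20")


def low_byte_encode(text: str) -> bytes:
    """Hypothetical lossy encoding: keep only the low 8 bits of each code point."""
    return bytes(ord(ch) & 0xFF for ch in text)


# Compact JSON, same shape as the "Options for script" line in the log.
options = json.dumps({"url": URL}, ensure_ascii=False, separators=(",", ":"))

lossy_payload = base64.b64encode(low_byte_encode(options)).decode("ascii")
utf8_payload = base64.b64encode(options.encode("utf-8")).decode("ascii")

# Decode the logged payload byte-for-byte (latin-1 maps each byte to one char).
received = base64.b64decode(LOGGED_PAYLOAD).decode("latin-1")

print("lossy encoding reproduces the log:", lossy_payload == LOGGED_PAYLOAD)
print("quote arrived as:", repr(received[received.index("facilitator") + 11]))

try:
    json.loads(received)
except json.JSONDecodeError as err:
    print("parse fails:", err.msg)

# UTF-8 keeps all three bytes of U+2019, so the URL survives the round trip.
round_trip = json.loads(base64.b64decode(utf8_payload).decode("utf-8"))["url"]
print("utf-8 round-trips:", round_trip == URL)
```

The last check shows that a UTF-8 encoding keeps the character intact, assuming the receiving script also decodes the payload as UTF-8.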

Edit:

This URL fails as well: https://www.theguardian.com/world/2016/oct/20/spanish-court-overturns-catalonia-bullfighting-ban

That suggests the "’" character might not be the problem.

Edit 2:

The Guardian URL first shows an error on the page ({"error":{"error":"Can not capture: https://www.theguardian.com/world/2016/oct/20/spanish-court-overturns-catalonia-bullfighting-ban"}}), but a few seconds later the capture seems to succeed: if I request the same URL again, manet loads the image from storage.

2017-08-02T22:17:49.993Z - debug: Request query parameters: {"url":"https://www.theguardian.com/world/2016/oct/20/spanish-court-overturns-catalonia-bullfighting-ban"}
2017-08-02T22:17:49.993Z - debug: Request body parameters: {}
2017-08-02T22:17:50.009Z - debug: Sending file ("https://www.theguardian.com/world/2016/oct/20/spanish-court-overturns-catalonia-bullfighting-ban") in response
2017-08-02T22:17:50.011Z - info: Capture site screenshot: "https://www.theguardian.com/world/2016/oct/20/spanish-court-overturns-catalonia-bullfighting-ban"
2017-08-02T22:17:50.012Z - debug: Options for script: {"url":"https://www.theguardian.com/world/2016/oct/20/spanish-court-overturns-catalonia-bullfighting-ban"}, base64: eyJ1cmwiOiJodHRwczovL3d3dy50aGVndWFyZGlhbi5jb20vd29ybGQvMjAxNi9vY3QvMjAvc3BhbmlzaC1jb3VydC1vdmVydHVybnMtY2F0YWxvbmlhLWJ1bGxmaWdodGluZy1iYW4ifQ==, command: ["phantomjs","--ignore-ssl-errors=true","--web-security=false","/usr/local/lib/node_modules/manet/src/scripts/screenshot.js","eyJ1cmwiOiJodHRwczovL3d3dy50aGVndWFyZGlhbi5jb20vd29ybGQvMjAxNi9vY3QvMjAvc3BhbmlzaC1jb3VydC1vdmVydHVybnMtY2F0YWxvbmlhLWJ1bGxmaWdodGluZy1iYW4ifQ==","/var/folders/2m/3h5k0q2j40s2gk2tljb373rr_hf0p8/T/c721ae458be487ced560f9ef619d45245b425560.png"]
2017-08-02T22:17:50.811Z - debug: Process output: Script options: {"url":"https://www.theguardian.com/world/2016/oct/20/spanish-court-overturns-catalonia-bullfighting-ban"}
Resource was downloaded: data:image/png;base64,iVBORw0KGgoAAAA[...]
Resource was downloaded: data:application/x-font-woff;base64,d09GRgABAAA[...]
Resource was downloaded: https://www.theguardian.com/world/2016/oct/20/spanish-court-overturns-catalonia-bullfighting-ban
Resource was downloaded: data:application/x-font-woff;base64,d09GRg[...]
Resource was downloaded: data:application/x-font-woff;base64,d09G[...]
Resource was downloaded: data:application/x-font-woff;base64,d09G[...]
Resource was downloaded: data:application/x-font-woff;base64,d09GR[...]
2017-08-02T22:17:50.831Z - error: Process error: 
2017-08-02T22:17:50.831Z - debug: Execution time: 0.8 sec
2017-08-02T22:17:50.831Z - debug: Process finished work: eyJ1cmwiOiJodHRwczovL3d3dy50aGVndWFyZGlhbi5jb20vd29ybGQvMjAxNi9vY3QvMjAvc3BhbmlzaC1jb3VydC1vdmVydHVybnMtY2F0YWxvbmlhLWJ1bGxmaWdodGluZy1iYW4ifQ==
2017-08-02T22:17:50.832Z - error:  error=Can not capture: https://www.theguardian.com/world/2016/oct/20/spanish-court-overturns-catalonia-bullfighting-ban

Logs when I request the same URL again (this time I get the screenshot):

2017-08-02T22:20:33.081Z - debug: Request query parameters: {"url":"https://www.theguardian.com/world/2016/oct/20/spanish-court-overturns-catalonia-bullfighting-ban"}
2017-08-02T22:20:33.081Z - debug: Request body parameters: {}
2017-08-02T22:20:33.085Z - debug: Sending file ("https://www.theguardian.com/world/2016/oct/20/spanish-court-overturns-catalonia-bullfighting-ban") in response
2017-08-02T22:20:33.085Z - info: Capture site screenshot: "https://www.theguardian.com/world/2016/oct/20/spanish-court-overturns-catalonia-bullfighting-ban"
2017-08-02T22:20:33.086Z - debug: Take screenshot from file storage: eyJ1cmwiOiJodHRwczovL3d3dy50aGVndWFyZGlhbi5jb20vd29ybGQvMjAxNi9vY3QvMjAvc3BhbmlzaC1jb3VydC1vdmVydHVybnMtY2F0YWxvbmlhLWJ1bGxmaWdodGluZy1iYW4ifQ==

It looks like these are actually two different problems.
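For comparison, the options payload from the failing Guardian request can be decoded the same way. In this Python sketch (illustration only; the payload is copied verbatim from the log), it contains only printable ASCII and parses as valid JSON. So the empty `Process error:` in that run cannot come from a mangled character in the options, unlike the NextGenScience payload, whose JSON failed to parse. The same string also appears in the later `Take screenshot from file storage` line, which hints (unconfirmed) that stored screenshots are looked up by this payload.

```python
import base64
import json

# Options payload copied verbatim from the failing Guardian request in the log.
GUARDIAN_PAYLOAD = (
    "eyJ1cmwiOiJodHRwczovL3d3dy50aGVndWFyZGlhbi5jb20vd29ybGQvMjAxNi9vY3QvMjAv"
    "c3BhbmlzaC1jb3VydC1vdmVydHVybnMtY2F0YWxvbmlhLWJ1bGxmaWdodGluZy1iYW4ifQ=="
)

raw = base64.b64decode(GUARDIAN_PAYLOAD)

# True only if every byte is printable ASCII (no control characters).
printable = all(0x20 <= b < 0x7F for b in raw)

# Parses cleanly, so the script received well-formed options.
options = json.loads(raw.decode("ascii"))

print("printable ASCII only:", printable)
print("decoded url:", options["url"])
```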