micahstubbs / screenshot-service

services to create a screenshot of a web page. optimized for screenshotting interactive data graphics.

improve scale performance #23

Open micahstubbs opened 5 years ago

micahstubbs commented 5 years ago

improve scale performance

better handle concurrency for

micahstubbs commented 5 years ago

since we write screenshots to the filesystem, we need a way to resize the PNGs from their original dimensions to thumbnail dimensions

this looks useful, https://malcoded.com/posts/nodejs-image-resize-express-sharp

also http://sharp.dimens.io/en/stable/api-input/
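a minimal sketch of that resize step with sharp, assuming sharp is installed (`npm install sharp`). the 230x120 target size and the `-thumbnail.png` naming convention are hypothetical choices for illustration, not taken from this project. `resize(width, height)` uses sharp's default `fit: 'cover'`, which keeps the aspect ratio and crops to fill the box.

```javascript
const path = require('path');

// Hypothetical naming convention: shots/abc123.png -> shots/abc123-thumbnail.png
function thumbnailPathFor(pngPath) {
  const { dir, name } = path.parse(pngPath);
  return path.join(dir, `${name}-thumbnail.png`);
}

// Resize a full-size screenshot PNG down to thumbnail dimensions with sharp.
// The 230x120 default is an assumed size, not one taken from this project.
// sharp's default fit ('cover') preserves aspect ratio and crops to fill the box.
async function resizeToThumbnail(pngPath, width = 230, height = 120) {
  const sharp = require('sharp'); // required lazily; needs `npm install sharp`
  const outPath = thumbnailPathFor(pngPath);
  await sharp(pngPath).resize(width, height).png().toFile(outPath);
  return outPath;
}

// Usage (inside an async function):
//   const thumb = await resizeToThumbnail('screenshots/abc123.png');
```

writing the thumbnail next to the original keeps the full-size PNG around, so a different thumbnail size can be regenerated later without re-screenshotting.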

micahstubbs commented 5 years ago

https://unix.stackexchange.com/questions/225401/how-to-see-full-log-from-systemctl-status-service

to see the logs from our screenshot-bot systemd process and follow their progress

journalctl -u screenshot-bot -f

micahstubbs commented 5 years ago

to find files whose paths contain the string mdcscry, anywhere under the current directory

tree -f | grep mdcscry

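for machines without `tree`, a rough Node.js equivalent is sketched below (`findPaths` is a hypothetical name). it walks a directory recursively and keeps every full path that contains the string, like `grep` on `tree -f` output. unlike `tree`, it also lists hidden files, and it does not descend into symlinked directories.

```javascript
const fs = require('fs');
const path = require('path');

// Rough Node.js equivalent of `tree -f | grep <needle>`: walk `root`
// recursively and return every path (files and directories) containing `needle`.
function findPaths(root, needle) {
  const matches = [];
  const stack = [root];
  while (stack.length > 0) {
    const dir = stack.pop();
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (full.includes(needle)) matches.push(full);
      // Dirent.isDirectory() is false for symlinks, so symlink loops are skipped.
      if (entry.isDirectory()) stack.push(full);
    }
  }
  return matches.sort();
}

// Usage: findPaths('.', 'mdcscry').forEach((p) => console.log(p));
```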
micahstubbs commented 5 years ago

nginx autoindex is pretty nifty

https://www.keycdn.com/support/nginx-directory-index

location /somedirectory/ {
    autoindex on;
}

check out the result at https://screenshot.micah.fyi/screenshots/

micahstubbs commented 5 years ago

ok, so wrapping up the main tactics to improve performance that worked are:

micahstubbs commented 5 years ago

ok, after doing this scale exercise and bulk processing ~28k blocks, I'm resizing the instance from

16 vCPU, 104 GB memory

to

1 vCPU, 6 GB memory

micahstubbs commented 5 years ago

some conversation from the d3 slack

[screenshot of the d3 Slack conversation, 2018-11-04]

Curran Kelleher [4:53 AM]
Very cool! Did you ever resolve that memory leak issue?

Micah Stubbs [1:43 PM]
good question. I found a workaround. my suspicion is that `await page.close()` followed by `await browser.close()` somehow isn't enough to fully close a puppeteer-managed headless Chrome instance.

a proper solution would involve figuring out how to close/end/kill the puppeteer + headless Chrome processes that get spun up every time the `screenshot.js` script runs.

a clever solution might involve specifying the desired concurrency up front (in my case I experimented with concurrencies of 10, 12, 16, 8, and 4 before landing on 4), then starting one puppeteer headless Chrome process for each slot of JS event loop concurrency (4 in my case), and reusing those same processes as long as there are pages left in the queue to screenshot.

(for what it's worth, my unsatisfying workaround was to use a server with 104 GB of RAM, then manually restart the server when its memory was almost full.)

https://d3js.slack.com/archives/C0LBA67MG/p1541246020029600
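the "clever solution" above (fix the concurrency up front, start one headless Chrome per slot, reuse it while the queue still has pages) could look roughly like the sketch below. it is not the project's `screenshot.js`: `runPool`, `closeBrowserHard`, and `outputPathFor` are hypothetical names. the SIGKILL fallback on the Chrome child process returned by puppeteer's `browser.process()` is one possible answer to the leaked-process question, not a confirmed fix.

```javascript
// Sketch: start `concurrency` long-lived workers (e.g. one headless Chrome each)
// up front; each keeps pulling items from a shared queue until it is empty.
// All names here are hypothetical, not taken from screenshot.js.

// Close a puppeteer Browser; if close() hangs or fails and the Chrome child
// process is still running afterwards, SIGKILL it.
async function closeBrowserHard(browser, timeoutMs = 5000) {
  const proc = browser.process(); // null for browsers attached via puppeteer.connect()
  let timer;
  const timedOut = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('browser.close() timed out')), timeoutMs);
  });
  try {
    await Promise.race([browser.close(), timedOut]);
  } catch (err) {
    // close() failed or timed out; fall through to the hard kill below
  } finally {
    clearTimeout(timer);
  }
  if (proc && proc.exitCode === null && proc.signalCode === null) {
    proc.kill('SIGKILL');
  }
}

// Run processItem(worker, item) over every item with at most `concurrency`
// workers. Results come back in input order as { ok, value } or { ok, error }.
async function runPool({ items, concurrency, createWorker, processItem, closeWorker }) {
  const results = new Array(items.length);
  let nextIndex = 0; // shared cursor; no lock needed, JS runs one callback at a time

  async function workerLoop() {
    const worker = await createWorker();
    try {
      while (nextIndex < items.length) {
        const i = nextIndex++;
        try {
          results[i] = { ok: true, value: await processItem(worker, items[i]) };
        } catch (error) {
          results[i] = { ok: false, error }; // one bad page shouldn't stop the batch
        }
      }
    } finally {
      await closeWorker(worker);
    }
  }

  const workerCount = Math.max(0, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => workerLoop()));
  return results;
}

// Possible wiring with puppeteer (not run here):
//   const puppeteer = require('puppeteer');
//   await runPool({
//     items: urls,
//     concurrency: 4,
//     createWorker: () => puppeteer.launch(),
//     processItem: async (browser, url) => {
//       const page = await browser.newPage();
//       try {
//         await page.goto(url, { waitUntil: 'networkidle0' });
//         return await page.screenshot({ path: outputPathFor(url) }); // hypothetical helper
//       } finally {
//         await page.close();
//       }
//     },
//     closeWorker: (browser) => closeBrowserHard(browser),
//   });
```

with this shape, the number of Chrome processes is bounded by `concurrency` for the whole run instead of growing with every page, and each browser is closed (or killed) exactly once, when its worker finds the queue empty.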