Website crashing due to memory leaks.

JrtPec commented 8 years ago

Yesterday we succeeded in generating CSVs from TMPO live on the website and sending them to the browser. However, we noticed that each request uses some memory and fails to free it afterwards. After a few requests the server inevitably crashes.

We have tried the following things to reduce the memory load and free memory after the request, but none of them have really worked.

Writing a wrapper to close the file buffer after the request has completed. (link)
Using a temporary file to store the CSV and serving that instead of using StringIO or cStringIO.
Setting the Flask flag app.use_x_sendfile = True, to have nginx serve the file directly instead of the app. (I did not test this thoroughly and am not sure of its effect.)
Deleting the pandas DataFrame after the CSV is written, using del df.
Calling the garbage collector after the delete: import gc; gc.collect() (link)
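
For illustration, the wrapper from the first item could look roughly like the sketch below. It is not the code behind the link above; `iter_and_close` and `chunk_size` are made-up names, and it only assumes a file-like buffer such as `io.StringIO` or a temporary file.

```python
import io


def iter_and_close(buffer, chunk_size=8192):
    """Yield the buffer's content in chunks and close it afterwards.

    Hypothetical helper: the `finally` block runs when the generator is
    exhausted or closed early, so the buffer does not stay open after
    the response has been sent.
    """
    try:
        while True:
            chunk = buffer.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        buffer.close()


# Usage: wrap a buffer that already holds the CSV text.
csv_buffer = io.StringIO("timestamp,value\n0,1.5\n60,1.7\n")
body = "".join(iter_and_close(csv_buffer, chunk_size=16))
```

Note that closing the buffer only releases the buffer itself. It does not by itself guarantee that the Python process hands the memory back to the operating system.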

Does anybody have other ideas we could try? The download page is live, but hidden under opengrid.be/download. The status quo is that it does work, but after a few runs it crashes the server, which then immediately restarts.

icarus75 commented 8 years ago

Tmpo blocks consist of gzipped JSON. So why not just put the tmpo blocks directly on the wire and offload the CSV conversion work to the browser? With the proper HTTP Content-Encoding header set, the browser will take care of inflating the gzip.
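
A minimal sketch of that idea, using only the standard library: the gzipped bytes are passed through untouched and only the headers tell the browser how to decode them. `passthrough_response` is a hypothetical helper, and the sample payload is invented for the example, not the real tmpo block layout.

```python
import gzip
import json


def passthrough_response(gzipped_block):
    """Return (headers, body) for a block that is already gzip-compressed.

    The server does no decompression or conversion; the
    Content-Encoding header lets the browser inflate the body itself.
    """
    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        "Content-Length": str(len(gzipped_block)),
    }
    return headers, gzipped_block


# Invented sample payload, standing in for one stored block.
sample = {"t": [0, 60, 120], "v": [1.5, 1.7, 1.6]}
block = gzip.compress(json.dumps(sample).encode("utf-8"))
headers, body = passthrough_response(block)
```

The memory cost per request is then roughly the size of the compressed block, since nothing is parsed on the server.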

JrtPec commented 8 years ago

We could do it that way, but then you could only download raw data, right? So people would have to convert epoch timestamps, interpolate data, resample it... while the exact purpose of the CSV download page was to enable non-programmers to import data into Excel or something and experiment on their own. I don't know if raw data would be very useful for these people...

JrtPec commented 8 years ago

I'm going to try and write a generator that creates small dataframes and streams them, like this
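
A rough sketch of such a generator is given here. It is not the linked example: it uses the standard `csv` module in place of per-chunk DataFrames to stay dependency-free, and `stream_csv`, `header` and `chunk_size` are made-up names.

```python
import csv
import io


def stream_csv(rows, header, chunk_size=1000):
    """Yield CSV text in pieces of at most `chunk_size` data rows.

    Only one small buffer is alive at a time, so memory use does not
    grow with the total number of rows.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
        if count >= chunk_size:
            yield buffer.getvalue()
            # Reset the buffer so it can be reused for the next piece.
            buffer.seek(0)
            buffer.truncate(0)
            count = 0
    remaining = buffer.getvalue()
    if remaining:
        yield remaining


# Usage: rows can be any iterable, including another generator.
rows = ((i * 60, i * 0.5) for i in range(5))
pieces = list(stream_csv(rows, ["timestamp", "value"], chunk_size=2))
```

In Flask, a generator like this can be handed to `Response(...)` with a `text/csv` mimetype so that the pieces are sent as they are produced. Whether this fixes the crashes here is untested.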

saroele commented 7 years ago

@JrtPec we discussed this last meeting. What is the status now that our droplet has more memory and swap?

JrtPec commented 7 years ago

It seems to be much better, but I can still crash the site by selecting a large time period. We could put a cap on the time period, or figure out some clever way to call tmpo in chunks and stream the CSV in blocks.
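
Both options could be combined along these lines. This is only a sketch: `time_windows`, `MAX_SPAN` and `WINDOW` are hypothetical names and values, and the actual tmpo call per window is left out.

```python
from datetime import datetime, timedelta

MAX_SPAN = timedelta(days=31)  # hypothetical cap on the requested period
WINDOW = timedelta(days=1)     # hypothetical size of one chunk


def time_windows(start, end, window=WINDOW, max_span=MAX_SPAN):
    """Yield (window_start, window_end) pairs that cover [start, end).

    Because this is a generator, the checks below run when iteration
    starts, not when the function is called.
    """
    if end < start:
        raise ValueError("end must not be before start")
    if end - start > max_span:
        raise ValueError("requested period exceeds the cap")
    cursor = start
    while cursor < end:
        upper = min(cursor + window, end)
        yield cursor, upper
        cursor = upper


# Usage: each window would be fetched and streamed on its own.
windows = list(time_windows(datetime(2016, 1, 1), datetime(2016, 1, 3, 12)))
```

Each window could then be fetched from tmpo, converted to CSV and streamed before the next one is requested, so only one window's worth of data is in memory at a time.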

opengridcc / website

Website crashing due to memory leaks. #91