zooniverse / seven-ten

Split testing service

Creating a DataRequest kills the instance #45

Open zwolf opened 5 years ago

zwolf commented 5 years ago

Snapshot Wisconsin's latest experiment is more than halfway done and has around 141k metrics. Creating a new data request for the split goes one of two ways: either it saves and takes a while to complete, or you lose control of the terminal as the instance slowly grinds to a halt and dies. If it has been a while since the last export, it's usually the latter; once the replacement instance comes up, trying again usually gives the former.

Seven Ten Production is running on a t2.small EC2 instance.

camallen commented 5 years ago

Confirming that the VM's kernel is running out of memory and killing the Ruby Sidekiq process:

[ 1699.183666] Out of memory: Kill process 2543 (ruby) score 787 or sacrifice child
[ 1699.197161] Killed process 2543 (ruby) total-vm:2313016kB, anon-rss:1609176kB, file-rss:0kB

I think the culprit is the map here, which presumably builds the whole result set in memory, instead of using find_each and appending to an array / flushing to the file on each batch. https://github.com/zooniverse/Seven-Ten/blob/a3c0d67409f92594aef38d320a0d378d60ac0975/app/workers/data_export_worker.rb#L33

zwolf commented 5 years ago

find_in_batches would be my go-to, along with writing each batch to the CSV as it's loaded, then discarding it.
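
A minimal sketch of that batch-and-flush idea, not the actual worker code. To keep it runnable without a database, it assumes a hypothetical `Metric` struct and a lazy `each_metric` enumerator in place of the real model, and uses `each_slice` where the worker would presumably call `find_in_batches`. The column names and the `export_metrics` helper are made up for illustration.

```ruby
require "csv"
require "stringio"

# Hypothetical stand-in for the ActiveRecord model (fields are assumptions).
Metric = Struct.new(:id, :key, :value)

# Lazily yields metrics one at a time, so the full set is never held in memory.
# In the real worker this role would be played by a relation and find_in_batches.
def each_metric(count)
  return enum_for(:each_metric, count) unless block_given?

  1.upto(count) { |i| yield Metric.new(i, "variant_#{i % 2}", i * 10) }
end

# Writes the header, then one batch at a time. Each batch array goes out of
# scope after its rows are written, so it can be garbage collected.
def export_metrics(io, metrics, batch_size: 1000)
  csv = CSV.new(io)
  csv << %w[id key value]

  batches = 0
  metrics.each_slice(batch_size) do |batch|
    batch.each { |metric| csv << [metric.id, metric.key, metric.value] }
    batches += 1
  end
  batches
end

out = StringIO.new
batch_count = export_metrics(out, each_metric(2_500), batch_size: 1000)
puts "wrote #{out.string.lines.size - 1} rows in #{batch_count} batches"
```

In the worker, `out` would be a file opened for writing and the enumerator would be replaced by the ActiveRecord query. Both `find_each` and `find_in_batches` load records in batches of 1000 by default, so peak memory should track the batch size rather than the total number of metrics.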