Open zwolf opened 5 years ago
Confirming that the kernel's OOM killer is terminating the ruby Sidekiq process:
```
[ 1699.183666] Out of memory: Kill process 2543 (ruby) score 787 or sacrifice child
[ 1699.197161] Killed process 2543 (ruby) total-vm:2313016kB, anon-rss:1609176kB, file-rss:0kB
```
I think the culprit is the `map` here, which loads every record into memory at once, instead of using a `find_each` and adding to an array / flushing to file on each batch: https://github.com/zooniverse/Seven-Ten/blob/a3c0d67409f92594aef38d320a0d378d60ac0975/app/workers/data_export_worker.rb#L33
`find_in_batches` would be my go-to, along with writing each batch to the CSV as it's created, then forgetting it so it can be garbage-collected.
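A minimal sketch of that pattern, with `each_slice` over a plain array standing in for `find_in_batches` (in the real worker it would be the ActiveRecord relation being exported, e.g. `find_in_batches(batch_size: 1000)`; the method and field names here are hypothetical):

```ruby
require "csv"
require "tempfile"

# Stream rows to a CSV in batches so only one batch is resident at a time,
# instead of materializing the whole result set with map.
def export_csv(rows, path, batch_size: 1000)
  CSV.open(path, "w") do |csv|
    csv << %w[id value] # hypothetical header
    # each_slice stands in for ActiveRecord's find_in_batches here
    rows.each_slice(batch_size) do |batch|
      batch.each { |row| csv << row }
      # batch goes out of scope at the end of each iteration,
      # so it can be garbage-collected before the next one loads
    end
  end
end

file = Tempfile.new(["export", ".csv"])
export_csv((1..5).map { |i| [i, i * 10] }, file.path, batch_size: 2)
puts File.read(file.path)
```

The point is that peak memory is bounded by `batch_size` rather than by the total row count, which is what matters with 141k metrics on a small instance.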
Snapshot Wisconsin's latest experiment is just past its halfway point and has around 141k metrics. Creating a new data request for the split goes one of two ways: either you save it and it takes a while to complete, or you lose control of the terminal as the instance slowly grinds to a halt and dies. If it's been a while since your last export, it's usually the latter; then, when the new instance comes up, the former.
Seven-Ten production is running on a t2.small EC2 instance, which only has 2 GiB of memory, so a single ruby process with ~1.6 GB resident is enough to trigger the OOM killer.