vsivsi / meteor-job-collection

A persistent and reactive job queue for Meteor, supporting distributed workers that can run anywhere.
https://atmospherejs.com/vsivsi/job-collection

Meteor job collection long running jobs causing server crash and restart Exited from signal: SIGBUS (Meteor v1.4.4.3) #257

Closed: mexin closed this issue 6 years ago

mexin commented 6 years ago

Hello, I noticed that with some long-running jobs using Meteor Job Collection, our app deployed on Galaxy was crashing and restarting constantly (with no error log from Galaxy). When testing locally for extended periods (a job running for 24 hours), we also see Meteor crashing with "Exited from signal: SIGBUS". If anyone has experienced this type of issue and has any insights, I would really appreciate it!

These long-running jobs are basically HTTP requests made to a third-party service; we capture the data, clean it, and insert/update it into Mongo.

Some examples (I cannot share the actual code due to my IP agreement):

Creating the job:

const job = new Job(myJobs, 'longRunningJob', { kit: kitUser });
job.depends(previousJob);
job.priority('normal')
    .retry({ retries: myJobs.forever, wait: 15 * 1000, backoff: 'exponential' })
    .save();

Processing the job:

const jobQueue = myJobs.processJobs('longRunningJob', { workTimeout: 30000 }, Meteor.bindEnvironment(function (job, cb) {
    const kit = job.data.kit;

    longRunningFunc(kit, function (err) {
        if (err) {
            job.log(`Calling longRunningFunc failed with error: ${err}`,
                { level: 'warning' });
            console.log(err);
            job.fail('' + err);
        } else {
            job.done();
        }
        cb(); // always tell the queue this worker is ready for the next job
    });
}));

longRunningFunc gets the data, cleans it, and inserts it into Mongo.

Should we update to 1.5.2? We had some issues with 1.5.1, so we went back to 1.4.4.3, which was our stable version. Could it be a memory leak due to the callbacks?

Thanks! Appreciate the help!

vsivsi commented 6 years ago

It could be some kind of memory leak. I've not seen any reports of memory leaks from JobCollection, and callbacks don't inherently leak memory.

One comment, though: I wonder why you are even bothering to use JobCollection for a single job/worker that is basically always running. What benefit is there to using job-collection for this, as opposed to a much simpler tool like async.forever() or similar? http://caolan.github.io/async/docs.html#forever
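
For reference, a minimal sketch of that alternative (not code from this thread; it reuses the longRunningFunc and kit placeholders from the example above and assumes the async npm package is installed):

import async from 'async';

// Run the same work in an endless loop with no job queue: async.forever
// keeps calling the function until it is passed an error.
async.forever(
    Meteor.bindEnvironment(function (next) {
        longRunningFunc(kit, function (err) {
            if (err) return next(err);           // stop the loop on error
            Meteor.setTimeout(next, 15 * 1000);  // pause before the next pass
        });
    }),
    Meteor.bindEnvironment(function (err) {
        console.error('Worker loop stopped:', err);
    })
);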

a4xrbj1 commented 6 years ago

This can be closed; we identified the problem. GC wasn't working properly. Thanks!

guilhermedecampo commented 6 years ago

@a4xrbj1 can you please explain how you fixed it? I have the same problem, though in a totally different situation, so this explanation could be super useful.

Best!

mexin commented 6 years ago

@guilhermedecampo it was our fault: in one piece of code, one of our variables was kept in memory by a closure, so the GC wasn't picking it up, hence the growth over time, as this object contained a lot of data... You could try adding memwatch-ng and taking some heap snapshots so you can trace where your leak is coming from.
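
As a hypothetical illustration of that pattern (fetchEverything and processSummary are made-up names, not the actual code):

// A long-lived callback that closes over a large object keeps the whole
// object reachable, so the GC can never reclaim it.
function makeWorker(kit) {
    const hugeResponse = fetchEverything(kit); // large third-party payload
    return function onTick() {
        processSummary(hugeResponse.summary);  // only a small slice is used
    };
}

// Extracting just the needed slice before building the closure lets the
// rest of the payload be collected.
function makeLeanWorker(kit) {
    const summary = fetchEverything(kit).summary;
    return function onTick() {
        processSummary(summary);
    };
}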

Here are some posts that helped us; I hope you find them useful: https://www.dynatrace.com/blog/understanding-garbage-collection-and-hunting-memory-leaks-in-node-js/ and https://gist.github.com/lloyd/3932358 (check out memwatch-ng, which is the most recently updated fork).
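
A minimal sketch of that heap-snapshot approach (runSuspectJobOnce is a hypothetical stand-in for one pass of the real job; the on('leak')/HeapDiff API is the one documented for the node-memwatch family of modules):

const memwatch = require('memwatch-ng');

// Warn when the heap keeps growing across consecutive GC runs.
memwatch.on('leak', function (info) {
    console.log('Possible leak detected:', info);
});

// Diff two heap snapshots taken around one pass of the suspect code.
const hd = new memwatch.HeapDiff();
runSuspectJobOnce(function () {
    const diff = hd.end();
    console.log(JSON.stringify(diff.change.details, null, 2));
});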

Greetings!

guilhermedecampo commented 6 years ago

So awesome.

Thank you @mexin 🔝