node-js-libs / node.io

Troubles with `max` option when retry is always called. #148

Closed · zazabe closed this issue 11 years ago

zazabe commented 11 years ago

Hi,

When I try to handle the case where a request repeatedly gets an error response, the node.io task hangs.

After some debugging, I found a strange issue: the request callback does not seem to be executed in the correct scope.

Consider this code:

Code

var nodeio = require('node.io');
var http = require('http');

// local HTTP server that answers 200 with a small body for every request
var srv = http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('body response for ' + req.url);
});
srv.listen(22334);

// four input tasks, processed at most two at a time (max: 2)
var tasks = [];
for(var i=0 ; i < 4 ; i++){
    tasks.push({
        id: 'task_' + i,
    });
}

var job = new nodeio.Job({ max: 2, retry: 3 }, {

    input: tasks,

    fail: function(input, status){
        console.error('failed', status);
        this.emit(input);
    },

    output: function(task){
        console.log('OUTPUT', task[0].id);    
    },

    run: function(task){
        var self = this;
        var id = Math.random().toString(36).substr(2, 9);
        // tag the instance so the callback can detect when it fires on the wrong job
        this.id = id;

        console.log(task.id, 'start with job id:', id);

        this.postHtml('http://localhost:22334/' + task.id, { id: task.id }, function(err, $, data, headers){
            console.log(task.id, self.id, "RECEIVE", data);

            if(self.id != id){
                console.error(task.id, 'WRONG INSTANCE', 'job instance', self.id, 'is called with the result for job', id);
            }

            console.log(task.id, self.id, "RETRY");

            this.retry();
        });     
    }       
});

nodeio.start(job, function(){
    srv.close();
});

Output

zazabe@zazabe-laptop
$ node bug.js
task_0 start with job id: 8nylq7xfy
task_1 start with job id: ad9xvsei0
task_0 8nylq7xfy RECEIVE body response for /task_0
task_0 8nylq7xfy RETRY
task_1 ad9xvsei0 RECEIVE body response for /task_1
task_1 ad9xvsei0 RETRY
task_0 start with job id: 9blqwoxpr
task_1 start with job id: 8xw35t8d1
task_0 9blqwoxpr RECEIVE body response for /task_0
task_0 9blqwoxpr RETRY
task_1 8xw35t8d1 RECEIVE body response for /task_1
task_1 8xw35t8d1 RETRY
task_2 start with job id: idcno8s76
task_3 start with job id: kqhk674bu
task_2 idcno8s76 RECEIVE body response for /task_2
task_2 idcno8s76 RETRY
task_3 kqhk674bu RECEIVE body response for /task_3
task_3 kqhk674bu RETRY
task_0 start with job id: 920b36yab
task_1 start with job id: av5yz9cyy
task_0 920b36yab RECEIVE body response for /task_0
task_0 920b36yab RETRY
failed retry
OUTPUT task_0
task_1 av5yz9cyy RECEIVE body response for /task_1
task_1 av5yz9cyy RETRY
failed retry
OUTPUT task_1
task_2 start with job id: d8m6kvjtn
task_3 start with job id: ep4hhkh5p
task_2 ep4hhkh5p RECEIVE body response for /task_2
task_2 WRONG INSTANCE job instance ep4hhkh5p is called with the result of d8m6kvjtn
task_2 ep4hhkh5p RETRY
task_3 ep4hhkh5p RECEIVE body response for /task_3
task_3 ep4hhkh5p RETRY
task_3 start with job id: 9jd15efsi
task_3 9jd15efsi RECEIVE body response for /task_3
task_3 9jd15efsi RETRY
failed retry
OUTPUT task_3
^Czazabe@zazabe-laptop
$ 

The response for task_2, requested by the job with id d8m6kvjtn, should never be delivered to job ep4hhkh5p, which is normally handling task_3...

So with the code above, node.io hangs and the callback receives the response on the wrong job (the job id stored as an attribute of the instance differs from the id captured in the callback's closure).

It works perfectly with the max option set to 1, but hangs whenever more than one concurrent job is allowed and that limit is not greater than the number of inputs.

Am I doing something wrong, or should the code above work as expected?

I started to investigate the node.io sources more deeply; I think the problem comes from the "request" library and/or the creation of child processes to handle jobs...

Thanks

zazabe commented 11 years ago

Ok, I get it... it's about the instance_pool in process_worker.js.

When a job is retried, the completed job instance is pushed into a list of terminated jobs so it can be reused later.

When a job fails, if the fail function returns a value or emits something, the job is also added to the instance_pool.

The bug appears when the last retry fails: the same job instance is added to the pool twice.

So the same job instance can be handed out twice and used concurrently, which produces the bug described above (see the sketch below).
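To make the double-add concrete, here is a minimal sketch in plain JavaScript. It is not node.io's actual instance_pool code from process_worker.js; the pool object and the release/checkout functions are made-up names that only model the behaviour described above:

var pool = { free: [] };

// a finished job instance is returned to the pool for later reuse
function release(instance) {
    pool.free.push(instance);
}

// a new unit of work takes an instance out of the pool
function checkout() {
    return pool.free.pop();
}

var job = { id: 'job_A' };

// retry path: the completed instance is put back into the pool...
release(job);
// ...and the fail path puts the very same instance back a second time
release(job);

// two different tasks now each check out an instance...
var forTask2 = checkout();
var forTask3 = checkout();

// ...but it is the same object, so one task's response callback can see
// state written by the other (the WRONG INSTANCE case in the output above)
console.log(forTask2 === forTask3); // true

A guard on the release path (never push an instance that is already in the pool) would avoid the duplicate; whether that is exactly what the pull request does, I leave to the fix itself.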

The bug fix is in the pull request.