watson-developer-cloud / node-sdk

:comet: Node.js library to access IBM Watson services.
https://www.npmjs.com/package/ibm-watson
Apache License 2.0
1.48k stars 669 forks source link

[visual-recogniton] Intermittent when sending concurrent requests #714

Closed holly-cummins closed 5 years ago

holly-cummins commented 6 years ago

We've switched from sending zipped files to the VR service to sending individual images, to work around a performance issue on the server side. This generally works well, but we're seeing a high rate of errors if we send multiple requests concurrently. Around in in 20 images will come back with a code 400 error. On retry the image succeeds as we'd expect.

Error:

{ Error: 
  {"images": [
    {
      "image":"img.jpg",
      "error": {
        "code":400,
        "description":"No valid images uploaded to process"
      }
    }
  ],
  "images_processed":1
}
ammardodin commented 6 years ago

@holly-cummins Thanks for opening the issue. Generally, a 400 status code means that server received a malformed request which it couldn't apprehend. My suspicion is that there's a race condition somewhere that's causing some of the requests to become corrupted by the concurrent mechanism that's being employed.

Can you possibly fill us in on some details, such as:

Thanks.

holly-cummins commented 6 years ago

Thanks!

We're seeing the issue on both Mac OS and linux (and locally and from IBM Cloud). We're using v3.4.5 of the SDK, but saw similar issues with earlier versions.

We did have a linux-only race condition in our code on file access that our unit tests showed up, so we're now pretty careful to wait for the right stream events before doing anything.

     const params = {
      images_file: fs.createReadStream(fileToUpload),
      classifier_ids: [classifier_id],
      threshold: MINIMUM_CONFIDENCE
    };

    // Wait for the stream to be available
    return new Promise((resolve, reject) => {
      params.images_file.on('open', () => {
        this.classify(params)
          .then( // do stuff with the result ... )
          .then(() => resolve(result));

// ... 

  classify(params) {
    return promiseRetry(
      { retries: MAX_RETRY_TIMES_FOR_TANK, factor: 1.4, minTimeout: 200 },
      (retry, number) => {
        return this.classifyWithoutRetry(params).catch(err => {
          console.log(
            'Error returned from Watson. Retrying (attempt',
            number + '): ',
            err
          );

          retry(err);
          // TODO we seem to get this persistently, after the second 52 image. Setting the concurrency high makes it happen on the first image, but setting the concurrency low doesn't make it not happen.
        });
      }
    ).catch(err => {
      console.error(
        'Giving up with persistent error: ',
        err,
        '. Counts for this tank will not be accurate.'
      );
      return { images: [] };
    });
  }

  classifyWithoutRetry(params) {
    return new Promise((resolve, reject) => {
      const startTime = Date.now();
      this.watson.classify(params, (err, result) => {
        // console.log('Watson call completed in time ', Date.now() - startTime);
        if (err) {
          reject(err);
        } else {
          resolve(result);
        }
      });
    });
  }
germanattanasio commented 6 years ago

@holly-cummins I suspect this is a problem in the service. Can you try removing the cookie jar from base_service? You will have to extend that object or just modify the one in node_modules.

dpopp07 commented 6 years ago

@holly-cummins were you able to try this solution? Did it work for you?

cfjedimaster commented 6 years ago

I believe I'm having a similar issue. I had an app that called two visual recognition services, the main classify one and then the face detection one. It started acting up oddly - one call would work quickly and another would hang. If I ran the test again, the service that worked was completely random. You can see a simplified version here where I've removed use of Promises and such.

https://gist.github.com/cfjedimaster/6c6cbb39608b35623f9d4e83deedf2df

cfjedimaster commented 6 years ago

As a follow up - around 5 minutes later, I do get a result from my code:

 { Error: <HTML><HEAD><TITLE>Error</TITLE></HEAD><BODY>
An error occurred while processing your request.<p>
Reference&#32;&#35;27&#46;4cbbce17&#46;1541105610&#46;407f434d
</BODY></HTML>

    at Request._callback (C:\projects\recogtester\node_modules\watson-developer-cloud\lib\requestwrapper.js:113:21)
    at Request.self.callback (C:\projects\recogtester\node_modules\watson-developer-cloud\node_modules\request\request.js:185:22)
    at Request.emit (events.js:182:13)
    at Request.<anonymous> (C:\projects\recogtester\node_modules\watson-developer-cloud\node_modules\request\request.js:1157:10)
    at Request.emit (events.js:182:13)
    at IncomingMessage.<anonymous> (C:\projects\recogtester\node_modules\watson-developer-cloud\node_modules\request\request.js:1079:12)
    at Object.onceWrapper (events.js:273:13)
    at IncomingMessage.emit (events.js:187:15)
    at endReadableNT (_stream_readable.js:1092:12)
    at process._tickCallback (internal/process/next_tick.js:63:19) code: 0, 'x-global-transaction-id': undefined } null
cfjedimaster commented 6 years ago

Ok I'm sorry - it's an issue with the read streams. For some reason if I make a new params block and do a read stream on a second image, it works perfectly. So maybe something changed in Node since I built my initial demo.

germanattanasio commented 5 years ago

Now that we don't have cookies I think this issue can be marked as resolved.

dpopp07 commented 5 years ago

Closing in light of the above comment and due to inactivity. If this problem persists, we can reopen.

holly-cummins commented 5 years ago

I'm happy to be optimistic that this is fixed and close it, and I can re-open if I see the issue when I next stage the affected app.