watson-developer-cloud / node-sdk

Node.js library to access IBM Watson services.
https://www.npmjs.com/package/ibm-watson
Apache License 2.0

Unable to upload zip files in Buffer format #984

Closed cpphen closed 4 years ago

cpphen commented 4 years ago

Hello,

I am using the Node SDK for ibm-watson and am currently trying to create classifiers and also compare files with uploaded images. The docs (Node VR) state that the acceptable formats are a Node.js readable stream or a Buffer. When I upload a zip file of images from the front end, the logs on the server (an Express.js server) show that we do indeed have a Buffer, which looks something like this:

<Buffer ff d8 ff e0 00 10 4a 46 49 46 00 01 01 00 00 01 00 01 00 00 ff db 00 43 00 05 03 04 04 04 03 05 04 04 04 05 05 05 06 07 0c 08 07 07 07 07 0f 0b 0b 09 ... >

When I convert this to a string, I can see the image file names, with their .jpg/.png extensions, among the encoded data, like this:

...\u0000\u0000�L\u0004\u0000Panasonic-ES8103S-Nanotech-Electric-Shaver-img3.jpgPK\u0001\u0002\u0014\u0000\u0014\u0000\u0000\u0000\b\u0000G�IOɉ.�\fR\u0001\u0000v�\u0001... \u0000\u0000\u0000\n�\u0004\u0000Philips Norelco Electric Shaver 8900, Wet & Dry Edition S8950-91.pngPK\u0001\u0002\u0014\u0000\u0014\u0000\u0000\u0000\b\u0000G�IO�\u0014\u000b��O\u0000\u0000L[\u0000\u0000\u0011\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000 \u0000\u0000\u0000x\u001d\u0006\u0000PR1342Prd1_HR.jpg

We are using express-fileupload, which gives us a req.files object that includes the Buffer.

I also believe the Content-Type is being set to application/x-zip-compressed.
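A quick way to confirm what an uploaded Buffer actually contains is to look at its leading bytes. The sketch below is only an illustration: `sniffUploadType` is a hypothetical helper, not part of the SDK or of express-fileupload. It relies on two well-known signatures: a non-empty ZIP archive starts with `50 4b 03 04` ("PK"), and a JPEG starts with `ff d8 ff`. For comparison, the first buffer quoted above begins with `ff d8 ff e0`, while the buffers shown later in this thread begin with `50 4b 03 04`.

```javascript
// Signature of a ZIP local file header: "PK\x03\x04".
const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]);
// JPEG files start with ff d8 ff.
const JPEG_MAGIC = Buffer.from([0xff, 0xd8, 0xff]);

// Hypothetical helper: classify an upload by its leading bytes.
function sniffUploadType(data) {
  if (!Buffer.isBuffer(data)) return 'not-a-buffer';
  if (data.length >= 4 && data.subarray(0, 4).equals(ZIP_MAGIC)) return 'zip';
  if (data.length >= 3 && data.subarray(0, 3).equals(JPEG_MAGIC)) return 'jpeg';
  return 'unknown';
}

// Possible use inside an Express handler (req.files comes from express-fileupload):
//   console.log(sniffUploadType(req.files.positive_examples.data));
```

Logging this before calling the service would show whether the zip archive itself, rather than a single image, is what reaches the server.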

dpopp07 commented 4 years ago

@henhen87 Thanks for the issue. What exactly is the error you are getting? Also, will you provide a code sample so that we can understand how you are passing the Buffers?

It would also be helpful to know what version of the SDK you are using.

cpphen commented 4 years ago

Hello,

I am using "ibm-watson": "^4.5.3", and the Visual Recognition is V3.

As for errors, we do not get any; the files appear to post successfully. When I send the buffer, I can see the models being created in Watson Studio, but their status indicates failed. When I click the test button, this message appears:

Cannot execute learning task. : Could not train classifier. Verify there are at least 10 positive training images for each class, and at least 10 other unique training images (inluding optional negative_examples). There is a minimum of 1 positive class. Not enough samples for training, class: dalm has only 0 samples

Here is the code:

const createClassifierParams = {
  name: 'Dogs',
  positive_examples: {
    dalm_positive_examples: req.files.positive_examples.data,
  },
  negative_examples: req.files.negative_examples.data
};

visualRecognition.createClassifier(createClassifierParams)
  .then(function (classifier) {
    console.log('CLassifier THEN', classifier);
    res.json(classifier);
  })
  .catch(function (err) {
    console.log('error Catch:', err);
    res.json({ err });
  });

In the snippet above, req.files.positive_examples.data and req.files.negative_examples.data are buffers like the one I posted in the original post. The entire req.files object provided by the express-fileupload middleware looks like this:

{ positive_examples:
   { name: 'dogs.zip',
     data:
      <Buffer 50 4b 03 04 14 00 00 00 08 00 20 a5 45 4f 96 07 ae 17 f2 10 00 00 22 11 00 00 06 00 00 00 64 32 2e 6a 70 67 95 55 77 50 d3 5d d3 fd 85 22 88 4a 35 74 ... >,
     size: 71052,
     encoding: '7bit',
     tempFilePath: '',
     truncated: false,
     mimetype: 'application/x-zip-compressed',
     md5: 'c6fce52542db9e1b5b5a7db92546baba',
     mv: [Function: mv] },
  negative_examples:
   { name: 'cats.zip',
     data:
      <Buffer 50 4b 03 04 14 00 00 00 08 00 04 6f 49 4f bf 9a 03 10 08 18 00 00 b0 18 00 00 07 00 00 00 63 61 74 2e 6a 70 67 75 58 07 50 13 ce d2 4f 68 51 7a 91 16 ... >,
     size: 66699,
     encoding: '7bit',
     tempFilePath: '',
     truncated: false,
     mimetype: 'application/x-zip-compressed',
     md5: '1223e2fb46a23789ee3d106ba4190748',
     mv: [Function: mv] } }

And the response I get on the front end is:

{
  classes: [{ class: "dalm" }],
  classifier_id: "Dogs_1363139155",
  core_ml_enabled: true,
  created: "2019-10-11T16:05:35.396Z",
  name: "Dogs",
  owner: "....",
  status: "training",
  updated: "2019-10-11T16:05:35.396Z"
}

However, the model in Watson Studio says failed.

dpopp07 commented 4 years ago

@henhen87 So, I think this is a bug in the SDK. I think the service is expecting a filename, but we aren't exposing a filename property in the SDK. It works with streams because the SDK reads the filename from streams in the background. Of course, there is no way to read a filename from a Buffer so I will release a patch to v4 of the SDK and add a filename field.
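To illustrate the asymmetry described here: a stream created with `fs.createReadStream` carries a `path` property, so a filename can be derived from it, while a Buffer is just bytes. The sketch below is not the SDK's implementation; `inferFilename` is a hypothetical helper written only to show the idea.

```javascript
const path = require('path');

// Hypothetical helper (not the SDK's code): return a filename when one can be
// inferred from the value, or undefined when it cannot.
function inferFilename(value) {
  // A Buffer holds raw bytes only; there is nothing to infer a name from.
  if (Buffer.isBuffer(value)) return undefined;
  // Streams from fs.createReadStream expose the path they were opened with.
  if (value && typeof value.path === 'string') return path.basename(value.path);
  return undefined;
}

// Example:
//   inferFilename(fs.createReadStream('./dogs.zip'))  -> 'dogs.zip'
//   inferFilename(req.files.positive_examples.data)   -> undefined
```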

cpphen commented 4 years ago

Hello @dpopp07 ,

I see. In that case, for V3, is the only way to currently get file uploads from the front end to the back end working to write the files to the server's local file system and then create a readable stream from them, or to use cloud storage such as a bucket?
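The file-system workaround asked about here could look roughly like the sketch below, which uses only Node's built-in modules. `bufferToTempStream` is a hypothetical helper name, and the temporary directory prefix is arbitrary. The caller would be responsible for deleting the temporary directory once the request completes.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: persist an uploaded Buffer to a temporary file and
// return a readable stream for it, so the stream carries a real file path.
function bufferToTempStream(data, filename) {
  // Create a unique directory under the OS temp dir, e.g. /tmp/upload-abc123.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'upload-'));
  // basename() guards against path segments in a client-supplied name.
  const filePath = path.join(dir, path.basename(filename));
  fs.writeFileSync(filePath, data);
  return fs.createReadStream(filePath);
}

// Possible use with the req.files object shown earlier in the thread:
//   const file = req.files.positive_examples;
//   const stream = bufferToTempStream(file.data, file.name);
```

express-fileupload also has a `useTempFiles` option that writes uploads to disk and fills in `tempFilePath`, which may make a helper like this unnecessary; whether that suits this setup is not something the thread confirms.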

germanattanasio commented 4 years ago

I think we added a fix for this a long time ago @dpopp07. @ammardodin did all the work to work with streams.

cpphen commented 4 years ago

@germanattanasio Hello, would you happen to have a reference to where this was raised and fixed? I am not sending a stream with fs.createReadStream because I am trying to send the file and its contents without having to write it to the local file system and then create a readable stream from it. From my understanding, buffers are a temporary in-memory store that hold data while it is being transferred across networks/streams, but this does not seem to be working.

In the docs, it says the acceptable formats are

Record<string, NodeJS.ReadableStream | Buffer>

germanattanasio commented 4 years ago

I think it was something like:

{
  // base64json is the base64-encoded file content
  // (Buffer.from replaces the deprecated new Buffer constructor)
  value: Buffer.from(base64json, 'base64'),
  options: {
    filename: 'working1.pdf'
  }
}

Try with:

const createClassifierParams = {
  name: 'Dogs',
  positive_examples: {
    dalm_positive_examples: {
      value: req.files.positive_examples.data,
      options: {
        filename: 'positive.zip'
      }
    },
  },
  negative_examples: {
    value: req.files.negative_examples.data,
    options: {
      filename: 'negative.zip'
    }
  }
};

visualRecognition.createClassifier(createClassifierParams)
  .then(function (classifier) {
    console.log('CLassifier THEN', classifier);
    res.json(classifier);
  })
  .catch(function (err) {
    console.log('error Catch:', err);
    res.json({
      err
    })
  });

dpopp07 commented 4 years ago

> I think we added a fix for this a long time ago @dpopp07. @ammardodin did all the work to work with streams.

@germanattanasio Yes, it works with Streams because we can infer the filename. The above code won't work with Buffers because the SDK won't even process the filename.

@henhen87 Once I merge #985, you should not have to change your code at all. A dummy filename will be sent in the request, which should be all you need.

cpphen commented 4 years ago

@dpopp07 @germanattanasio

Thank you. I was actually about to message you to ask whether the changes in #985 were ready for testing. Once they are, will they be included in V3?

I also tried @germanattanasio's solution, and that seems to work fine. The documentation for the Node version of Create Classifier doesn't seem to cover these optional properties, options and filename, but it works regardless.

dpopp07 commented 4 years ago

Ah yes, looking again at the code @germanattanasio posted, that should work fine. My mistake.

However, I think it is preferable not to have to set up those options in your client code. The patch has now been released in v4.5.4. Try installing and testing with that version.

germanattanasio commented 4 years ago

@dpopp07 I think we want users to specify the filename if they want. Services like discovery use the name as metadata and users can search for it. You can't hardcode a filename like in #985

mkistler commented 4 years ago

This is a special case for VR createClassifier. For other services we generate an associated xFilename parameter.

Please do not use the options approach. It is not documented and we actually want to remove that at some point in the future.

cpphen commented 4 years ago

Hello,

Yes, @mkistler, I was unaware of the options approach until @germanattanasio mentioned it since, as you said, it is not documented.

I just tried @dpopp07's changes from #985 and it still does not seem to be working. I still get failed in Watson Studio when I send the data in this format:

const createClassifierParams = {
    name: 'Dogs5',
    positive_examples: {
      dalm_positive_examples: req.files.positive_examples.data,
    },
    negative_examples: req.files.negative_examples.data
  };

The reason I am sending it in this format is that I have gotten the following error a couple of times when using other formats, including the one shown at the end of this post:

code: 400, message: 'Bad Request', body: '{"error":{"code":400,"error_id":"input_error","description":"Cannot execute learning task. : need at least 2 _positive_examples fields, (or 1 _positive_examples and 1 negative_examples field) to train a classifier. 1 specified."}}',

and it seems to suggest that the format must be in this structure:

 {
    name: [Name],
    positive_examples: {
      [Name of Classifier Positive Example]_positive_examples: [Data], //In my case a buffer
      ...
    },
    negative_examples: [Data]
  };
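A small builder can make this shape harder to get wrong. The sketch below assumes the v4 structure exactly as written above, with keys of the form `<class>_positive_examples`, and enforces the minimum quoted in the 400 error: at least two positive classes, or one positive class plus negative examples. `buildV4ClassifierParams` is a hypothetical helper, not an SDK function.

```javascript
// Hypothetical helper: build createClassifier params in the v4 (snake_case)
// shape described in this thread.
//   positiveByClass: { className: BufferOrStream, ... }
//   negative:        BufferOrStream (optional)
function buildV4ClassifierParams(name, positiveByClass, negative) {
  const positive_examples = {};
  for (const [className, data] of Object.entries(positiveByClass)) {
    positive_examples[`${className}_positive_examples`] = data;
  }

  // Mirror the service's rule from the 400 error quoted above.
  const fieldCount =
    Object.keys(positive_examples).length + (negative !== undefined ? 1 : 0);
  if (Object.keys(positive_examples).length < 1 || fieldCount < 2) {
    throw new Error(
      'Need at least 2 positive classes, or 1 positive class and negative examples'
    );
  }

  const params = { name, positive_examples };
  if (negative !== undefined) params.negative_examples = negative;
  return params;
}

// Possible use with the upload shown in this thread:
//   buildV4ClassifierParams('Dogs5',
//     { dalm: req.files.positive_examples.data },
//     req.files.negative_examples.data);
```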

The Node.js version of the documentation, however, shows an example that uses camel case. In my experience the keys must be the snake-case positive_examples (or negative_examples), with nested properties named like so: [Name of Classifier Positive Example]_positive_examples

//Node JS Docs example

const createClassifierParams = {
  name: 'dogs',
  negativeExamples: fs.createReadStream('./cats.zip'),
  positiveExamples: {
    beagle: fs.createReadStream('./beagle.zip'),
    husky: fs.createReadStream('./husky.zip'),
    goldenretriever: fs.createReadStream('./golden-retriever.zip'),
  }
};

The example above will not work, even with files on the local file system, because camel case is not an accepted format.

dpopp07 commented 4 years ago

@henhen87 Hmm, not sure why it would not be working. I will investigate further.

As for the docs example - the parameters were converted to use lower camel case in v5. You are using v4, so that does not apply to you but the docs are for the latest version.
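For anyone moving between the two versions, the difference between the shapes quoted in this thread can be sketched as a conversion. This assumes the v5 shape is exactly what the docs example above shows (`positiveExamples` keyed by bare class name, plus `negativeExamples`). Stripping the `_positive_examples` suffix is an inference from comparing the two examples, not something confirmed here. `toV5ClassifierParams` is a hypothetical helper.

```javascript
// Hypothetical helper: convert v4-style (snake_case) createClassifier params
// into the v5-style (lower camel case) shape shown in the docs example.
function toV5ClassifierParams(v4Params) {
  const suffix = '_positive_examples';
  const positiveExamples = {};

  for (const [key, data] of Object.entries(v4Params.positive_examples || {})) {
    // Assumption: v5 keys are the class name without the v4 suffix.
    const className = key.endsWith(suffix) ? key.slice(0, -suffix.length) : key;
    positiveExamples[className] = data;
  }

  const v5Params = { name: v4Params.name, positiveExamples };
  if (v4Params.negative_examples !== undefined) {
    v5Params.negativeExamples = v4Params.negative_examples;
  }
  return v5Params;
}
```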

germanattanasio commented 4 years ago

@mkistler How can users send a base64-encoded file? Or how can they specify the content type? Do we expect the service to use the extension for that?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has had no recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

dpopp07 commented 4 years ago

This looks like it may still be an issue
