watson-developer-cloud / java-sdk

Java SDK to use the IBM Watson services.
http://watson-developer-cloud.github.io/java-sdk/
Apache License 2.0

Watson Discovery V1 Duplicate Documents Upload #1150

Closed munish-usit closed 3 years ago

munish-usit commented 3 years ago

Overview We are using the Watson Discovery V1 document upload API. The API accepts two files with the same name:

    InputStream documentStream = new ByteArrayInputStream(jsonDocument.getBytes());
    createDocumentBuilder = new AddDocumentOptions.Builder(environmentId, collectionId);
    createDocumentBuilder.file(documentStream)
        .filename(documentName)
        .fileContentType(HttpMediaType.APPLICATION_JSON);
    if (document.getMetadata() != null)
        createDocumentBuilder.metadata(document.getMetadata().toString());

Expected behavior If the filename is the same, the upload should overwrite the existing document/file in IBM Watson Discovery.

Actual behavior In our case, each upload creates a new document in IBM Watson Discovery.

How to reproduce

  1. Use the Document Upload API (via the Java SDK).
  2. Upload a JSON file with the filename "course1.json".
  3. Change some attributes in the JSON file and upload it again.
  4. IBM Watson Discovery now holds 2 separate documents instead of one overwritten document.

SDK Version

    <dependency>
        <groupId>com.ibm.watson.developer_cloud</groupId>
        <artifactId>watson-spring-boot-starter</artifactId>
        <version>2.1.4</version>
    </dependency>

    <dependency>
        <groupId>com.ibm.watson</groupId>
        <artifactId>discovery</artifactId>
        <version>9.0.2</version>
    </dependency>

Additional context This issue does not occur with the IBM Watson Discovery UI tooling. It occurs only when uploading through the Java SDK API.

kevinkowa commented 3 years ago

Hi @munish-usit! Did you try using the function updateDocument?

munish-usit commented 3 years ago

Thanks @kevinkowa, I have tried updateDocument, but it requires a document_id. When we ingest content we do not know the document_id, so there is overhead in managing and mapping each document_id to its content.

I was expecting the same behavior as in the UI tooling/interface. In the UI tooling, the filename is treated as unique, and a document is overwritten when another file with the same filename is uploaded.

Thanks, Munish

kevinkowa commented 3 years ago

Hey @munish-usit!

The SDKs do not support this particular action. To work around it, use updateDocument if you know the document's id (for example, by keeping a map from filenames to ids), or query the documents and find the id of the one you want to overwrite:

QueryOptions queryOptions = new QueryOptions.Builder(environmentId, collectionId).build();
QueryResponse queryResponse = discovery.query(queryOptions).execute().getResult();
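
A minimal sketch of how the two workarounds could be combined into an "upsert by filename" helper. It is not SDK code: `DocumentStore`, `InMemoryStore`, and `FilenameUpserter` are hypothetical names introduced only for illustration, and the in-memory store stands in for a Discovery collection so the sketch runs on its own. In a real integration, `findIdByFilename` would presumably wrap `discovery.query(...)` with a filter on the stored filename (the exact field, for example `extracted_metadata.filename`, is an assumption to check against your collection), while `add` and `update` would wrap `addDocument` and `updateDocument`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical seam over the three Discovery calls named in this thread. */
interface DocumentStore {
    /** Would wrap a query filtered by filename; returns the id of a matching document, if any. */
    Optional<String> findIdByFilename(String filename);

    /** Would wrap addDocument; returns the id of the newly created document. */
    String add(String filename, String json);

    /** Would wrap updateDocument for an already known document id. */
    void update(String documentId, String filename, String json);
}

/** In-memory stand-in so the sketch runs without a Discovery instance. */
final class InMemoryStore implements DocumentStore {
    final Map<String, String> filenameById = new HashMap<>();
    final Map<String, String> contentById = new HashMap<>();
    private int nextId = 1;

    public Optional<String> findIdByFilename(String filename) {
        return filenameById.entrySet().stream()
                .filter(e -> e.getValue().equals(filename))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public String add(String filename, String json) {
        String id = "doc-" + nextId++;
        filenameById.put(id, filename);
        contentById.put(id, json);
        return id;
    }

    public void update(String documentId, String filename, String json) {
        filenameById.put(documentId, filename);
        contentById.put(documentId, json);
    }
}

/** Updates a document when its filename is already known, otherwise adds it. */
final class FilenameUpserter {
    private final DocumentStore store;
    // Local filename -> document id map, so repeated uploads skip the lookup query.
    private final Map<String, String> idByFilename = new HashMap<>();

    FilenameUpserter(DocumentStore store) {
        this.store = store;
    }

    String upsert(String filename, String json) {
        String id = idByFilename.get(filename);
        if (id == null) {
            // Not cached locally: fall back to asking the store.
            id = store.findIdByFilename(filename).orElse(null);
        }
        if (id == null) {
            id = store.add(filename, json);      // first upload of this filename
        } else {
            store.update(id, filename, json);    // overwrite the existing document
        }
        idByFilename.put(filename, id);
        return id;
    }
}

final class UpsertByFilenameSketch {
    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        FilenameUpserter upserter = new FilenameUpserter(store);
        String first = upserter.upsert("course1.json", "{\"title\":\"v1\"}");
        String second = upserter.upsert("course1.json", "{\"title\":\"v2\"}");
        System.out.println(first.equals(second));     // true: same document id reused
        System.out.println(store.contentById.size()); // 1: no duplicate document
    }
}
```

The local map covers the "map filenames to ids" option, and the store lookup covers the query option when the id is not cached, for instance after a restart.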

munish-usit commented 3 years ago

Hi @kevinkowa ,

Thanks a lot for clarifying. I will use the updateDocument approach and close the ticket.

Thanks, Munish