watson-developer-cloud / java-sdk

:1st_place_medal: Java SDK to use the IBM Watson services.
http://watson-developer-cloud.github.io/java-sdk/
Apache License 2.0

[WDS] Error reading the http response #835

Closed asidd closed 6 years ago

asidd commented 6 years ago

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.commons.codec.CharEncoding;
import org.apache.http.util.EntityUtils;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import com.google.gson.Gson;
import com.ibm.watson.developer_cloud.discovery.v1.Discovery;
import com.ibm.watson.developer_cloud.discovery.v1.model.AddDocumentOptions;
import com.ibm.watson.developer_cloud.discovery.v1.model.DeleteDocumentOptions;
import com.ibm.watson.developer_cloud.discovery.v1.model.DocumentAccepted;
import com.ibm.watson.developer_cloud.discovery.v1.model.QueryOptions;
import com.ibm.watson.developer_cloud.discovery.v1.model.QueryOptions.Builder;
import com.ibm.watson.developer_cloud.discovery.v1.model.QueryResponse;
import com.ibm.watson.developer_cloud.discovery.v1.model.QueryResult;
import com.ibm.watson.developer_cloud.discovery.v1.model.UpdateDocumentOptions;
import com.ibm.watson.developer_cloud.http.HttpHeaders;
import com.ibm.watson.developer_cloud.http.HttpMediaType;
import com.ibm.watson.developer_cloud.http.ServiceCall;
import com.ibm.watson.developer_cloud.service.exception.ServiceResponseException;
import com.ibm.watson.developer_cloud.service.exception.TooManyRequestsException;

public class WdsCaller {

    private static Logger log = LogManager.getLogger(WdsCaller.class);
    private static final long WAIT_TIME   = 5000;
    private static final int  RETRY_LIMIT = 10;
    private static final int  RETRY_LIMIT4QUERY = 3;

    private static final WDSProperties property = WDSProperties.getInstance();

    private final Discovery discovery;
    private final Gson gson = new Gson();

    private final String environmentId;
    private final String collectionId;

    public WdsCaller() {
        this.discovery = new Discovery(property.get("wds.versionDate"));
        this.environmentId = property.get("wds.environment.id");
        this.collectionId  = property.get("wds.collection.id");
        init();
    }

    private void init() {
        this.discovery.setEndPoint(property.get("wds.endpoint"));
        this.discovery.setUsernameAndPassword(property.get("wds.user"), property.get("wds.pw"));
        Map<String, String> headers = new HashMap<String, String>();
        headers.put(HttpHeaders.X_WATSON_LEARNING_OPT_OUT,  String.valueOf(true));
        this.discovery.setDefaultHeaders(headers);
    }

    public List<QueryResult> query(QueryOptions queryOptions) throws InterruptedException {
        int retry = 0;

        do {
            try{
                QueryResponse queryResponse = discovery.query(queryOptions).execute();

                return queryResponse.getResults();
            } catch (Exception e) {
                e.printStackTrace();
                //log.warn(e.getMessage());
                Thread.sleep(WAIT_TIME);
            }
        } while (retry++ < RETRY_LIMIT4QUERY);

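        // All attempts failed. An unchecked exception is thrown here because the
        // method only declares InterruptedException, so a checked Exception would not compile.
        throw new RuntimeException("Query failed after " + (RETRY_LIMIT4QUERY + 1) + " attempts");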

    }

    public Builder getQueryBuilder() {
        return new QueryOptions.Builder(environmentId, collectionId);
    }
}

The code I provided inserts a 2.5-second sleep between queries to avoid WDS access errors.

Using WDC Java SDK 4.0.0, Java Runtime v1.8.

An error occurs when queries are sent to WDS via the Java API from multiple concurrent runs. The error details are below.

[ERROR ] Error reading the http response java.io.EOFException
[err] java.lang.RuntimeException: Error reading the http response
[err] at com.ibm.watson.developer_cloud.util.ResponseUtils.getString(ResponseUtils.java:110)
[err] at com.ibm.watson.developer_cloud.service.WatsonService.getErrorMessage(WatsonService.java:284)
[err] at com.ibm.watson.developer_cloud.service.WatsonService.processServiceCall(WatsonService.java:402)
[err] at com.ibm.watson.developer_cloud.service.WatsonService$1.execute(WatsonService.java:174)

lpatino10 commented 6 years ago

I ran the provided code using multiple threads to query the Discovery News collection and managed to do it successfully.

With the error given, my assumption is that the problem has something to do with the specific data being queried. It's possible that the returned data is not being handled properly by the SDK, but I can't say for sure without looking at it. If there's any way you could give us a better idea of what that data looks like, it might help diagnose the problem.

lpatino10 commented 6 years ago

After getting some of the data, my first assumption was that the issue had to do with parsing some of it properly. The data contains things like mathematical symbols and non-English characters.

However, I was still able to do a successful sample query, even using multiple threads. So at the moment, I'm still unable to pin down the source of the issue.

asidd commented 6 years ago

@lpatino10

Could you please tell me the following?
- Was the environment in which the dev team executed the query the customer's dedicated environment, or another public environment?
- How many threads were executed at the same time? (Our case is 15 threads.)
- How many results did each query get from WDS? (Our case is 1000.)
- Were the queries executed repeatedly without a sleep between each query? (The code we provided includes a 2.5-second sleep between queries to avoid the error.)

asidd commented 6 years ago

@lpatino10 @germanattanasio - update your progress here please.

lpatino10 commented 6 years ago

  1. I tested my queries above on my own, public environment.
  2. I executed them using 3 threads.
  3. In my first test, which just checked that the query() method worked properly with multiple threads, I queried the Discovery News collection and received roughly a couple hundred results. With the customer's sample data, I uploaded only a small subset and received no more than 10-15 results.
  4. I executed the code without sleep between queries.

lpatino10 commented 6 years ago

Closing as this hasn't been able to be reproduced.

asidd commented 6 years ago

@lpatino10 please reopen this as this is not resolved

germanattanasio commented 6 years ago

closing due to inactivity

lpatino10 commented 6 years ago

I've gotten in contact with the customer directly to work on the issue in their environment and will be investigating further.

lpatino10 commented 6 years ago

I've worked with the customer on their Discovery environment to try and pin down the issue, but was still not able to replicate it. If more details surface in the future, feel free to open a new issue about it. For now, I'm closing this one.

lpatino10 commented 6 years ago

I've worked with the customer some more and determined that the error was caused by long query strings resulting in request URLs longer than 2000 characters. Someone from the Discovery team has told me they're aware of the limitation in their current query() implementation but don't have a current timetable for the change.
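
For anyone hitting the same error, a rough client-side guard can flag oversized queries before they are sent. The sketch below is only an illustration, not part of the SDK: it assumes the query text travels as a URL parameter (as the 2000-character finding above implies), and the class name, the `query` parameter name, and the example endpoint are all hypothetical. The real request URL will contain other parameters too, so treat the result as a lower bound.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryUrlLengthCheck {

    // Threshold taken from the comment above; the exact server-side limit is an assumption.
    static final int MAX_URL_LENGTH = 2000;

    // Estimates the length of a request URL of the form baseUrl?paramName=<encoded queryText>.
    static int estimateUrlLength(String baseUrl, String paramName, String queryText) {
        try {
            // URL-encoding can expand the text considerably (e.g. non-ASCII characters).
            String encoded = URLEncoder.encode(queryText, "UTF-8");
            // baseUrl + "?" + paramName + "=" + encoded value
            return baseUrl.length() + 1 + paramName.length() + 1 + encoded.length();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    // Returns true when the estimated URL exceeds the assumed limit.
    static boolean isTooLong(String baseUrl, String paramName, String queryText) {
        return estimateUrlLength(baseUrl, paramName, queryText) > MAX_URL_LENGTH;
    }

    public static void main(String[] args) {
        // Hypothetical endpoint, used only to make the estimate concrete.
        String baseUrl = "https://example.com/v1/environments/ENV_ID/collections/COLL_ID/query";

        // Build an artificially long query string (Java 8 compatible, no String.repeat).
        StringBuilder longQuery = new StringBuilder();
        for (int i = 0; i < 300; i++) {
            longQuery.append("term").append(i).append(' ');
        }

        System.out.println("Estimated URL length: "
                + estimateUrlLength(baseUrl, "query", longQuery.toString()));
        System.out.println("Too long: " + isTooLong(baseUrl, "query", longQuery.toString()));
    }
}
```

A caller could run `isTooLong` before `discovery.query(...)` and shorten or split the query when it returns true, instead of retrying a request that cannot succeed.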