intel-cloud / cosbench

A benchmark tool for cloud object storage services

"Cannot retry request with a non-repeatable request entity" error #289

Open mzhou15 opened 8 years ago

mzhou15 commented 8 years ago

Hi cosbench developers,

We run a cosbench workload (5 minutes per run) continuously: a script resubmits the same workload as soon as the previous run finishes. About once a day on average, this error is triggered by a PUT request and hangs cosbench. Throughput and bandwidth drop to 0, but the controller web page still shows the workload as running. Any idea what could cause this? The complete error message is pasted below. Thanks in advance!

2015-12-05 23:06:31,079 [ERROR] [AbstractOperator] - worker 1 fail to perform operation mymixedcontainers_115/mymixedobjects_1871
com.intel.cosbench.api.storage.StorageException: org.apache.http.client.ClientProtocolException
    at com.intel.cosbench.api.swift.SwiftStorage.createObject(SwiftStorage.java:228)
    at com.intel.cosbench.driver.operator.Writer.doWrite(Writer.java:98)
    at com.intel.cosbench.driver.operator.Writer.operate(Writer.java:79)
    at com.intel.cosbench.driver.operator.AbstractOperator.operate(AbstractOperator.java:76)
    at com.intel.cosbench.driver.agent.WorkAgent.performOperation(WorkAgent.java:197)
    at com.intel.cosbench.driver.agent.WorkAgent.doWork(WorkAgent.java:177)
    at com.intel.cosbench.driver.agent.WorkAgent.execute(WorkAgent.java:134)
    at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:44)
    at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:1)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.http.client.ClientProtocolException
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:822)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
    at com.intel.cosbench.client.swift.SwiftClient.storeStreamedObject(SwiftClient.java:252)
    at com.intel.cosbench.api.swift.SwiftStorage.createObject(SwiftStorage.java:217)
    ... 12 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:621)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
    ... 16 more
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:377)
    at sun.security.ssl.OutputRecord.write(OutputRecord.java:363)
    at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:849)
    at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:820)
    at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:122)
    at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:153)
    at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:114)
    at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:89)
    at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:96)
    at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
    at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:120)
    at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:264)
    at org.apache.http.impl.conn.AbstractClientConnAdapter.sendRequestEntity(AbstractClientConnAdapter.java:224)
    at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:255)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
    ... 18 more

mzhou15 commented 8 years ago

Below is the output from the cosbench system.log. It shows that cosbench considered the mission aborted before it hung:

2015-12-04 01:36:23,781 [INFO] [WorkloadProcessor] - START WORK: main
2015-12-04 01:36:23,787 [INFO] [AbstractCommandTasklet] - time drift between controller and driver-driver1 is 1 mSec
2015-12-04 01:36:23,788 [INFO] [StageRunner] - successfully booted all tasks in stage s1-main
2015-12-04 01:36:23,014 [INFO] [StageRunner] - successfully submitted all tasks in stage s1-main
2015-12-04 01:36:23,016 [INFO] [COSBDriverService] - handler=M86BEBA18A4
2015-12-04 01:36:23,065 [INFO] [MissionHandler] - mission M86BEBA18A4 has been authed successfully
2015-12-04 01:36:23,217 [INFO] [StageRunner] - successfully authenticated all tasks in stage s1-main
2015-12-04 01:36:23,420 [INFO] [StageRunner] - successfully launched all tasks in stage s1-main
2015-12-04 01:42:28,220 [INFO] [MissionHandler] - wait 5 seconds for agents to abort ...
2015-12-04 01:42:29,563 [INFO] [MissionHandler] - all agents have been aborted in mission M86BEBA18A4
2015-12-04 01:42:29,563 [INFO] [MissionHandler] - mission M86BEBA18A4 appears to be aborted

mzhou15 commented 8 years ago

An update on our findings: we had this issue with v0.4.2.c3, but v0.4.1.0 seems fine. The same error still occurs once in a while, but v0.4.1.0 successfully aborts the workload. No hang has been observed in about 30 hours.

ywang19 commented 8 years ago

What is the token expiration period? By default it's one day, after which access will fail. You may find more clues in the Swift server-side logs.

mzhou15 commented 8 years ago

@ywang19 Thanks for replying. Since each workload lasts only 5 minutes and the error occurs after roughly 100 consecutive runs, it shouldn't be related to token expiration. We actually run the same workload continuously against three Swift clusters with identical configurations. Two have been running fine for about three weeks; only one has the hanging problem.

An update on v0.4.1.0: cosbench hung with exactly the same error after running for 40 hours.

ywang19 commented 8 years ago

What errors are reported on the server side of the hanging Swift cluster?

mzhou15 commented 8 years ago

@ywang19 Sorry, it may take a while for us to get permission to access the Swift cluster's logs. According to the cosbench log, the server reset the connection and cosbench was supposed to abort the mission, but it failed to do so.

I found some comments in the source file cosbench/dev/cosbench-ampli/src/com/intel/cosbench/client/amplistor/AmpliClient.java (pasted below). They describe our symptom, but our Swift cluster uses swauth, which I assume doesn't use Digest authorization.

Anyway, could you explain how to implement the workaround of "convert streamed (non-repeatable) entity to self-contained (repeatable)"? Many thanks!

/*
 * If you try to PUT from a stream to a server that uses Digest
 * authorization, the operation will fail, because the authorization
 * handling will cause
 * "org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity."
 * Semi-precise explanation of this issue: Issuing a PUT from a stream leads
 * to the httpclient library using a non-repeatable (streamed) entity. But
 * the authorization process makes the library to repeat the request (1. try
 * unauthorized, 2. server say 401 Unauthorized, use Digest, 3. client
 * retries with Digest, but this will fail, due to the nature of
 * non-repeatable streamed entity).
 * 
 * the workaround is to convert streamed (non-repeatable) entity to
 * self-contained (repeatable).
 */
ywang19 commented 8 years ago

ByteArrayEntity and FileEntity are repeatable, but they may have performance implications.
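The AmpliClient comment and this reply come down to one idea: a streamed body can be written only once, so when the client has to resend it (after an auth challenge or, as the posted trace suggests, after a connection reset), nothing is left to send. The JDK-only sketch below illustrates the failure and the buffering workaround. `RepeatableBodyDemo`, `bufferFully`, `send` and the 1 MiB cap are hypothetical; `send` merely stands in for HttpClient writing the request entity. In HttpClient 4.x terms, the buffered `byte[]` is what you would wrap in a `ByteArrayEntity` (whose `isRepeatable()` returns true) instead of the `InputStreamEntity` seen in the stack trace (whose `isRepeatable()` returns false).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class RepeatableBodyDemo {

    // Hypothetical helper: drain the stream into memory once, refusing bodies above
    // maxBytes so a huge object cannot exhaust the driver's heap.
    static byte[] bufferFully(InputStream in, int maxBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                if (out.size() + n > maxBytes) {
                    throw new IllegalArgumentException("body larger than " + maxBytes + " bytes");
                }
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Hypothetical stand-in for the HTTP client writing the request body:
    // it consumes the stream and reports how many bytes went out.
    static int send(InputStream body) {
        byte[] chunk = new byte[8192];
        int total = 0;
        try {
            int n;
            while ((n = body.read(chunk)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] payload = "object-data".getBytes(StandardCharsets.UTF_8); // 11 bytes

        // Non-repeatable: both attempts share one stream, so the retry finds it drained.
        InputStream streamed = new ByteArrayInputStream(payload);
        int first = send(streamed);
        int retry = send(streamed);
        System.out.println("streamed: first=" + first + ", retry=" + retry);

        // Repeatable: buffer once, then open a fresh stream for every attempt.
        byte[] buffered = bufferFully(new ByteArrayInputStream(payload), 1 << 20);
        int a = send(new ByteArrayInputStream(buffered));
        int b = send(new ByteArrayInputStream(buffered));
        System.out.println("buffered: first=" + a + ", retry=" + b);
    }
}
```

Running `main` prints `streamed: first=11, retry=0` and then `buffered: first=11, retry=11`. The price is the one mentioned in the reply: every object is held in memory (or on disk, with `FileEntity`) before it is sent, which may distort results for large objects in a benchmark.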

ywang19 commented 8 years ago

If this occurs on a connection reset, it seems we can ignore it; the modification is just to catch the exception. Once you know the exception type, the change is easy.
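The suggestion here is to catch the exception so that a reset connection fails one operation instead of stalling the worker. The sketch below shows that shape using JDK types only. Rather than catching `org.apache.http.client.ClientProtocolException` directly, it walks the cause chain looking for `java.net.SocketException`, which is the root cause in the posted trace. `TolerantWorker`, `perform` and `hasCause` are hypothetical names; this is not COSBench's `WorkAgent` or `AbstractOperator` code.

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.Callable;

public class TolerantWorker {
    private int succeeded;
    private int failed;

    // True if t, or anything in its cause chain, is an instance of type.
    // The identity set guards against (rare) cyclic cause chains.
    static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable c = t; c != null && seen.add(c); c = c.getCause()) {
            if (type.isInstance(c)) {
                return true;
            }
        }
        return false;
    }

    // Runs one operation. A failure caused by a socket error (e.g. "Connection reset")
    // is counted and swallowed so the worker moves on; any other failure propagates.
    void perform(Callable<?> op) {
        try {
            op.call();
            succeeded++;
        } catch (Exception e) {
            if (hasCause(e, SocketException.class)) {
                failed++;
            } else {
                throw new IllegalStateException("operation failed", e);
            }
        }
    }

    int succeeded() { return succeeded; }
    int failed() { return failed; }

    public static void main(String[] args) {
        TolerantWorker worker = new TolerantWorker();
        worker.perform(() -> "ok");
        // Mimics the shape of the posted trace: a wrapper whose root cause is a reset.
        worker.perform(() -> {
            throw new IOException("PUT failed", new SocketException("Connection reset"));
        });
        worker.perform(() -> "ok");
        System.out.println("succeeded=" + worker.succeeded() + ", failed=" + worker.failed());
    }
}
```

Running `main` prints `succeeded=2, failed=1`. Matching on the root cause instead of swallowing every exception keeps unrelated errors visible; in cosbench, one plausible place for such a catch is around the call that appears in the trace as `SwiftClient.storeStreamedObject`.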

mzhou15 commented 8 years ago

@ywang19 I followed your suggestion: I catch the ClientProtocolException and ignore it. cosbench has now been running without hanging for almost a month. Thanks very much for your help!