noobaa / noobaa-core

High-performance S3 application gateway to any backend - file / s3-compatible / multi-clouds / caching / replication ...
https://www.noobaa.io
Apache License 2.0

HPO 659: Mix read/write with md5 checking halts all I/O on all metalLB IP's and does not recover unless noobaa pod restart #6934

Closed MonicaLemay closed 2 years ago

MonicaLemay commented 2 years ago

Environment info

Actual behavior

This is not the same as issue 6930.

In this issue that I am opening, it is true that the node remained in the Ready state, so I don't expect any IP failover. This defect is not about metallb IPs failing to fail over. In this defect, I/O was running to metallb IP 172.20.100.31, which is for node master1. On node master0, in the CNSA scale core pod (namespace ibm-spectrum-scale), mmshutdown was issued for just that node. The other nodes remained active with the filesystem mounted. Master0 has metallb IP 172.20.100.30. There was no I/O going to that IP.

What was observed after mmshutdown on master0 was that all I/O going to 172.20.100.31 stopped. Because of issue 6930, there was no failover. That is fine and expected. But what is not expected is for all I/O to stop.

sh-4.4# date; mmshutdown
Tue Apr  5 17:10:28 UTC 2022
Tue Apr  5 17:10:28 UTC 2022: mmshutdown: Starting force unmount of GPFS file systems
Tue Apr  5 17:10:34 UTC 2022: mmshutdown: Shutting down GPFS daemons
Shutting down!

When mmshutdown was issued, the only error on the noobaa endpoint pods was Stale file handle

Logs show stale file handle

pod/noobaa-endpoint-7fdb5b75fd-t99nd/endpoint] Apr-5 17:12:26.930 [Endpoint/14] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/s5001b85</Resource><RequestId>l1mefzt6-3wj6yz-8x</RequestId></Error> PUT /s5001b85 {"host":"172.20.100.30","accept-encoding":"identity","user-agent":"aws-cli/2.3.2 Python/3.8.8 Linux/4.18.0-240.el8.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.mb","x-amz-date":"20220405T171225Z","x-amz-content-sha256":"61d056dc66f1882c0f4053be381523a7a28d384abde04fcf5b0021c716bb0ea1","authorization":"AWS4-HMAC-SHA256 Credential=QzhyXj9wVDH9DvnK97L9/20220405/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=6d9fa5c22501bfed4f312ac47621b6cec691bf1cf8f719e8250fcdc0f61522f1","content-length":"154"} Error: Stale file handle
[pod/noobaa-endpoint-7fdb5b75fd-t99nd/endpoint] Apr-5 17:12:27.589 [Endpoint/14]    [L0] core.sdk.bucketspace_nb:: could not create underlying directory - nsfs, deleting bucket [Error: Stale file handle] { code: 'Unknown system error -116' }
[pod/noobaa-endpoint-7fdb5b75fd-t99nd/endpoint] Apr-5 17:12:27.793 [Endpoint/14] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/s5001b85</Resource><RequestId>l1meg1if-7apxvy-1bas</RequestId></Error> PUT /s5001b85 {"host":"172.20.100.30","accept-encoding":"identity","user-agent":"aws-cli/2.3.2 Python/3.8.8 Linux/4.18.0-240.el8.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.mb","x-amz-date":"20220405T171227Z","x-amz-content-sha256":"61d056dc66f1882c0f4053be381523a7a28d384abde04fcf5b0021c716bb0ea1","authorization":"AWS4-HMAC-SHA256 Credential=QzhyXj9wVDH9DvnK97L9/20220405/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=bb34c5aa7ed665ae73a2d172ab33d4056c2611ad2c09331ece80089cff46df05","content-length":"154"} Error: Stale file handle

This error is a bit odd because it is on the endpoint pod that was for master0. Master0's metallb IP was 172.20.100.30. Cosbench workload was only set up for 172.20.100.31.

An additional observation is that the s3 list command works but writes do not.

[root@c83f1-dan4 RW_workloads]# date; s5001_2_31 ls
Tue Apr  5 18:33:40 EDT 2022
2022-04-05 18:33:43 s5001b100
2022-04-05 18:33:43 s5001b63
2022-04-05 18:33:43 s5001b62
2022-04-05 18:33:43 s5001b61


[root@c83f1-dan4 RW_workloads]# date;  s5001_2_31 cp alias_commands s3://s5001b1
Tue Apr  5 18:35:36 EDT 2022
^C^Cfatal error:
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "concurrent/futures/thread.py", line 40, in _python_exit
  File "threading.py", line 1011, in join
  File "threading.py", line 1027, in _wait_for_tstate_lock
KeyboardInterrupt
[root@c83f1-dan4 RW_workloads]# date
Tue Apr  5 18:36:43 EDT 2022
[root@c83f1-dan4 RW_workloads]#

All future PUTs to 172.20.100.31 and 172.20.100.32 get a timeout (if I don't CTRL-C), and the endpoint pods record an "Error: Semaphore Timeout". From 31 and 32, we can do GET and we can read from the Noobaa database. If we rsh into the endpoint pods for the IPs 172.20.100.31 and 172.20.100.32, we see that Spectrum Scale is still mounted in the correct place and we can write to it manually by touching a file. So this tells us that IPs 31 and 32 are still alive and that the noobaa db is still online. It also tells us that the Spectrum Scale filesystem is still mounted and writable. The timeout on the subsequent PUTs tells us that the client makes a connection but never gets a response.
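To illustrate that last point, a small Node.js probe along these lines can show whether the endpoint accepts the TCP connection but never sends an HTTP response (purely illustrative, not part of any noobaa tooling; the IP is one of the metallb IPs above and the bucket/key path is just a placeholder):

// probe.js - minimal sketch: does the endpoint accept the connection but never answer?
const http = require('http');

const req = http.request({
    host: '172.20.100.31',          // metallb IP from this issue
    port: 80,
    method: 'PUT',
    path: '/s5001b1/probe-object',  // hypothetical bucket/key, only used for this probe
    headers: { 'Content-Length': 0 },
    timeout: 30000,                 // give up if no response within 30s
});

req.on('socket', socket => {
    socket.on('connect', () => console.log('TCP connection established'));
});
req.on('response', res => {
    // a healthy endpoint answers quickly, e.g. with AccessDenied for an unsigned request
    console.log('got response, status', res.statusCode);
    res.resume();
});
req.on('timeout', () => {
    console.log('connected but no response within 30s - matches the hang described above');
    req.destroy();
});
req.on('error', err => console.log('request error:', err.message));
req.end();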

The endpoint pods never restarted and they still have their labels.

Also, in the scale core pod we run mmhealth node show -N all and see that everything is HEALTHY, except of course on the one node where we did mmshutdown.

  sh-4.4# mmhealth node show

Node name:      master1-daemon
Node status:    HEALTHY
Status Change:  1 day ago

Component      Status        Status Change     Reasons & Notices
----------------------------------------------------------------
GPFS           HEALTHY       1 day ago         -
GUI            HEALTHY       1 day ago         -
NETWORK        HEALTHY       9 days ago        -
FILESYSTEM     HEALTHY       9 days ago        -
NOOBAA         HEALTHY       26 min. ago       -
PERFMON        HEALTHY       1 day ago         -
THRESHOLD      HEALTHY       9 days ago        -
sh-4.4# set -o vi
sh-4.4# mmhealth node show -N all

Node name:      master0-daemon
Node status:    FAILED
Status Change:  36 min. ago

Component      Status        Status Change     Reasons & Notices
--------------------------------------------------------------------------------
GPFS           FAILED        36 min. ago       gpfs_down, quorum_down
NETWORK        HEALTHY       1 day ago         -
FILESYSTEM     DEPEND        36 min. ago       unmounted_fs_check(remote-sample)
PERFMON        HEALTHY       1 day ago         -
THRESHOLD      HEALTHY       1 day ago         -

Node name:      master1-daemon
Node status:    HEALTHY
Status Change:  1 day ago

Component      Status        Status Change     Reasons & Notices
----------------------------------------------------------------
GPFS           HEALTHY       1 day ago         -
GUI            HEALTHY       1 day ago         -
NETWORK        HEALTHY       9 days ago        -
FILESYSTEM     HEALTHY       9 days ago        -
NOOBAA         HEALTHY       27 min. ago       -
PERFMON        HEALTHY       1 day ago         -
THRESHOLD      HEALTHY       9 days ago        -

Node name:      master2-daemon
Node status:    HEALTHY
Status Change:  1 day ago

Component       Status        Status Change     Reasons & Notices
-----------------------------------------------------------------
CALLHOME        HEALTHY       9 days ago        -
GPFS            HEALTHY       1 day ago         -
NETWORK         HEALTHY       9 days ago        -
FILESYSTEM      HEALTHY       9 days ago        -
GUI             HEALTHY       3 days ago        -
HEALTHCHECK     HEALTHY       9 days ago        -
PERFMON         HEALTHY       9 days ago        -
THRESHOLD       HEALTHY       9 days ago        -

Something is obviously hung in the PUT connection, but the logs and noobaa health don't point to anything. When we issue mmstartup, the PUTs still fail. The only way to recover is to delete the Noobaa endpoint pods and have new ones generated.

I have been able to recreate this very easily, so if it is required I can set this up on my test stand.

Expected behavior

1. When doing mmshutdown on one node, it should not impact cluster-wide I/O capability. It should not be an outage. If an outage is indeed expected, then mmstartup should recover I/O capability.

Steps to reproduce

  1. Start a cosbench run. I can provide the xml if needed. Once I/O is running, issue mmshutdown from within one CNSA IBM Spectrum Scale core pod.

More information - Screenshots / Logs / Other output

Must gather and noobaa diagnose in https://ibm.ent.box.com/folder/145794528783?s=uueh7fp424vxs2bt4ndrnvh7uusgu6tocd

This issue started as HPO https://github.ibm.com/IBMSpectrumScale/hpo-core/issues/659. Screeners determined that it was a NooBaa issue. I have also slacked the CNSA team for input but have not heard back.

MonicaLemay commented 2 years ago

I need to amend the discovery of this defect. It has nothing to do with CNSA mmshutdown on a scale core pod. The trigger is running the s3 service with MD5 enabled and a cosbench workload with mixed read and write using hashCheck=true.

[root@c83f1-dan4 ~]# cat /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5001_GB_random_size_num_of_objects.xml
<?xml version="1.0" encoding="UTF-8" ?>
<workload name="GBrandomsizenumobj" description="GBrandom read and write workload">
  <storage type="s3" config="accesskey=QzhyXj9wVDH9DvnK97L9;secretkey=xTwEhfGB/wRXZbk7Ymsq2fYwl52AriraQPVzrZKF;path_style_access=true;endpoint=http://metallb2:80"/>

  <workflow>

   <workstage name="prepare_rGB">
      <work name="prepare_rGB" workers="256" interval="10" rampup="0" type="prepare" ratio="100" config="cprefix=s5001b;oprefix=s5001o;osuffix=_rGB;containers=r(1,4);objects=r(1,20);sizes=r(2,3)GB;hashCheck=true"/>
    </workstage>

    <workstage name="rGB_write_read_delete">
      <work name="rGB_write_read_delete" workers="256" interval="10" rampup="0" runtime="300">
        <operation type="write" ratio="50" config="cprefix=s5001b;oprefix=s5001o;osuffix=_rGB;containers=r(5,6);objects=r(1,20);sizes=u(2,3)GB;hashCheck=true"/>
        <operation type="read" ratio="50" config="cprefix=s5001b;oprefix=s5001o;osuffix=_rGB;containers=r(1,2);objects=r(1,20);hashCheck=true"/>
      </work>
    </workstage>

  </workflow>
</workload>

I will see that the cosbench workload fails. When it fails, a manual s3 cp times out, while s3 ls still works.

MonicaLemay commented 2 years ago

Can we make this a sev 1 defect? I think this is a critical problem for us. Could this be a noobaa db deadlock? I don't know the commands to check for a noobaa db deadlock.

romayalon commented 2 years ago

@MonicaLemay Are you able to curl to 172.20.100.31/172.20.100.32?

MonicaLemay commented 2 years ago

Curl works when the condition exists. The IPs are reachable.

[root@c83f1-dan4 ~]# curl 172.20.100.31
<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><Resource>/</Resource><RequestId>l1w6sj05-996mt5-tt8</RequestId></Error>[root@c83f1-dan4 ~]# [

Also s3 ls will work and will list the buckets. The s3 cp times out so no PUT works.

romayalon commented 2 years ago

@MonicaLemay

Few observations:

  1. The endpoint pods that are attached to master0 are: noobaa-endpoint-7fdb5b75fd-t99nd, noobaa-endpoint-7fdb5b75fd-5k4c2

  2. Stale file handle error - I searched for it in all the endpoint pods' logs and could only find it in these: noobaa-endpoint-7fdb5b75fd-t99nd, noobaa-endpoint-7fdb5b75fd-5k4c2 - which are the pods that are attached to master0. According to Google:

    A filehandle becomes stale whenever the file or directory referenced by the handle is removed by another host, while your client still holds an active reference to the object

    I feel this stale file handle error is expected, but maybe I'm missing something?

  3. I searched for alias_commands (the object that you tried to cp) in all endpoint pods' logs and couldn't find it - I suspect NooBaa never got this request and that's why the response never returns and you get a timeout - NSFS debug level would help with checking it.

  4. The fact that ls is working doesn't really say anything because NooBaa throws Slow Down and it's not a timeout.

  5. cosbench failing on PUT requests - on the functional endpoint pods I see that NooBaa returns Slow down - which makes sense to me because now there are only 4 endpoint pods, each has 3Gi memory - and 0.75 Gi is the limit for NSFS buffers. I see that you are uploading large files, can you try it with smaller files and check if you still get this Slow Down error?

MonicaLemay commented 2 years ago

@romayalon Thanks Romy for the comment. I'm not as worried about the Stale File Handle and the cosbench slow down. What concerns me the most is that after I hit this condition and kill the cosbench workload, I cannot do s3 cp. I can wait 12 hours and still cannot s3 cp a file. I have to restart the noobaa pods. After I restart the noobaa pods, I can do s3 cp.

romayalon commented 2 years ago

@MonicaLemay So can you please check the cp command connection details again (specifically the endpoint address) and check whether the requests were received in the endpoint pods? A higher debug level is probably needed here.

MonicaLemay commented 2 years ago

Did some follow-up investigation asked for by Romy. In the process, I have refined the trigger. So, in summary, 5 things:

  1. Cosbench will give a java error when the cosbench xml file has hashCheck=true AND file sizes are 3 GB or bigger. In some ways I don't consider this to be a noobaa problem; it is probably something with cosbench. When the java error occurs the file does not get sent, so this can't be noobaa's fault. I don't think we need to pursue the java errors from cosbench. When files are 3 GB and bigger but hashCheck is not set, everything works fine.

  2. Terminating the cosbench workload, or letting it complete, without hashCheck=true AND 3 GB or bigger is OK. s3 cp of a file works and there are no java errors in the cosbench logs.

  3. Cosbench with hashCheck=true AND files of 1 - 2 GB causes s3 cp of a file to time out. This is the error that this defect needs to pursue.

  4. I checked to make sure that cosbench is not holding the port open or blocked. I stopped cosbench on all drivers and still could not s3 cp a file.

  5. If I remove the read from the cosbench workload xml file, then s3 cp of a file will work. It appears to NOT be a function of file size but of doing both reads (GET) and writes (PUT) with hashCheck=true.

The best xml file to reproduce this quickly is:

[root@c83f1-dan4 RW_workloads]# cat /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5002_GB_random_size_num_of_objects.xml
<?xml version="1.0" encoding="UTF-8" ?>
<workload name="GBrandomsizenumobj" description="GBrandom read and write workload">
  <storage type="s3" config="accesskey=qiIImaITeJCdkdABWAsk;secretkey=c80yUC5QfG9sxK99XQI70zXjpin9cqIeSR54r47c;path_style_access=true;endpoint=http://metallb2:80"/>

  <workflow>

   <workstage name="prepare_rGB">
      <work name="prepare_rGB" workers="256" interval="10" rampup="0" type="prepare" ratio="100" config="cprefix=s5002b;oprefix=s5002o;osuffix=_rGB;containers=r(1,2);objects=r(1,3);sizes=r(1,2)GB;hashCheck=true"/>
    </workstage>

    <workstage name="rGB_write_read_delete">
      <work name="rGB_write_read_delete" workers="256" interval="10" rampup="0" runtime="300">
        <operation type="write" ratio="20" config="cprefix=s5002b;oprefix=s5002o;osuffix=_rGB;containers=r(5,6);objects=r(1,20);sizes=r(1,2)GB;hashCheck=true"/>
        <operation type="read" ratio="80" config="cprefix=s5002b;oprefix=s5002o;osuffix=_rGB;containers=r(1,2);objects=r(1,3);hashCheck=true"/>
      </work>
    </workstage>

  </workflow>
</workload>

One test was done with log level nsfs. Noobaa diagnose and must-gather were collected and uploaded to our box note.

root@c83f1-dan4 RW_workloads]# date ; sh /root/cosbench/cli.sh submit /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5002_GB_random_size_num_of_objects.xml
Wed Apr 13 15:34:25 EDT 2022
Accepted with ID: w967
[root@c83f1-dan4 RW_workloads]#




The hang occurred around Wed Apr 13 15:36:10 EDT 2022. Grabbed logs.

Now try a new file name


[root@c83f1-dan4 RW_workloads]# date; s5001_2_31 cp good.xml s3://s5001b1
Wed Apr 13 15:38:33 EDT 2022

NOW LOG LEVEL ALL



[root@c83f1-dan4 RW_workloads]# date ; sh /root/cosbench/cli.sh submit /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5002_GB_random_size_num_of_objects.xml
Wed Apr 13 15:55:08 EDT 2022
Accepted with ID: w969

Now try to s3 cp a file

[root@c83f1-dan4 RW_workloads]# date; s5001_2_31 cp nohup.out s3://s5001b1
Wed Apr 13 15:56:12 EDT 2022
upload: ./nohup.out to s3://s5001b1/nohup.out
[root@c83f1-dan4 RW_workloads]# date; s5001_2_31 cp rand_read.xml s3://s5001b1
Wed Apr 13 15:57:02 EDT 2022

In all cases oc logs does not show any errors.

guymguym commented 2 years ago

@MonicaLemay This is a great elimination!

I probably missed it mentioned above somewhere - how much time, or which actions, does it take to recover from this situation?

Few suggestions:

MonicaLemay commented 2 years ago

@guymguym @romy

  1. When the condition occurs, it is only on metallb IPs 172.20.100.31 and 172.20.100.32. The metallb IP address 172.20.100.33 does not exhibit the problem. The reason for this is that my cosbench setup only has 2 drivers; I don't have 3 drivers. Driver 1 points to 172.20.100.31 and driver 2 points to 172.20.100.32. It appears it only occurs on IPs that had simultaneous read/write with hashCheck=true.

  2. When the condition occurs, s3 mb and rb work. Only PUT and GET fail to the 2 IPs that were doing cosbench simultaneous read/write with hashCheck=true.

Screen Shot 2022-04-14 at 2 47 40 PM

aws-debug-32.docx

MonicaLemay commented 2 years ago

I can provide IP addresses for the storage cluster, openshift cluster and application nodes if the noobaa team would like to debug on my cluster.

guymguym commented 2 years ago

@MonicaLemay Thanks!

MonicaLemay commented 2 years ago

@guymguym

One more comment - I cannot recreate this with warp. I just did a warp run and did not see the problem. Warp and cosbench have different ways of doing tests. I think it is an important data point, but not enough yet to blame cosbench as being defective.

Please let me know if you need to meet with me. I can clarify all of your questions very quickly.

romayalon commented 2 years ago

Hi @MonicaLemay Can you please share the cosbench log so I'll be able to see which object failed and what the error was? If possible, please try to minimize the ops in the cosbench XML so it'll still fail; for instance, if it fails on prepare, remove the rest of the script.

This is what I found regarding good.xml - I see 3 retries of the good.xml upload at 19:38:40, 19:39:40, 19:40:40, and eventually it fails with a SlowDown error code.

[Endpoint/14][ERROR]core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>SlowDown</Code><Message>Reduce your request rate.</Message><Resource>/s5001b1/good.xml</Resource><RequestId>l1xz6wds-8ditak-w2m</RequestId></Error> PUT /s5001b1/good.xml {"host":"172.20.100.31","accept-encoding":"identity","content-type":"text/xml","user-agent":"aws-cli/2.3.2 Python/3.8.8 Linux/4.18.0-240.el8.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.cp","content-md5":"925sPKSZMpLjKyetdX660g==","expect":"100-continue","x-amz-date":"20220413T193840Z","x-amz-content-sha256":"66b57d67a15890b6985c286b56f6e135daae065aa32ea33a06d2ece990c59909","authorization":"AWS4-HMAC-SHA256 Credential=QzhyXj9wVDH9DvnK97L9/20220413/us-east-1/s3/aws4_request, SignedHeaders=content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=a38ec935293473e1a995f9a4d5f8ce79c3b1e0a2ef9e80f63da4d356d22ec03b","content-length":"1429"} Error: Semaphore Timeout
    at Semaphore._on_timeout (/root/node_modules/noobaa-core/src/util/semaphore.js:248:25)
    at Timeout.<anonymous> (/root/node_modules/noobaa-core/src/util/semaphore.js:244:53)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:500:7)

And the same for rand_read.xml - I see a SlowDown error:

[Endpoint/14][ERROR]core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>SlowDown</Code><Message>Reduce your request rate.</Message><Resource>/s5001b1/rand_read.xml</Resource><RequestId>l1xzx9dk-6nkiq1-1c7f</RequestId></Error> PUT /s5001b1/rand_read.xml {"host":"172.20.100.31","accept-encoding":"identity","content-type":"text/xml","user-agent":"aws-cli/2.3.2 Python/3.8.8 Linux/4.18.0-240.el8.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.cp","content-md5":"nC9U9rHx1mvrQLCpEzfLzg==","expect":"100-continue","x-amz-date":"20220413T195910Z","x-amz-content-sha256":"bb2b3ecd206acbf0a1ba55d0036bff465d40e3cdc7bdb0e0d9f5ce8e14b0c63c","authorization":"AWS4-HMAC-SHA256 Credential=QzhyXj9wVDH9DvnK97L9/20220413/us-east-1/s3/aws4_request, SignedHeaders=content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=2223257b97e704b739cbb6610fab5e8687d4de3f488afe477618c4208b4bd7d9","content-length":"1411"} Error: Semaphore Timeout
    at Semaphore._on_timeout (/root/node_modules/noobaa-core/src/util/semaphore.js:248:25)
    at Timeout.<anonymous> (/root/node_modules/noobaa-core/src/util/semaphore.js:244:53)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:500:7)
guymguym commented 2 years ago

@MonicaLemay Do we use any custom env overrides for NSFS_BUF_POOL_MEM_LIMIT? This is the initial value used by the buffers pool semaphore, and even if we don't set it explicitly it is calculated as 25% of the container memory limit, but in previous tests with Silvana we used to experiment with this setting:

kubectl set env deployment/noobaa-endpoint CONFIG_JS_NSFS_BUF_POOL_MEM_LIMIT=2147483648

There is a log print of the semaphore and buffers pool initial values here so we can check the actual value used.
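As a quick sanity check of those numbers (just arithmetic, using the 3Gi endpoint memory limit and 0.75Gi NSFS buffer figure quoted earlier in this thread, and the override value from the kubectl command above):

const GiB = 1024 ** 3;
const endpoint_memory_limit = 3 * GiB;                       // 3Gi endpoint pod limit quoted above
const default_buf_pool_limit = 0.25 * endpoint_memory_limit; // default: 25% of the container memory limit
console.log(default_buf_pool_limit / GiB);                   // 0.75 - matches the NSFS buffers limit mentioned above
console.log(2147483648 / GiB);                               // 2 - the explicit CONFIG_JS_NSFS_BUF_POOL_MEM_LIMIT override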

@romayalon I see we left the upload_object flow with an inaccurate update to the semaphore because we don't have full visibility into the incoming http request buffers being allocated by the underlying nodejs http library. So in our code we surround the upload with just a single 8MB acquired from the semaphore, on the assumption that this represents a rough estimate of the memory we use during the flow, but we know that nodejs streams can take more memory while flowing. I would try to create a stress test case that we can run locally and see if the same effect can be observed. Of course the root cause might be elsewhere still.

Here is where we acquire the semaphore on upload - https://github.com/noobaa/noobaa-core/blob/f50464d21579358c41aa6f3726219b2aff159019/src/sdk/namespace_fs.js#L687-L692

We might also want to look for the timeout warning prints that should give us the stack trace of the caller more accurately (we added those when we were debugging read flow semaphore cases) - https://github.com/noobaa/noobaa-core/blob/f50464d21579358c41aa6f3726219b2aff159019/src/util/semaphore.js#L84-L88
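To make the accounting concern concrete, here is a minimal sketch (an illustration of the pattern described above, not the actual noobaa-core implementation) of a byte-budget semaphore where each upload acquires a fixed 8MB with a timeout; when the budget stays exhausted past the timeout, late waiters fail in the same way the "Error: Semaphore Timeout" entries above do:

// Illustration only (not noobaa-core code): a byte-budget semaphore where each
// upload acquires a fixed slice and times out if the budget never frees up.
class ByteSemaphore {
    constructor(budget) {
        this.available = budget;
        this.waiters = [];
    }
    async acquire(bytes, timeout_ms) {
        if (this.available >= bytes) {
            this.available -= bytes;
            return;
        }
        await new Promise((resolve, reject) => {
            const waiter = { bytes, resolve };
            waiter.timer = setTimeout(() => {
                this.waiters.splice(this.waiters.indexOf(waiter), 1);
                reject(new Error('Semaphore Timeout')); // analogous to the error in the logs above
            }, timeout_ms);
            this.waiters.push(waiter);
        });
    }
    release(bytes) {
        this.available += bytes;
        // wake queued waiters whose request now fits in the budget
        while (this.waiters.length && this.available >= this.waiters[0].bytes) {
            const w = this.waiters.shift();
            clearTimeout(w.timer);
            this.available -= w.bytes;
            w.resolve();
        }
    }
}

const MB = 1024 * 1024;
const buffers_sem = new ByteSemaphore(768 * MB); // ~0.75 GiB pool limit quoted earlier in the thread

// Surround an upload with a single fixed 8MB acquisition, as described above.
async function upload_object(upload_fn) {
    await buffers_sem.acquire(8 * MB, 60 * 1000);
    try {
        return await upload_fn();
    } finally {
        buffers_sem.release(8 * MB);
    }
}

In a sketch like this, if an upload never reaches the release in the finally block (for example because its stream stalls), that 8MB slice is held forever and later acquisitions eventually time out, which is at least consistent with the persistent Semaphore Timeout behavior reported above.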

romayalon commented 2 years ago

@guymguym Sure, I'll try to create a stress test. Don't you think the fact that it only happens when hashCheck=true is relevant information?

guymguym commented 2 years ago

@romayalon If I understand this hashCheck=true mode correctly - this has no effect on the endpoint behavior, it only changes how the cosbench workers behave, right? It might have a side effect on scheduling due to the workers having more cpu work between requests. But it is an interesting effect to analyze.

MonicaLemay commented 2 years ago

@romayalon I have uploaded the cosbench logs. Apologies. You had asked for them and I forgot to bundle them with the noobaa logs. They are in https://ibm.ent.box.com/folder/145794528783?s=uueh7fp424vxs2bt4ndrnvh7uusgu6to

There are 2. One is from run 967, which had NSFS debug level. The other is from run 969, which had ALL debug level.

The xml that I posted in a thread above, where I give info about runs 967 and 969, is the shortest and best one. I'll copy it from above and put it here so that you don't have to fish for it.

[root@c83f1-dan4 RW_workloads]# cat /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5002_GB_random_size_num_of_objects.xml
<?xml version="1.0" encoding="UTF-8" ?>
<workload name="GBrandomsizenumobj" description="GBrandom read and write workload">
  <storage type="s3" config="accesskey=qiIImaITeJCdkdABWAsk;secretkey=c80yUC5QfG9sxK99XQI70zXjpin9cqIeSR54r47c;path_style_access=true;endpoint=http://metallb2:80"/>

  <workflow>

   <workstage name="prepare_rGB">
      <work name="prepare_rGB" workers="256" interval="10" rampup="0" type="prepare" ratio="100" config="cprefix=s5002b;oprefix=s5002o;osuffix=_rGB;containers=r(1,2);objects=r(1,3);sizes=r(1,2)GB;hashCheck=true"/>
    </workstage>

    <workstage name="rGB_write_read_delete">
      <work name="rGB_write_read_delete" workers="256" interval="10" rampup="0" runtime="300">
        <operation type="write" ratio="20" config="cprefix=s5002b;oprefix=s5002o;osuffix=_rGB;containers=r(5,6);objects=r(1,20);sizes=r(1,2)GB;hashCheck=true"/>
        <operation type="read" ratio="80" config="cprefix=s5002b;oprefix=s5002o;osuffix=_rGB;containers=r(1,2);objects=r(1,3);hashCheck=true"/>
      </work>
    </workstage>

  </workflow>
</workload>

@guymguym My testing is customer-level testing. I am not allowed to modify anything unless it is provided by our command line interface. In other words, I cannot modify memory pool buffers by editing deployments/pods. I have to use the default CPU and memory.

I hope I answered all of your questions.

guymguym commented 2 years ago

Got you @MonicaLemay. Just to clarify I didn't mean to use that as a valid workaround or anything funky like that, only as a debugging technique as we are still chasing the root cause of this issue. But I understand the concern with making these out-of-band changes to a long running system and whether undoing it will actually reset the configuration back correctly. I'll try to come up with more ideas on how to narrow down the problem from your side without making configuration changes, while we try to reproduce it on dev environment. Thanks!

guymguym commented 2 years ago

@MonicaLemay couple more questions:

  1. Did you try this workload with file sizes lower than 1 GB? Perhaps 128-256 MB? I saw what you wrote about > 3 GB with hashCheck=true failing, but I didn't find whether you wrote anything about reproducing this issue with lower sizes.
  2. Can you share what that java error from the >3GB case is - maybe we can see if it gives any clues?
romayalon commented 2 years ago

@guymguym These are just examples; they appear many times in the cosbench logs. I could find the following warning:

2022-04-13 15:34:59,884 [INFO] [NoneStorage] - performing GET at /s5002b1/s5002o2_rGB
2022-04-13 15:34:59,948 [WARN] [AbstractOperator] - No checksum embedded in s5002b1\s5002o3_rGB

And after a while read timeouts started happening:

2022-04-13 15:38:10,934 [WARN] [S3Storage] - below exception encountered when retrieving object s5002o1_rGB at bucket s5002b1: Unable to execute HTTP request: Read timed out
2022-04-13 15:38:10,950 [ERROR] [AbstractOperator] - worker 102 fail to perform operation s5002b1/s5002o1_rGB
java.lang.NullPointerException
    at com.intel.cosbench.driver.operator.Reader.validateChecksum(Reader.java:143)
    at com.intel.cosbench.driver.operator.Reader.doRead(Reader.java:95)
    at com.intel.cosbench.driver.operator.Reader.operate(Reader.java:69)
    at

And for s5002b2/s5002o3_rGB:

2022-04-13 15:36:02,618 [ERROR] [AbstractOperator] - worker 129 fail to perform operation s5002b2/s5002o3_rGB
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at

And for s5002b5/s5002o9_rGB: (This time the error code is slowDown and not Einval as we had in the previous bug)

2022-04-13 15:39:02,275 [WARN] [S3Storage] -below exception encountered when creating object s5002o9_rGB at s5002b5: Failed to reset the request input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
2022-04-13 15:38:04,878 [WARN] [S3Storage] -below exception encountered when retrieving object s5002o1_rGB at bucket s5002b2: Unable to execute HTTP request: Read timed out
2022-04-13 15:41:08,451 [WARN] [S3Storage] -below exception encountered when retrieving object s5002o2_rGB at bucket s5002b2: com.amazonaws.http.timers.client.SdkInterruptedException
guymguym commented 2 years ago

@romayalon The first few are GET errors - perhaps I missed it and hashCheck=true actually invokes another read after write, and that's what's creating more stress?

guymguym commented 2 years ago

OK so I see in cosbench code that hashCheck means:

So the error you found - No checksum embedded in s5002b1\s5002o3_rGB - comes from Reader.validateChecksum() when it does not find the expected guard (!!!!) around the hash at the end of the content, see here.

This seems weird indeed - I would try to look at this file from the FS in hex dump (xxd) - perhaps we can see what went wrong in its content. The appended hash should have a guard (!!!!) on both sides so the last bytes of each written object should have this: !!!!<32-hex-md5>!!!! - see HashUtil.
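As a concrete way to run that check, here is a small sketch (assuming the !!!!<32-hex-md5>!!!! trailer layout just described; the file path argument is whichever object on the filesystem you want to inspect):

// check_guard.js - sketch: read the last 40 bytes of an object file and look for the
// cosbench hashCheck trailer: '!!!!' + 32 hex md5 chars + '!!!!' (per the format above).
const fs = require('fs');

const path = process.argv[2];   // e.g. an object file under the bucket directory on the FS
const TRAILER_LEN = 4 + 32 + 4; // guard + md5 hex + guard = 40 bytes

const fd = fs.openSync(path, 'r');
const size = fs.fstatSync(fd).size;
const buf = Buffer.alloc(TRAILER_LEN);
fs.readSync(fd, buf, 0, TRAILER_LEN, size - TRAILER_LEN);
fs.closeSync(fd);

const trailer = buf.toString('latin1');
console.log('trailer:', JSON.stringify(trailer));
console.log(/^!{4}[0-9a-fA-F]{32}!{4}$/.test(trailer)
    ? 'guard + md5 found'
    : 'no checksum embedded (guard missing or corrupted)');

The same bytes can also be eyeballed with tail -c 40 <file> | xxd, along the lines suggested above.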

guymguym commented 2 years ago

I wonder why this "no checksum embedded" log was written with "[NoneStorage]" and not "[S3Storage]" like the rest...

romayalon commented 2 years ago

@guymguym yes, so according to the xml and the logs:

  1. Prepare stage - completed successfully - 3 objects (named s5002o1, s5002o2, s5002o3) were stored in containers s5002b1 and s5002b2 (in total 6 objects of sizes 1/2 GB).
  2. write_read_delete stage - contains only writes and reads (stage start timestamp 15:34:57,015):
     2.1. Write - a range of 20 objects is written to containers s5002b5, s5002b6 (sizes 1/2 GB). Ratio=20. All writes were successful until 15:39:09,614, when we get the first reset (slowDown) error.
     2.2. Read - a range of 3 objects (s5002o1, s5002o2, s5002o3) is read from containers s5002b1 and s5002b2 (same objects and buckets from the prepare stage). Ratio=80. The first checksum error started at 15:34:57,560.

@MonicaLemay do you see the template Guy mentioned !!!!<32-hex-md5>!!!! in these objects? I'm wondering why some get ops are successful and some are not.

Regarding [NoneStorage] - most of the log messages are written with [NoneStorage] in the log.

MonicaLemay commented 2 years ago

When files are 3 - 4 GB in size, no files get completely written:

[root@c83f1-ems s5003]# find ./s500* -type f  | wc -l
29
[root@c83f1-ems s5003]# find ./s500* -type f
./s5003b1/.noobaa-nsfs_625d97a99e2004002a837889/uploads/9c126154-32de-4353-9b0f-abe54be1d09b
./s5003b2/.noobaa-nsfs_625d97b29e2004002a83788c/uploads/bcc7e25a-b0ee-441c-9ecc-2a410ae8f5d0
./s5003b2/.noobaa-nsfs_625d97b29e2004002a83788c/uploads/04daf916-1cee-46d1-842a-4edfae8413c6
./s5003b3/alias_commands
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/1067f77d-96fd-42e1-9f7a-367f76d3fa54
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/4ea1ea57-23f7-4188-a3b6-659c48c6d70d
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/d04bfa57-eb67-4cc4-8ff8-46269272c90d
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/3e26e3f4-6be8-42f7-aa63-ac8f7465b44d
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/e6f23316-8bad-44be-96ff-0e7a4a739a6c
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/715ec2bf-f681-4442-b38a-3e9582f83d0b
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/f9cc167f-6c18-4900-bceb-0c5f0d403747
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/3ad611f7-adeb-49db-aae6-3a4221283976
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/5478b862-1ee6-42e3-9b2a-a17e16e8339b
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/62dd5cae-03b4-425f-af14-aa1b117a73ab
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/b3c2c524-b175-4f49-830c-fc1cabeba17a
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/27e34e18-8cba-44cc-ae60-2f10f3e49adf
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/e45d4c30-c1fc-4e0b-aada-788677f81f9c
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/9c19f8b5-e4c2-43c8-a136-f57f43d6c89d
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/d23fdefa-01c0-49d3-869b-9bd41f3b70cb
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/01bae4e3-65e7-4cd6-a7dd-cf2abaa67ff3
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/01a827ba-5f02-4bdf-bf3b-b1d3c185a6c8
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/928776f9-2787-4ab9-83c2-38c9c1f7a1a5
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/9f7b7aae-9c9c-4bcf-a5ca-208f194b9a61
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/b68acdbb-8b0d-4d20-be2f-01905b834163
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/be4e8952-8e02-4711-aafc-b2dc5866b92a
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/8a6ae8b8-78f2-4648-b77a-67820fd95575
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/8b718d29-d288-4d27-855c-d87e5af2dad4
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/925939f7-bf58-4cc3-9987-a1618f0deff1
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/de49df27-eff6-4683-8d8b-43030c220912
[root@c83f1-ems s5003]#

The cosbench log:

================================================== stage: s1-prepare_rGB ==================================================
---------------------------------- mission: M4432CA8957, driver: dan4 ----------------------------------
2022-04-19 14:54:02,009 [INFO] [Log4jLogManager] - will append log to file /root/cosbench/log/mission/M4432CA8957.log
2022-04-19 14:54:02,277 [INFO] [NoneStorage] - performing PUT at /s5003b1/s5003o3_rGB
2022-04-19 14:54:02,277 [INFO] [NoneStorage] - performing PUT at /s5003b1/s5003o1_rGB
2022-04-19 14:54:02,278 [INFO] [NoneStorage] - performing PUT at /s5003b1/s5003o2_rGB
2022-04-19 14:54:02,311 [ERROR] [AbstractOperator] - worker 2 fail to perform operation s5003b1/s5003o2_rGB
java.lang.ArrayIndexOutOfBoundsException
    at com.intel.cosbench.driver.generator.RandomInputStream.processBytes(RandomInputStream.java:106)
    at org.apache.commons.io.input.NullInputStream.read(NullInputStream.java:232)
    at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:75)
    at org.apache.commons.io.input.CountingInputStream.read(CountingInputStream.java:74)
    at com.intel.cosbench.driver.generator.XferCountingInputStream.read(XferCountingInputStream.java:38)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:108)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.read(MD5DigestCalculatingInputStream.java:98)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:138)
    at com.amazonaws.http.RepeatableInputStreamRequestEntity.writeTo(RepeatableInputStreamRequestEntity.java:153)
    at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
    at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
    at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
    at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
    at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
    at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doSendRequest(SdkHttpRequestExecutor.java:63)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
    at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
    at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1405)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1270)
    at com.intel.cosbench.api.S3Stor.S3Storage.createObject(S3Storage.java:167)
    at com.intel.cosbench.driver.operator.Writer.doWrite(Writer.java:98)
    at com.intel.cosbench.driver.operator.Preparer.operate(Preparer.java:97)
    at com.intel.cosbench.driver.operator.AbstractOperator.operate(AbstractOperator.java:76)
    at com.intel.cosbench.driver.agent.WorkAgent.performOperation(WorkAgent.java:197)
    at com.intel.cosbench.driver.agent.WorkAgent.doWork(WorkAgent.java:177)
    at com.intel.cosbench.driver.agent.WorkAgent.execute(WorkAgent.java:134)
    at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:44)
    at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:1)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2022-04-19 14:54:02,315 [INFO] [NoneStorage] - performing PUT at /s5003b2/s5003o2_rGB
2022-04-19 14:54:02,315 [INFO] [NoneStorage] - performing PUT at /s5003b2/s5003o1_rGB
2022-04-19 14:54:02,315 [INFO] [NoneStorage] - performing PUT at /s5003b2/s5003o3_rGB
2022-04-19 14:54:02,318 [ERROR] [ErrorStatistics] - error code: N/A occurred 7 times, fail to operate: s5003b1/s5003o2_rGB, s5003b1/s5003o2_rGB, s5003b1/s5003o3_rGB, s5003b1/s5003o1_rGB, s5003b2/s5003o2_rGB, s5003b2/s5003o1_rGB, s5003b2/s5003o3_rGB
java.lang.ArrayIndexOutOfBoundsException
    at com.intel.cosbench.driver.generator.RandomInputStream.processBytes(RandomInputStream.java:106)
    at org.apache.commons.io.input.NullInputStream.read(NullInputStream.java:232)
    at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:75)
    at org.apache.commons.io.input.CountingInputStream.read(CountingInputStream.java:74)
    at com.intel.cosbench.driver.generator.XferCountingInputStream.read(XferCountingInputStream.java:38)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:108)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.read(MD5DigestCalculatingInputStream.java:98)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:138)
    at com.amazonaws.http.RepeatableInputStreamRequestEntity.writeTo(RepeatableInputStreamRequestEntity.java:153)
    at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
    at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
    at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
    at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
    at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
    at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doSendRequest(SdkHttpRequestExecutor.java:63)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
    at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
    at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1405)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1270)
    at com.intel.cosbench.api.S3Stor.S3Storage.createObject(S3Storage.java:167)
    at com.intel.cosbench.driver.operator.Writer.doWrite(Writer.java:98)
    at com.intel.cosbench.driver.operator.Preparer.operate(Preparer.java:97)
    at com.intel.cosbench.driver.operator.AbstractOperator.operate(AbstractOperator.java:76)
    at com.intel.cosbench.driver.agent.WorkAgent.performOperation(WorkAgent.java:197)
    at com.intel.cosbench.driver.agent.WorkAgent.doWork(WorkAgent.java:177)
    at com.intel.cosbench.driver.agent.WorkAgent.execute(WorkAgent.java:134)
    at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:44)
    at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:1)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
---------------------------------- mission: ME432CA896C, driver: app7 ----------------------------------
2022-04-19 14:54:02,007 [INFO] [Log4jLogManager] - will append log to file /root/cosbench/log/mission/ME432CA896C.log
================================================== stage: s2-rGB_write_read_delete ==================================================
---------------------------------- mission: MB432CD6741, driver: dan4 ----------------------------------
2022-04-19 14:54:14,007 [INFO] [Log4jLogManager] - will append log to file /root/cosbench/log/mission/MB432CD6741.log
2022-04-19 14:54:14,272 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,278 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o2_rGB
2022-04-19 14:54:14,279 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,277 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o9_rGB
2022-04-19 14:54:14,277 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,277 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,276 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o1_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,275 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,275 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,286 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o13_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,274 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,274 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,274 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o17_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,274 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o2_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o3_rGB
2022-04-19 14:54:14,274 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,272 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,292 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,272 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,270 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,272 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,292 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,292 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o4_rGB
2022-04-19 14:54:14,292 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,292 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o1_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o20_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o19_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o18_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o16_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o15_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,286 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o14_rGB
2022-04-19 14:54:14,286 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,286 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,286 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o12_rGB
2022-04-19 14:54:14,284 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o11_rGB
2022-04-19 14:54:14,284 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,284 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,284 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,284 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o10_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,283 [ERROR] [AbstractOperator] - worker 18 fail to perform operation s5003b6/s5003o2_rGB
java.lang.ArrayIndexOutOfBoundsException
    at com.intel.cosbench.driver.generator.RandomInputStream.processBytes(RandomInputStream.java:106)
    at org.apache.commons.io.input.NullInputStream.read(NullInputStream.java:232)
    at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:75)
    at org.apache.commons.io.input.CountingInputStream.read(CountingInputStream.java:74)
    at com.intel.cosbench.driver.generator.XferCountingInputStream.read(XferCountingInputStream.java:38)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:108)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.read(MD5DigestCalculatingInputStream.java:98)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:138)
    at com.amazonaws.http.RepeatableInputStreamRequestEntity.writeTo(RepeatableInputStreamRequestEntity.java:153)
    at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
    at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
    at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
    at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
    at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
    at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doSendRequest(SdkHttpRequestExecutor.java:63)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
    at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
    at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1405)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1270)
    at com.intel.cosbench.api.S3Stor.S3Storage.createObject(S3Storage.java:167)
    at com.intel.cosbench.driver.operator.Writer.doWrite(Writer.java:98)
    at com.intel.cosbench.driver.operator.Writer.operate(Writer.java:79)
    at com.intel.cosbench.driver.operator.AbstractOperator.operate(AbstractOperator.java:76)
    at com.intel.cosbench.driver.agent.WorkAgent.performOperation(WorkAgent.java:197)
    at com.intel.cosbench.driver.agent.WorkAgent.doWork(WorkAgent.java:177)
    at com.intel.cosbench.driver.agent.WorkAgent.execute(WorkAgent.java:134)
    at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:44)
    at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:1)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2022-04-19 14:54:14,339 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o5_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o8_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o7_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o6_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o5_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o4_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o3_rGB
2022-04-19 14:54:14,347 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,346 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o8_rGB
2022-04-19 14:54:14,345 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o7_rGB
2022-04-19 14:54:14,343 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,341 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,352 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o6_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,339 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,339 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,339 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,339 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,339 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,354 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,352 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,351 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,350 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
---------------------------------- mission: M7432CD676E, driver: app7 ----------------------------------
2022-04-19 14:54:14,007 [INFO] [Log4jLogManager] - will append log to file /root/cosbench/log/mission/M7432CD676E.log
2022-04-19 14:54:14,244 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o5_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o7_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o8_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o13_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,246 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o6_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o10_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,245 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,245 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o9_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o8_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o7_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o5_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o4_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,249 [ERROR] [AbstractOperator] - worker 189 fail to perform operation s5003b5/s5003o13_rGB
java.lang.ArrayIndexOutOfBoundsException
    at com.intel.cosbench.driver.generator.RandomInputStream.processBytes(RandomInputStream.java:106)
    at org.apache.commons.io.input.NullInputStream.read(NullInputStream.java:232)
    at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:75)
    at org.apache.commons.io.input.CountingInputStream.read(CountingInputStream.java:74)
    at com.intel.cosbench.driver.generator.XferCountingInputStream.read(XferCountingInputStream.java:38)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:108)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.read(MD5DigestCalculatingInputStream.java:98)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
    at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:138)
    at com.amazonaws.http.RepeatableInputStreamRequestEntity.writeTo(RepeatableInputStreamRequestEntity.java:153)
    at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
    at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
    at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
    at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
    at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
    at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doSendRequest(SdkHttpRequestExecutor.java:63)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
    at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
    at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
    at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1405)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1270)
    at com.intel.cosbench.api.S3Stor.S3Storage.createObject(S3Storage.java:167)
    at com.intel.cosbench.driver.operator.Writer.doWrite(Writer.java:98)
    at com.intel.cosbench.driver.operator.Writer.operate(Writer.java:79)
    at com.intel.cosbench.driver.operator.AbstractOperator.operate(AbstractOperator.java:76)
    at com.intel.cosbench.driver.agent.WorkAgent.performOperation(WorkAgent.java:197)
    at com.intel.cosbench.driver.agent.WorkAgent.doWork(WorkAgent.java:177)
    at com.intel.cosbench.driver.agent.WorkAgent.execute(WorkAgent.java:134)
    at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:44)
    at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:1)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o20_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o19_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o18_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o17_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o16_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o15_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o14_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o12_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o11_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o10_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o9_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o6_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o4_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,258 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,256 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,256 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o16_rGB
2022-04-19 14:54:14,256 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,256 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,259 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,255 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,255 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o15_rGB
2022-04-19 14:54:14,255 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,255 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,254 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,254 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o14_rGB
2022-04-19 14:54:14,254 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,254 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o13_rGB
2022-04-19 14:54:14,254 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o12_rGB
2022-04-19 14:54:14,254 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o11_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,260 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,260 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,259 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o18_rGB
2022-04-19 14:54:14,259 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,259 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o17_rGB
2022-04-19 14:54:14,259 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,258 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,260 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,261 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,261 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
MonicaLemay commented 2 years ago

When files are 100 - 200 MB in size, the condition does not occur and this defect does not reproduce.

guymguym commented 2 years ago

@MonicaLemay Are you running a special branch/fork of cosbench, a release, or the latest commit from its master branch?

guymguym commented 2 years ago

@MonicaLemay @romayalon Hmmmm, this RandomInputStream code, where you see the ArrayIndexOutOfBoundsException, definitely looks to me like the root cause of this entire thing. It mixes 32-bit signed integers with 64-bit values in its calculations, which is well known to be prone to bugs around the 2 GB offset (the maximum positive 32-bit signed value). I'll try to find the exact problem, but I sense that both cases are caused by this code.

https://github.com/intel-cloud/cosbench/blob/68ffaa90e170f87a70c09b5ae187b640eb040264/dev/cosbench-driver/src/com/intel/cosbench/driver/generator/RandomInputStream.java#L81-L129
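
As a rough illustration of the failure mode (a hypothetical sketch, not the cosbench code itself): keeping an offset in a 32-bit int while sizes are 64-bit wraps the offset negative once it passes 2^31 - 1 bytes, and a negative offset used to index a byte[] is exactly what surfaces as an ArrayIndexOutOfBoundsException.

public class OffsetOverflowSketch {
    public static void main(String[] args) {
        long objectSize = 3L * 1000 * 1000 * 1000; // e.g. a 3 GB object
        int chunk = 64 * 1024;                     // hypothetical buffer size
        int offset = 0;                            // 32-bit signed, max 2,147,483,647
        long written = 0;
        while (written < objectSize) {
            offset += chunk;                       // silently wraps negative past 2^31 - 1
            written += chunk;
        }
        // Using a negative offset to index a byte[] throws ArrayIndexOutOfBoundsException.
        System.out.println("final 32-bit offset = " + offset);
    }
}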

guymguym commented 2 years ago

OK, I think I found it, and opened an issue and a PR to fix it on cosbench. But since the last commit on cosbench is almost 4 years old, I don't think this will land anytime soon... In any case, we might want to patch our cosbench with this fix to make sure it really solves our 1-2 GB case as well. See the PR here - https://github.com/intel-cloud/cosbench/pull/426

MonicaLemay commented 2 years ago

In order to follow up on the hex dump from the file system, I had to do another run; I had deleted the data from the previous runs to do the requested MB and >3 GB tests.

New run:

 date ; sh /root/cosbench/cli.sh submit /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5003_GB_random_size_num_of_objects.xml
Tue Apr 19 15:11:41 EDT 2022
Accepted with ID: w988

From the cosbench log:

2022-04-19 15:13:35,841 [WARN] [AbstractOperator] - No checksum embedded in s5003b2\s5003o2_rGB
2022-04-19 15:13:35,842 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB

From the file system:


[root@c83f1-ems s5003b2]# pwd ; ls -ail
/gpfs/hpofs1/s5003/s5003b2
total 4882834
8406077 drwxr-x---.  4 5003 5003       8192 Apr 19 15:13 .
  69634 drwxr-xr-x. 52 5003 5003       4096 Mar  7 19:59 ..
4639305 drwxr-x---.  3 5003 5003       4096 Mar  3 17:20 .noobaa-nsfs_622115c8c8206400294c9c68
4200092 drwxr-x---.  3 5003 5003       4096 Apr 18 13:33 .noobaa-nsfs_625d97b29e2004002a83788c
8485394 -rw-r-----.  1 5003 5003 2000000000 Apr 19 15:13 s5003o1_rGB
8485396 -rw-r-----.  1 5003 5003 2000000000 Apr 19 15:13 s5003o2_rGB
8485395 -rw-r-----.  1 5003 5003 1000000000 Apr 19 15:12 s5003o3_rGB

I'm not sure what to look for in the xxd output. I am looking at file s5003o2_rGB in bucket s5003b2. The last few lines from xxd are:

[root@c83f1-ems ~]# tail -20 /tmp/out
773592c0: 716f 6968 637a 7878 7962 6b6d 7579 6a72  qoihczxxybkmuyjr
773592d0: 7476 7168 7871 666f 7a6c 7871 656a 7272  tvqhxqfozlxqejrr
773592e0: 6f64 777a 6d72 746a 6b61 7073 666b 796f  odwzmrtjkapsfkyo
773592f0: 6f70 646f 7973 7866 6b77 6661 6b66 6c61  opdoysxfkwfakfla
77359300: 6f78 6666 616a 6579 6666 6c64 6c7a 7567  oxffajeyffldlzug
77359310: 7264 6373 6462 7869 7561 7765 676a 6f74  rdcsdbxiuawegjot
77359320: 7269 7869 7371 6c76 6267 616e 6666 6b73  rixisqlvbganffks
77359330: 7964 7071 6f75 6971 6166 666b 7068 7270  ydpqouiqaffkphrp
77359340: 7766 726f 7a6c 6178 6a6c 7677 7077 7571  wfrozlaxjlvwpwuq
77359350: 6976 6b73 7662 7872 6566 7568 6667 6878  ivksvbxrefuhfghx
77359360: 6672 7771 6764 6b6e 7778 6273 6168 7876  frwqgdknwxbsahxv
77359370: 6b63 7468 6e77 796f 6162 6670 7276 7561  kcthnwyoabfprvua
77359380: 7069 6d71 7779 7366 776d 7468 646a 7573  pimqwysfwmthdjus
77359390: 7571 6362 7775 716f 6963 6b65 676f 6c66  uqcbwuqoickegolf
773593a0: 6365 6c70 6367 6f70 6e6b 7064 7162 6a73  celpcgopnkpdqbjs
773593b0: 6272 7272 6a6d 7575 636d 6b6b 7a71 6669  brrrjmuucmkkzqfi
773593c0: 6b6d 626e 7179 7877 6d6b 6e63 6562 706e  kmbnqyxwmkncebpn
773593d0: 7575 637a 626e 6463 2121 2121 3031 3534  uuczbndc!!!!0154
773593e0: 3239 6163 3135 6239 3339 6130 6663 3934  29ac15b939a0fc94
773593f0: 3863 3635 6532 3037 3861 6166 2121 2121  8c65e2078aaf!!!!
[root@c83f1-ems ~]#
guymguym commented 2 years ago

@MonicaLemay Thanks! What we can see in the hex dump is that it ends with !!!! + 32 hex chars + !!!! as expected:

773593d0: 7575 637a 626e 6463 2121 2121 3031 3534  uuczbndc!!!!0154
773593e0: 3239 6163 3135 6239 3339 6130 6663 3934  29ac15b939a0fc94
773593f0: 3863 3635 6532 3037 3861 6166 2121 2121  8c65e2078aaf!!!!

This means that in this case the write itself actually finished fine, which makes sense: the size is 2,000,000,000 bytes, roughly 140 MiB below the 2 GiB signed 32-bit limit, so it shouldn't have hit the int overflow.
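
If it helps to double-check other files directly on the file system, something along these lines should work (a rough sketch; it assumes the trailer layout above, i.e. !!!! + 32 hex MD5 characters + !!!! at the very end, and that the embedded MD5 covers exactly the bytes before the trailer, which may not match cosbench's hashing precisely):

import java.io.FileInputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class EmbeddedMd5Check {
    public static void main(String[] args) throws Exception {
        String path = args[0];              // e.g. /gpfs/hpofs1/s5003/s5003b2/s5003o2_rGB
        final int TRAILER = 4 + 32 + 4;     // "!!!!" + 32 hex md5 chars + "!!!!"

        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long bodyLen = raf.length() - TRAILER;

            // Read the embedded checksum from the trailer.
            byte[] tail = new byte[TRAILER];
            raf.seek(bodyLen);
            raf.readFully(tail);
            String embedded = new String(tail, StandardCharsets.US_ASCII).substring(4, 36);

            // Hash everything before the trailer.
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[1 << 20];
            long remaining = bodyLen;
            try (FileInputStream in = new FileInputStream(path)) {
                while (remaining > 0) {
                    int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                    if (n < 0) break;
                    md5.update(buf, 0, n);
                    remaining -= n;
                }
            }
            StringBuilder computed = new StringBuilder();
            for (byte b : md5.digest()) computed.append(String.format("%02x", b));

            System.out.println("embedded = " + embedded);
            System.out.println("computed = " + computed);
        }
    }
}

If embedded and computed match for a file that cosbench still flags with "No checksum embedded", that would point at the Reader rather than at the data on disk.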

However, when I look at the Reader validateChecksum() code here - https://github.com/intel-cloud/cosbench/blob/68ffaa90e170f87a70c09b5ae187b640eb040264/dev/cosbench-driver/src/com/intel/cosbench/driver/operator/Reader.java#L158-L163

I get a strong feeling that I see another bug there, since the code ignores the bytes read into buf2. I will see if I can prove that this is the case and submit an issue + PR to cosbench.
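
As a hypothetical illustration of that pattern (not the cosbench source, just a minimal sketch of what "ignoring the bytes in buf2" would mean), reading into two buffers but only hashing one guarantees the computed digest can never match the embedded one:

import java.io.InputStream;
import java.security.MessageDigest;

public class TwoBufferDigestSketch {
    // Data is read alternately into buf1 and buf2, but only buf1 is fed to the
    // digest, so any bytes that land in buf2 are silently dropped.
    static String md5IgnoringBuf2(InputStream in) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buf1 = new byte[4096];
        byte[] buf2 = new byte[4096];
        int n1;
        while ((n1 = in.read(buf1)) > 0) {
            md5.update(buf1, 0, n1);
            int n2 = in.read(buf2);   // bytes read here are never hashed (the suspected bug)
            if (n2 <= 0) break;
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }
}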

romayalon commented 2 years ago

@guymguym Awesome! So you think that these failures cause retries and that's what creates more stress?

MonicaLemay commented 2 years ago

@guymguym @romayalon Thanks so much for all of the work on this. I got a little bit lost in the discussion above. It sounds like the issue that I found is not a noobaa bug, but that all of the problems are in cosbench? Is that correct? Is there still a noobaa problem being investigated? Please help me by clarifying. Thanks again.

guymguym commented 2 years ago

Hey so I think it's still hard for me to determine.

  1. The first PR I opened against cosbench, for RandomInputStream (the Writer side), can explain the missing checksum for files >2 GB and is definitely a cosbench issue (although I doubt the project will merge anything new after 4 years of silence).
  2. For the second case (<2 GB), I still haven't managed to find an exact bug in the cosbench Reader that would produce those "No checksum embedded in bucket/key" warnings while the files actually contain the checksum, although I did see some questionable cases in the cosbench code that might be causing it.
  3. I am not sure whether those cases are really causing the timeouts; I was looking into them hoping they are related to the change to hashCheck=true, but I'm not sure yet.

What I wanted to suggest was to try and build cosbench with the fix in https://github.com/intel-cloud/cosbench/pull/426 to see if this eliminates any of those cases.

The initial approach we discussed here was to add custom instrumentation to the noobaa endpoints code, to be able to pinpoint bottlenecks that could cause those timeouts.

Both directions would make positive progress - WDYT? any preference?

MonicaLemay commented 2 years ago

@guymguym I looked at the instructions to build cosbench from the PR https://github.com/intel-cloud/cosbench/blob/master/BUILD.md

Unfortunately, our team may decide not to invest in cosbench as a system test tool, and we will not be building it. Should this defect proceed without the cosbench fix, or should we close it now if you think everything we are seeing is a cosbench problem?

guymguym commented 2 years ago

@MonicaLemay Thanks. I do believe that the fact that you had to restart the endpoints is still an issue here. The fact that we observed this no-service condition only with (potentially buggy) cosbench and hashCheck=true means there is a pathological case in the noobaa code. Perhaps we still have an issue with promises and the semaphore, as we had in #6848.

@nimrod-becker I would rename this issue's title to reflect that it is about cosbench + hashCheck requiring a manual restart, and I would keep it open because we don't know why we had to restart the endpoints.

@romayalon I would look for the semaphore messages (like the "semaphore warning stuck" and the "buffer pool warning stuck" messages) in the endpoint logs and see if there's anything reported.

akmithal commented 2 years ago

Hi @MonicaLemay, are you blocked on your story by this fix, or can you still work on your story with the md5 check off?

MonicaLemay commented 2 years ago

@romayalon I am able to make progress on my user story; I can use warp. I reported this as a blocker because this defect is possibly a blocker to the MVP release. I think that part of the message got lost in the hand-off of status. If you do not agree that this is a blocker to the MVP, please let me know. I made that assessment because of the above statement:

The fact that we observed this no-service issue only with (potentially buggy) cosbench with hashCheck=true, means there is a pathological case in the noobaa code. Perhaps we still have an issue with promises and the semaphore as we had in https://github.com/noobaa/noobaa-core/issues/6848.

Please let me know what you think.

romayalon commented 2 years ago

@MonicaLemay I didn't say it's a blocker or not a blocker for the MVP. What I said is that if there are more stories to test that do not involve hashCheck=true, you can continue with those, right? And I agree that there might be something here in NooBaa, but I do think that we need to do the build changes to check whether it still occurs. Anyway, I'll try to find more info in the logs we currently have.

MonicaLemay commented 2 years ago

@romayalon Thanks for the clarification. I have 2 more stories to test that do not involve hashCheck=true. I can continue.

MonicaLemay commented 2 years ago

@romayalon The test results show that removing hashCheck=true from the read stage does NOT cause the outage (the "can't cp file" symptom). Here is the XML file.

[root@c83f1-dan4 RW_workloads]# cat /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5001_GB_random_size_num_of_objects.xml
<?xml version="1.0" encoding="UTF-8" ?>
<workload name="GBrandomsizenumobj" description="GBrandom read and write workload">
  <storage type="s3" config="accesskey=AMKwXLzOJEXzayywWPnO;secretkey=oUJLgVjRH7D0bvzHtoQ6zv8Bec6PQqLusVDlpz9D;path_style_access=true;endpoint=http://metallb:80"/>

  <workflow>

   <workstage name="prepare_rGB">
      <work name="prepare_rGB" workers="256" interval="10" rampup="0" type="prepare" ratio="100" config="cprefix=s5001b;oprefix=s5001o;osuffix=_rGB;containers=r(1,2);objects=r(1,3);sizes=r(1,2)GB;hashCheck=true"/>
    </workstage>

    <workstage name="rGB_write_read_delete">
      <work name="rGB_write_read_delete" workers="256" interval="10" rampup="0" runtime="300">
        <operation type="write" ratio="20" config="cprefix=s5001b;oprefix=s5001o;osuffix=_rGB;containers=r(5,6);objects=r(1,20);sizes=r(1,2)GB;hashCheck=true"/>
        <operation type="read" ratio="80" config="cprefix=s5001b;oprefix=s5001o;osuffix=_rGB;containers=r(1,2);objects=r(1,3)"/>
      </work>
    </workstage>

  </workflow>
</workload>
rkomandu commented 2 years ago

@romayalon are you planning to have this in the 4.9.6 code?

romayalon commented 2 years ago

@rkomandu first I'll build these changes on the 4.9 branch so we can confirm they solve Monica's issue, and then we can discuss whether they'll get into 4.9.6.

romayalon commented 2 years ago

@rkomandu @MonicaLemay The fix can be found in noobaa/noobaa-core:moki-20220509, @MonicaLemay can you please try it?

romayalon commented 2 years ago

@MonicaLemay an updated image named noobaa/noobaa-core:moki-20220510 is now available.

romayalon commented 2 years ago

@MonicaLemay The most updated image named noobaa/noobaa-core:moki-20220511 is now available.

MonicaLemay commented 2 years ago

I tested the patch today and it works. I no longer see the issue. Thank you.

romayalon commented 2 years ago

@MonicaLemay is there anything else we should do here, or can we close?

MonicaLemay commented 2 years ago

As far as I am concerned it can be closed. I don't know the NooBaa team's procedures; some teams leave defects open until a fix or patch is in an official build. If this team does not do that, then it can be closed.