intel-cloud / cosbench

a benchmark tool for cloud object storage service

How does "mission: null, driver: driver1" happen when driver1 is alive? #349

Closed JYang1986 closed 7 years ago

JYang1986 commented 7 years ago

Problem Description

All 210 drivers are alive.

test xml (only the declaration was pasted):

    <?xml version="1.0" encoding="UTF-8" ?>

Result summary:

| Op-Type | Op-Count | Byte-Count | Avg-ResTime | Avg-ProcTime | Throughput | Bandwidth | Succ-Ratio |
|---|---|---|---|---|---|---|---|
| op1: init-write | 0 ops | 0 B | N/A | N/A | 0 op/s | 0 B/S | N/A |
| op1: prepare-write | 1 kops | 256 MB | 1345.61 ms | 1338.86 ms | 37.15 op/s | 9.51 MB/S | 100% |

**The prepare stage with 50 workers completed, but the read stage with 150 workers failed. When I tested with 100 workers, reads succeeded 100%.**

| ID | Name | Works | Workers | Op-Info | State | Link |
|---|---|---|---|---|---|---|
| w294-s1-init | init | 1 wks | 2 wkrs | init | completed | view details |
| w294-s2-prepare | prepare | 1 wks | 50 wkrs | prepare | completed | view details |
| w294-s3-get 256KB data with 200 workers | get 256KB data with 200 workers | 1 wks | 150 wkrs | read | terminated | view details |
| w294-s4-cleanup | cleanup | 1 wks | 20 wkrs | cleanup | aborted | view details |
| w294-s5-dispose | dispose | 1 wks | 2 wkrs | dispose | aborted | view details |

**One driver terminated; all the other drivers aborted.**

| Driver | Mission | Work | Worker-Info | Op-Info | State | Link |
|---|---|---|---|---|---|---|
| driver128 | N/A | Put256KBData1 | 128 - 128 | read (100%) | aborted | N/A |
| driver129 | N/A | Put256KBData1 | 129 - 129 | read (100%) | aborted | N/A |
| driver130 | N/A | Put256KBData1 | 130 - 130 | read (100%) | aborted | N/A |
| driver131 | N/A | Put256KBData1 | 131 - 131 | read (100%) | terminated | N/A |
| driver132 | N/A | Put256KBData1 | 132 - 132 | read (100%) | aborted | N/A |

**Log info** (the same "mission: null" block repeats for every driver up to driver150):

    [root@node69 0.4.2.c4]# vim archive/w294-get-200Workers-256KB/workload.log
    .........
    2017-03-17 11:10:35,356 [INFO] [NoneStorage] - performing PUT at /s3testqwer2/myobjects500
    ==================================================
    stage: s3-get 256KB data with 200 workers
    ==================================================
    ----------------------------------
    mission: null, driver: driver1
    ----------------------------------
    [N/A]
    ----------------------------------
    mission: null, driver: driver2
    ----------------------------------
    [N/A]
    ----------------------------------
    mission: null, driver: driver3
    ----------------------------------
    [N/A]
    ----------------------------------
    mission: null, driver: driver4
    ----------------------------------
    ........
    [N/A]
    ----------------------------------
    mission: null, driver: driver149
    ----------------------------------
    [N/A]
    ----------------------------------
    mission: null, driver: driver150
    ----------------------------------

## my COSBench conf

3 nodes, 70 drivers per node.
Because of https://github.com/intel-cloud/cosbench/issues/348, each driver's port is the previous driver's port + 5, starting from 16088.

**controller:**

    [controller]
    drivers = 210
    log_level = INFO
    log_file = log/system.log
    archive_dir = archive

    [driver1]
    name = driver1
    url = http://192.168.223.68:16088/driver

    [driver2]
    name = driver2
    url = http://192.168.223.68:16093/driver

    [driver3]
    name = driver3
    url = http://192.168.223.68:16098/driver

    [driver4]
    name = driver4
    url = http://192.168.223.68:16103/driver
    ....
    [driver69]
    name = driver69
    url = http://192.168.223.68:16428/driver

    [driver70]
    name = driver70
    url = http://192.168.223.68:16433/driver

    [driver71]
    name = driver71
    url = http://192.168.223.69:16088/driver

    [driver72]
    name = driver72
    url = http://192.168.223.69:16093/driver

    [driver73]
    name = driver73
    url = http://192.168.223.69:16098/driver

    [driver74]
    .....
    [driver207]
    name = driver207
    url = http://192.168.223.73:16418/driver

    [driver208]
    name = driver208
    url = http://192.168.223.73:16423/driver

    [driver209]
    name = driver209
    url = http://192.168.223.73:16428/driver

    [driver210]
    name = driver210
    url = http://192.168.223.73:16433/driver

**driver:**

    [driver]
    name=127.0.0.1:18088
    url=http://127.0.0.1:18088/driver

## my Environment

    [root@node0 ~]# cat /etc/redhat-release
    CentOS Linux release 7.2.1511 (Core)
    [root@node76 ~]# uname -a
    Linux node76 3.10.0-514.6.1.el7.x86_64 #1 SMP Wed Jan 18 13:06:36 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

The web server is Minio (master branch, compiled 03.16.2017). COSBench version: 0.4.2.c4.
Wilhelmshaven commented 7 years ago

Oh, I think you may be mixing up the concepts of driver and worker... There is no fixed relationship between the two. In my understanding, a driver is a client process and a worker is a worker thread; one driver per server is enough. Workers determine concurrency, and one driver can handle many workers. We usually write workloads like this:

    <workstage name="put-special">
       <work name="driver1" workers="500" totalOps="2000000" driver="driver1">
         <storage type="s3" config="accesskey=cosbench;secretkey=cosbench;endpoint=http://127.0.0.1:8080/" />
         <operation type="write" ratio="100" config="cprefix=cosbench;oprefix=4M_R11_;objects=s(1,2000000);containers=c(1);sizes=c(4)KB" />
       </work>
       <work name="driver2" workers="500" totalOps="2000000" driver="driver2">
         <storage type="s3" config="accesskey=cosbench;secretkey=cosbench;endpoint=http://127.0.0.2:8080/" />
         <operation type="write" ratio="100" config="cprefix=cosbench;oprefix=4M_R21_;objects=s(1,2000000);containers=c(1);sizes=c(4)KB" />
       </work>
       <work name="driver3" workers="500" totalOps="2000000" driver="driver3">
         <storage type="s3" config="accesskey=cosbench;secretkey=cosbench;endpoint=http://127.0.0.3:8080/" />
         <operation type="write" ratio="100" config="cprefix=cosbench;oprefix=4M_R31_;objects=s(1,2000000);containers=c(1);sizes=c(4)KB" />
       </work>
       <work name="driver4" workers="500" totalOps="2000000" driver="driver4">
         <storage type="s3" config="accesskey=cosbench;secretkey=cosbench;endpoint=http://127.0.0.4:8080/" />
         <operation type="write" ratio="100" config="cprefix=cosbench;oprefix=4M_R41_;objects=s(1,2000000);containers=c(1);sizes=c(4)KB" />
       </work>
    </workstage>

I use 4 drivers only because I use 4 physical machines as COSBench drivers. Also, your workload didn't specify any driver... BTW, `containers=u(1,2);objects=s(1,500)` is not a good idea — is that 500 or 1000 objects in total? It's hard to tell... COSBench seems to be unmaintained now (I don't understand why it was written in Java...), so the conf-examples may mislead you.
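For example, something like this is easier to reason about — the read work names an explicit driver, and `containers=c(1);objects=u(1,500)` makes it clear that exactly 500 prepared objects are read at random. This is only a sketch; the endpoint, keys and prefixes below are placeholders, not values from your setup:

    <workstage name="read-256KB-example">
      <work name="read-on-driver1" workers="100" runtime="300" driver="driver1">
        <storage type="s3" config="accesskey=AK;secretkey=SK;endpoint=http://127.0.0.1:9000/" />
        <!-- one container, objects 1..500 chosen uniformly at random -->
        <operation type="read" ratio="100" config="cprefix=mybucket;oprefix=myobj_;containers=c(1);objects=u(1,500)" />
      </work>
    </workstage>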

JYang1986 commented 7 years ago

Thank you. I'm Chinese and a C programmer, and I don't have time to study the COSBench code, so I can only rely on the operating manual. But the manual is not detailed and has problems — for example, the chunk flag is useless. I read the chunk-flag code: the flag exists in the code, but setting it in COSBench has no effect. That wasted a whole day, and in the end we decided not to use COSBench to test files larger than 5 GB. The conf-examples are also not very good, so I was often confused. For example, writes succeed 100%, but reads with the same configuration have a very low success ratio — I don't understand how that happens. I know the driver is the client; in fact each driver starts a process and binds two ports, one for the web console and one for other operations. But because of the low read success ratio, and because nobody else in my company has used COSBench, I opened this issue. Thank you very much for your answer. I don't have much time right now, so I had to try other approaches, and finally found that starting more drivers makes the read success 100% — I'm using that as a temporary workaround. I'm currently doing a POC at a customer's site; when I get back to my company I will try your suggestion.
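If I understand your suggestion correctly, my controller config would shrink to one driver per node, roughly like this (just a sketch reusing the node IPs from my config above and the 18088 port from my driver config, not tested yet):

    [controller]
    drivers = 3
    log_level = INFO
    log_file = log/system.log
    archive_dir = archive

    [driver1]
    name = driver1
    url = http://192.168.223.68:18088/driver

    [driver2]
    name = driver2
    url = http://192.168.223.69:18088/driver

    [driver3]
    name = driver3
    url = http://192.168.223.73:18088/driver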

Wilhelmshaven commented 7 years ago

Yep, if COSBench were written in C/C++, I would try to support it... But for now, we still regard COSBench as the first choice for front-end testing when benchmarking Ceph. Thanks to my teammates' effort, we can handle it. I'm Chinese too.

JYang1986 commented 7 years ago

@Wilhelmshaven do you have QQ or WeChat? I have sent a message to your email nntt19@163.com. My email is dahaimydream@126.com and my QQ is 69725462. Waiting for your answer.