parsa-epfl / cloudsuite

A Benchmark Suite for Cloud Services.
http://cloudsuite.ch

Can I run the CloudSuite benchmarks on arm64 directly with the official code? #433

Open qiyuxinlin opened 1 year ago

qiyuxinlin commented 1 year ago

I have tried all the CloudSuite benchmarks on arm64 and ran into many problems: almost none of them run with the official code. Web-serving has a php-fpm problem. I am also confused because I cannot reproduce the expected results with the official code: with query.sh, I get a 404 Not Found result. So I am looking for your help.

xusine commented 1 year ago

Hello,

It would help if you could provide the error messages you encountered so that we can look into them together.

Best

qiyuxinlin commented 1 year ago

When I download the media-streaming dataset, I get messages like this:

[mp4 @ 0xaaae0ac4af20] Application provided duration: -9223372036854775808 / timestamp: -9223372036854775808 is out of range for mov/mp4 format

[mp4 @ 0xaaabfef89790] Application provided duration: -9223372036854775808 / timestamp: -9223372036854775808 is out of range for mov/mp4 format
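
A note on the number in both messages: -9223372036854775808 is the smallest signed 64-bit integer, which FFmpeg uses as its AV_NOPTS_VALUE sentinel for "no timestamp set". That suggests the muxer was handed an unset duration and timestamp rather than a corrupted one. A minimal sketch for spotting this in logs, assuming the message format shown above (the function name `unset_fields` is hypothetical):

```python
import re
from typing import List

# Smallest signed 64-bit integer; FFmpeg's AV_NOPTS_VALUE has this value.
INT64_MIN = -(2 ** 63)

# Matches the "duration: N / timestamp: N" part of the mp4 muxer message.
_FIELDS = re.compile(r"duration: (-?\d+) / timestamp: (-?\d+)")


def unset_fields(message: str) -> List[str]:
    """Return the field names whose value equals the 'unset' sentinel."""
    match = _FIELDS.search(message)
    if match is None:
        return []
    duration, timestamp = (int(group) for group in match.groups())
    pairs = (("duration", duration), ("timestamp", timestamp))
    return [name for name, value in pairs if value == INT64_MIN]
```

Both fields in the messages above match the sentinel, so the function reports both names for either line.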

qiyuxinlin commented 1 year ago

When I try to use query.sh from web search as the documentation says, I get the following:

404 <p> Searching for Solr?<br/> You must type the correct path.<br/> Solr will respond. </p>

I did not find any error message in the nginx log.

qiyuxinlin commented 1 year ago

When I start the server container of the web-serving benchmark and access http://<web_server's IP>:8080, I sometimes get 502 Bad Gateway. The likely cause is in php-fpm, where the keyword in the log is SIGSEGV. The error is similar to the following:

[10-Dec-2015 10:47:33] WARNING: [pool www] child 5022 exited on signal 11 (SIGSEGV) after 618.942788 seconds from start
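
To see how often php-fpm workers crash and how long they live first, the log can be scanned for lines of this shape. The following is only a sketch, assuming the php-fpm log lines look like the one quoted above; `parse_child_exit` is a hypothetical helper name:

```python
import re
from typing import Optional

# Matches php-fpm lines such as:
# "[pool www] child 5022 exited on signal 11 (SIGSEGV) after 618.942788 seconds from start"
_CHILD_EXIT = re.compile(
    r"\[pool (?P<pool>[^\]]+)\] child (?P<pid>\d+) exited on signal "
    r"(?P<signo>\d+) \((?P<signame>\w+)\)"
    r"(?: after (?P<uptime>[\d.]+) seconds from start)?"
)


def parse_child_exit(line: str) -> Optional[dict]:
    """Extract pool, pid, signal and uptime from one php-fpm log line."""
    match = _CHILD_EXIT.search(line)
    if match is None:
        return None
    uptime = match.group("uptime")
    return {
        "pool": match.group("pool"),
        "pid": int(match.group("pid")),
        "signal": int(match.group("signo")),
        "signal_name": match.group("signame"),
        "uptime_seconds": float(uptime) if uptime is not None else None,
    }
```

Counting the lines where `signal_name` is `SIGSEGV` on the arm64 and x86_64 hosts would show whether the crash is specific to one architecture.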

qiyuxinlin commented 1 year ago

These three issues are the main problems I am currently encountering. When I run the same Docker images on an x86_64 server, none of these problems occur. Thank you for your help.

xusine commented 1 year ago

Hello,

Thanks for the error messages. May I know the configuration of your ARM64 server? We have never encountered this problem before, so I am curious about what happened.

Do other workloads report errors?

Best,

qiyuxinlin commented 1 year ago

Hello, thanks for your patience. Our server is a TaiShan 200 (Model 2280). The configuration of the ARM64 server is as follows:

Linux architecture: AArch64
System: openEuler 20.03
CPU: 2 × 64-core Kunpeng 920 @ 2.6 GHz
Memory: 8 × 32 GB DDR4, 2933 MHz
Hard disk: 6 × 4000 GB SATA, 6 Gb/s, 7.2K rpm

Almost every workload reports errors. Some report that JAVA_HOME cannot be found, even though a Java environment is present in the container. Others fail differently: Solr, for example, requires us to set SOLR_JAVA_STACK_SIZE to more than 448K.

UlisesLuzius commented 1 year ago

Hello,

Could you please send us the exact list of commands you used for launching web-search? And the error output?

Rafael

qiyuxinlin commented 1 year ago

Hello, I launch web-search with these commands.

server:

docker run --name web_search_dataset --privileged=true cloudsuite/web-search:dataset
docker run -it --name server --privileged=true --volumes-from web_search_dataset --net host cloudsuite/web-search:server 14g 1

Then I need to enter the container to add a configuration setting:

docker exec -it --user=root server /bin/sh
apt-get update
apt-get install vim
vim /usr/src/solr-9.1.1/bin/solr.in.sh

There I add the line SOLR_JAVA_STACK_SIZE="-Xss512k". Then, with docker logs server, I get:

Java 17 detected. Enabled workaround for SOLR-16463
OpenJDK 64-Bit Server VM warning: Failed to reserve and commit memory. req_addr: 0x0000fffbdd200000 bytes: 251658240 page size: 2097152 (errno = 12).
OpenJDK 64-Bit Server VM warning: Failed to reserve and commit memory. req_addr: 0x0000fffbd6e00000 bytes: 29360128 page size: 2097152 (errno = 12).
OpenJDK 64-Bit Server VM warning: Failed to reserve and commit memory. req_addr: 0x0000fffbd5000000 bytes: 29360128 page size: 2097152 (errno = 12).
OpenJDK 64-Bit Server VM warning: Failed to reserve and commit memory. req_addr: 0x0000fffbd3400000 bytes: 29360128 page size: 2097152 (errno = 12).
OpenJDK 64-Bit Server VM warning: Failed to reserve and commit memory. req_addr: 0x0000fffbc5400000 bytes: 234881024 page size: 2097152 (errno = 12).
OpenJDK 64-Bit Server VM warning: Failed to reserve and commit memory. req_addr: 0x0000fffbb7400000 bytes: 234881024 page size: 2097152 (errno = 12).
CompileCommand: exclude com/github/benmanes/caffeine/cache/BoundedLocalCache.put bool exclude = true
WARNING: A command line option has enabled the Security Manager
WARNING: The Security Manager is deprecated and will be removed in a future release
2023-05-22 06:14:40.347 INFO  (main) [] o.e.j.u.log Logging initialized @1680ms to org.eclipse.jetty.util.log.Slf4jLog
2023-05-22 06:14:40.657 INFO  (main) [] o.e.j.s.Server jetty-9.4.48.v20220622; built: 2022-06-21T20:42:25.880Z; git: 6b67c5719d1f4371b33655ff2d047d24e171e49a; jvm 17.0.5+8-Ubuntu-2ubuntu122.04
2023-05-22 06:14:41.005 INFO  (main) [] o.a.s.s.CoreContainerProvider Using logger factory org.apache.logging.slf4j.Log4jLoggerFactory
2023-05-22 06:14:41.020 INFO  (main) [] o.a.s.s.CoreContainerProvider  ___      _       Welcome to Apache Solr™ version 9.1.1
2023-05-22 06:14:41.020 INFO  (main) [] o.a.s.s.CoreContainerProvider / __| ___| |_ _   Starting in cloud mode on port 8983
2023-05-22 06:14:41.020 INFO  (main) [] o.a.s.s.CoreContainerProvider \__ \/ _ \ | '_|  Install dir: /usr/src/solr-9.1.1
2023-05-22 06:14:41.020 INFO  (main) [] o.a.s.s.CoreContainerProvider |___/\___/_|_|    Start time: 2023-05-22T06:14:41.020948478Z
2023-05-22 06:14:41.049 INFO  (main) [] o.a.s.s.CoreContainerProvider Solr Home: /usr/src/solr_cores (source: system property: solr.solr.home)
2023-05-22 06:14:41.054 INFO  (main) [] o.a.s.c.SolrXmlConfig Loading solr.xml from /usr/src/solr_cores/solr.xml
2023-05-22 06:14:41.174 INFO  (main) [] o.a.s.c.SolrResourceLoader Added 1 libs to classloader, from paths: [/usr/src/solr-9.1.1/lib]
2023-05-22 06:14:42.430 WARN  (main) [] o.a.s.u.StartupLoggingUtils Jetty request logging enabled. Will retain logs for last 3 days. See chapter "Configuring Logging" in reference guide for how to configure.
2023-05-22 06:14:42.438 INFO  (main) [] o.a.s.c.SolrZkServerProps Reading configuration from: /usr/src/solr_cores/zoo.cfg
2023-05-22 06:14:42.442 INFO  (main) [] o.a.s.c.SolrZkServer STARTING EMBEDDED STANDALONE ZOOKEEPER SERVER at port 9983
2023-05-22 06:14:42.443 WARN  (main) [] o.a.s.c.SolrZkServer Embedded Zookeeper is not recommended in production environments. See Reference Guide for details.
2023-05-22 06:14:42.519 WARN  (embeddedZkServer) [] o.a.z.s.ServerCnxnFactory maxCnxns is not configured, using default value 0.
2023-05-22 06:14:42.944 INFO  (main) [] o.a.s.c.ZkContainer Zookeeper client=localhost:9983
2023-05-22 06:14:42.968 INFO  (main) [] o.a.s.c.DistributedClusterStateUpdater Creating DistributedClusterStateUpdater with useDistributedStateUpdate=false. Solr will be using Overseer based cluster state updates.
2023-05-22 06:14:43.006 INFO  (main) [] o.a.s.c.c.ConnectionManager Waiting up to 30000ms for client to connect to ZooKeeper
2023-05-22 06:14:43.010 WARN  (main-SendThread(localhost:9983)) [] o.a.z.ClientCnxn Session 0x0 for server localhost/[0:0:0:0:0:0:0:1]:9983, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. => java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.Net.pollConnect(Native Method)
java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
    at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344) ~[?:?]
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282) ~[?:?]
2023-05-22 06:14:44.114 WARN  (main-SendThread(localhost:9983)) [] o.a.z.ClientCnxn Session 0x0 for server localhost/[0:0:0:0:0:0:0:1]:9983, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. => java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.Net.pollConnect(Native Method)
java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
    at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344) ~[?:?]
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282) ~[?:?]
2023-05-22 06:14:45.217 WARN  (main-SendThread(localhost:9983)) [] o.a.z.ClientCnxn Session 0x0 for server localhost/[0:0:0:0:0:0:0:1]:9983, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. => java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.Net.pollConnect(Native Method)
java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
    at sun.nio.ch.Net.pollConnectNow(Net.java:672) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946) ~[?:?]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344) ~[?:?]
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282) ~[?:?]
2023-05-22 06:14:46.353 INFO  (zkConnectionManagerCallback-10-thread-1) [] o.a.s.c.c.ConnectionManager zkClient has connected
2023-05-22 06:14:46.353 INFO  (main) [] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
2023-05-22 06:14:46.566 WARN  (main) [] o.a.s.c.ZkController Contents of zookeeper /security.json are world-readable; consider setting up ACLs as described in https://solr.apache.org/guide/solr/latest/deployment-guide/zookeeper-access-control.html
2023-05-22 06:14:46.614 INFO  (main) [] o.a.s.c.DistributedClusterStateUpdater Creating DistributedClusterStateUpdater with useDistributedStateUpdate=false. Solr will be using Overseer based cluster state updates.
2023-05-22 06:14:46.642 INFO  (main) [] o.a.s.c.OverseerElectionContext I am going to be the leader 10.90.1.39:8983_solr
2023-05-22 06:14:46.650 INFO  (main) [] o.a.s.c.Overseer Overseer (id=72095755860246528-10.90.1.39:8983_solr-n_0000000000) starting
2023-05-22 06:14:46.761 INFO  (main) [] o.a.s.c.ZkController Register node as live in ZooKeeper:/live_nodes/10.90.1.39:8983_solr
2023-05-22 06:14:46.762 INFO  (OverseerStateUpdate-72095755860246528-10.90.1.39:8983_solr-n_0000000000) [] o.a.s.c.Overseer Starting to work on the main queue : 10.90.1.39:8983_solr
2023-05-22 06:14:46.777 INFO  (zkCallback-9-thread-1) [] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (0) -> (1)
2023-05-22 06:14:46.826 WARN  (main) [] o.a.s.c.CoreContainer Not all security plugins configured!  authentication=disabled authorization=disabled.  Solr is only as secure as you make it. Consider configuring authentication/authorization before exposing Solr to users internal or external.  See https://s.apache.org/solrsecurity for more info
2023-05-22 06:14:47.271 INFO  (main) [] o.a.s.c.CorePropertiesLocator Found 0 core definitions underneath /usr/src/solr_cores
2023-05-22 06:14:47.428 INFO  (main) [] o.e.j.s.h.ContextHandler Started o.e.j.w.WebAppContext@14a54ef6{/solr,file:///usr/src/solr-9.1.1/server/solr-webapp/webapp/,AVAILABLE}{/usr/src/solr-9.1.1/server/solr-webapp/webapp}
2023-05-22 06:14:47.435 INFO  (main) [] o.e.j.s.RequestLogWriter Opened /usr/src/solr-9.1.1/server/logs/2023_05_22.request.log
2023-05-22 06:14:47.478 INFO  (main) [] o.e.j.s.AbstractConnector Started ServerConnector@37cd92d6{HTTP/1.1, (http/1.1, h2c)}{0.0.0.0:8983}
2023-05-22 06:14:47.479 INFO  (main) [] o.e.j.s.Server Started @8815ms
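
The JVM warnings at the top of this log each carry a request size, a page size, and an errno. As a rough aid for reading them, the sketch below converts one warning into MiB and page counts and names the errno. It assumes the warning format shown in the log above and a Linux host for the errno name; `summarize` is a hypothetical helper:

```python
import errno
import re
from typing import Optional

# Matches the HotSpot warning format shown in the log above.
_WARNING = re.compile(
    r"Failed to reserve and commit memory\. req_addr: (0x[0-9a-f]+) "
    r"bytes: (\d+) page size: (\d+) \(errno = (\d+)\)"
)


def summarize(line: str) -> Optional[dict]:
    """Turn one 'Failed to reserve and commit memory' warning into readable units."""
    match = _WARNING.search(line)
    if match is None:
        return None
    size = int(match.group(2))
    page = int(match.group(3))
    code = int(match.group(4))
    return {
        "request_mib": size // 2 ** 20,
        "page_mib": page // 2 ** 20,
        "pages": size // page,
        # errno 12 maps to ENOMEM on Linux; unknown codes fall back to the number.
        "errno": errno.errorcode.get(code, str(code)),
    }
```

For the first warning in the log this gives a 240 MiB request in 2 MiB pages (120 pages) that failed with ENOMEM.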

client:

docker run -it --name web_search_client --privileged=true --net host cloudsuite/web-search:client server_IP 5

After that I get the following message:

May 24, 2023 12:59:46 PM sample.searchdriver.SearchDriver doGet
SEVERE: ERROR!
<benchResults>
    <benchSummary name="Sample Search Workload" version="0.4">
        <runId>1</runId>
        <startTime>Wed May 24 12:58:17 GMT 2023</startTime>
        <endTime>Wed May 24 12:59:46 GMT 2023</endTime>
        <metric unit="ops/sec">5.000</metric>
        <passed>false</passed>
    </benchSummary>
    <driverSummary name="SearchDriver">
        <metric unit="ops/sec">5.000</metric>
        <startTime>Wed May 24 12:58:17 GMT 2023</startTime>
        <endTime>Wed May 24 12:59:46 GMT 2023</endTime>
        <totalOps unit="operations">300</totalOps>
        <users>5</users>
        <rtXtps>4.9285</rtXtps>
        <passed>false</passed>
        <mix allowedDeviation="0.0000">
            <operation name="GET">
                <successes>300</successes>
                <failures>0</failures>
                <mix>1.0000</mix>
                <requiredMix>1.0000</requiredMix>
                <passed>true</passed>
            </operation>
        </mix>
        <responseTimes unit="seconds">
            <operation name="GET" r90th="0.500">
                <avg>0.004</avg>
                <max>0.023</max>
                <sd>0.003</sd>
                <p90th>0.005</p90th>
                <passed>true</passed>
                <p99th>0.020</p99th>
            </operation>
        </responseTimes>
        <delayTimes>
            <operation name="GET" type="thinkTime">
                <targetedAvg>1.000</targetedAvg>
                <actualAvg>0.999</actualAvg>
                <min>0.999</min>
                <max>1.000</max>
                <passed>false</passed>
            </operation>
        </delayTimes>
    </driverSummary>
</benchResults>

I don't know whether this output means that my server and client connected successfully. Sometimes a different message is displayed instead:

May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread run
SEVERE: SearchDriverAgent[1].783: SearchDriverAgent[1].783.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
com.sun.faban.driver.FatalException: SearchDriverAgent[1].783.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
    at com.sun.faban.driver.engine.AgentThread.validateTimeCompletion(AgentThread.java:532)
    at com.sun.faban.driver.engine.TimeThread.doRun(TimeThread.java:173)
    at com.sun.faban.driver.engine.AgentThread.run(AgentThread.java:202)

May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentImpl kill
WARNING: SearchDriverAgent[1]: Killing benchmark run
May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.MasterImpl$2 run
SEVERE: Run aborted. Master terminating!

xusine commented 1 year ago

Hello,

Thanks for the log. When you see the report (the log showing <benchResults>), it means the workload ran successfully. We have never seen the error message in your last log. Can you check what happened to the server? Is it still running, or has it been killed?
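
Note that the report quoted earlier still contains several <passed>false</passed> entries even though GET shows 300 successes and 0 failures. To see which checks those are, a small script can walk the report. This is only a sketch: `failed_checks` is a hypothetical helper, and `SAMPLE` is a trimmed copy of the report from this thread.

```python
import xml.etree.ElementTree as ET
from typing import List

# Trimmed copy of the <benchResults> report quoted earlier in this thread.
SAMPLE = """<benchResults>
  <benchSummary name="Sample Search Workload" version="0.4">
    <passed>false</passed>
  </benchSummary>
  <driverSummary name="SearchDriver">
    <passed>false</passed>
    <mix allowedDeviation="0.0000">
      <operation name="GET">
        <successes>300</successes>
        <failures>0</failures>
        <passed>true</passed>
      </operation>
    </mix>
    <delayTimes>
      <operation name="GET" type="thinkTime">
        <passed>false</passed>
      </operation>
    </delayTimes>
  </driverSummary>
</benchResults>"""


def failed_checks(report_xml: str) -> List[str]:
    """Return the path of every element whose direct <passed> child is 'false'."""
    failed: List[str] = []

    def walk(node: ET.Element, parent_path: str) -> None:
        name = node.get("name")
        label = f"{node.tag}[{name}]" if name else node.tag
        path = f"{parent_path}/{label}" if parent_path else label
        passed = node.find("passed")  # direct children only
        if passed is not None and (passed.text or "").strip() == "false":
            failed.append(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(report_xml), "")
    return failed


if __name__ == "__main__":
    for path in failed_checks(SAMPLE):
        print(path)
```

On the sample, the only failing leaf is the thinkTime entry under delayTimes; the two summary-level entries are also marked false.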

Best,

qiyuxinlin commented 1 year ago

Hello, Thanks again for your help! I got the following information yesterday:

May 23, 2023 3:08:44 PM sample.searchdriver.SearchDriver doGet
SEVERE: ERROR!

May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread run
SEVERE: SearchDriverAgent[1].975: SearchDriverAgent[1].975.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
com.sun.faban.driver.FatalException: SearchDriverAgent[1].975.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
    at com.sun.faban.driver.engine.AgentThread.validateTimeCompletion(AgentThread.java:532)
    at com.sun.faban.driver.engine.TimeThread.doRun(TimeThread.java:173)
    at com.sun.faban.driver.engine.AgentThread.run(AgentThread.java:202)

May 23, 2023 3:08:44 PM sample.searchdriver.SearchDriver doGet
SEVERE: ERROR!

May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread validateTimeCompletion
SEVERE: SearchDriverAgent[1].854.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread validateTimeCompletion
SEVERE: SearchDriverAgent[1].355.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
May 23, 2023 3:08:44 PM sample.searchdriver.SearchDriver doGet
SEVERE: ERROR!

May 23, 2023 3:08:44 PM sample.searchdriver.SearchDriver doGet
SEVERE: ERROR!

May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread validateTimeCompletion
SEVERE: SearchDriverAgent[1].835.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread validateTimeCompletion
SEVERE: SearchDriverAgent[1].537.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread validateTimeCompletion
SEVERE: SearchDriverAgent[1].914.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread run
SEVERE: SearchDriverAgent[1].355: SearchDriverAgent[1].355.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
com.sun.faban.driver.FatalException: SearchDriverAgent[1].355.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
    at com.sun.faban.driver.engine.AgentThread.validateTimeCompletion(AgentThread.java:532)
    at com.sun.faban.driver.engine.TimeThread.doRun(TimeThread.java:173)
    at com.sun.faban.driver.engine.AgentThread.run(AgentThread.java:202)

May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread run
SEVERE: SearchDriverAgent[1].854: SearchDriverAgent[1].854.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
com.sun.faban.driver.FatalException: SearchDriverAgent[1].854.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
    at com.sun.faban.driver.engine.AgentThread.validateTimeCompletion(AgentThread.java:532)
    at com.sun.faban.driver.engine.TimeThread.doRun(TimeThread.java:173)
    at com.sun.faban.driver.engine.AgentThread.run(AgentThread.java:202)

May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread validateTimeCompletion
SEVERE: SearchDriverAgent[1].783.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread run
SEVERE: SearchDriverAgent[1].914: SearchDriverAgent[1].914.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
com.sun.faban.driver.FatalException: SearchDriverAgent[1].914.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
    at com.sun.faban.driver.engine.AgentThread.validateTimeCompletion(AgentThread.java:532)
    at com.sun.faban.driver.engine.TimeThread.doRun(TimeThread.java:173)
    at com.sun.faban.driver.engine.AgentThread.run(AgentThread.java:202)

May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread run
SEVERE: SearchDriverAgent[1].537: SearchDriverAgent[1].537.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
com.sun.faban.driver.FatalException: SearchDriverAgent[1].537.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
    at com.sun.faban.driver.engine.AgentThread.validateTimeCompletion(AgentThread.java:532)
    at com.sun.faban.driver.engine.TimeThread.doRun(TimeThread.java:173)
    at com.sun.faban.driver.engine.AgentThread.run(AgentThread.java:202)

May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread run
SEVERE: SearchDriverAgent[1].835: SearchDriverAgent[1].835.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
com.sun.faban.driver.FatalException: SearchDriverAgent[1].835.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
    at com.sun.faban.driver.engine.AgentThread.validateTimeCompletion(AgentThread.java:532)
    at com.sun.faban.driver.engine.TimeThread.doRun(TimeThread.java:173)
    at com.sun.faban.driver.engine.AgentThread.run(AgentThread.java:202)

May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentThread run
SEVERE: SearchDriverAgent[1].783: SearchDriverAgent[1].783.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
com.sun.faban.driver.FatalException: SearchDriverAgent[1].783.doGet: Transport incomplete! Please ensure transport exception is thrown from operation.
    at com.sun.faban.driver.engine.AgentThread.validateTimeCompletion(AgentThread.java:532)
    at com.sun.faban.driver.engine.TimeThread.doRun(TimeThread.java:173)
    at com.sun.faban.driver.engine.AgentThread.run(AgentThread.java:202)

May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.AgentImpl kill
WARNING: SearchDriverAgent[1]: Killing benchmark run
May 23, 2023 3:08:44 PM com.sun.faban.driver.engine.MasterImpl$2 run
SEVERE: Run aborted. Master terminating!
cat: /usr/src/outputFaban/1/summary.xml: No such file or directory

xusine commented 1 year ago

Hello,

Sorry for our late reply. Did you check the status of the Solr server when you saw these logs? Our guess is that the server had shut down for some reason, which would explain why the client reports these errors.
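
One quick way to check from the client machine is to test whether anything is still listening on the Solr port (8983 in the startup log above). This is a minimal sketch; `port_is_open` is a hypothetical helper, and an open port only shows that something accepts connections, not that Solr is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("server_IP", 8983)` with the real server address in place of `server_IP`. If it returns False while the client is reporting errors, the server container has most likely stopped, and docker logs server should show why.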

Best,

qiyuxinlin commented 1 year ago

Hello, thank you for your patience and guidance. I may indeed have overlooked whether the server starts normally. I will check later.