parsa-epfl / cloudsuite

A Benchmark Suite for Cloud Services.
http://cloudsuite.ch

Data Analytics implementation questions #427

Closed minkyuSnow closed 1 year ago

minkyuSnow commented 1 year ago

Hello

The Data Analytics benchmark has changed recently, so I decided to give it a try. I ran it once on a single node and then tried it on multiple nodes.

I am asking because it worked fine on a single node but not on multiple nodes.

I ran it in an Arm64 environment.

  1. Single Node

$ docker run -d --net single-net --volumes-from wikimedia-dataset --name data-master cloudsuite/data-analytics --master
$ docker run -d --net single-net --name data-slave01 cloudsuite/data-analytics --slave --master-ip=<data-master container IP>
$ docker exec data-master benchmark

  2. Multi Node

Execution errors (screenshots attached: 2023-04-08 11:14:23 through 11:15:18).

  3. Using the host network

I tried changing the network setting to host and waited about 24 hours, as shown in the screenshot below (2023-04-08 11:27:37), but it did not work.

To summarize: before the change, Data Analytics ran on the overlay network without problems, but now it fails there with an error message, and if I switch to the host network and test it, it stalls.
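For reference, the multi-node run was set up roughly like this (a sketch reconstructing my commands; multi-net is the overlay network and x.x.x.x stands for the master address I tried):

# on node 1 (pi1): master container attached to the overlay network
$ docker run -d --net multi-net --volumes-from wikimedia-dataset --name data-master cloudsuite/data-analytics --master

# on node 2 (pi2): worker container pointing at the master
$ docker run -d --net multi-net --name data-slave01 cloudsuite/data-analytics --slave --master-ip=x.x.x.x

# back on node 1: start the benchmark
$ docker exec data-master benchmark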

xusine commented 1 year ago

Hello,

It seems that the worker node is not detected by the master node. By default, the master node listens on the IP address of your hostname. In this case, you can try changing the worker's `--master-ip` parameter to the IP address of the hostname and see whether it works.
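For example, something along these lines (a sketch; replace <master hostname IP> with the address the master's hostname resolves to, and multi-net with your overlay network):

# on the worker machine
$ docker run -d --net multi-net --name data-slave01 cloudsuite/data-analytics --slave --master-ip=<master hostname IP>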

minkyuSnow commented 1 year ago

Hello,

> It seems that the worker node is not detected by the master node. By default, the master node listens on the IP address of your hostname. In this case, you can try changing the worker's `--master-ip` parameter to the IP address of the hostname and see whether it works.

Thank you for your response.

I'm sorry, but I don't understand.

On the worker node I ran:

$ docker run -d --net multi-net --name data-slave01 cloudsuite/data-analytics --slave --master-ip=x.x.x.x

I'm not sure what you are telling me to put in the --master-ip=x.x.x.x part.

I checked the master container's IP with docker network inspect on the multi-net overlay network and entered it for x.x.x.x. I also tried entering the actual IP of the master node.

I'm trying several things to fix this, including removing the --master-ip part and entering just the IP, but I don't quite understand, so I'm asking.

master container IP -> 10.0.1.2
master node IP -> 172.30.1.40
master hostname -> pi1

worker node IP -> 172.30.1.41
worker hostname -> pi2

xusine commented 1 year ago

Hello,

Sorry if I caused any confusion. My hypothesis about your problem is that the master node may not be listening on the correct NIC. By default, the master node uses its hostname as its address, which in your case is pi1. You can try to ping pi1 from the worker node to see whether they can reach each other.

If not, you can explicitly make the master node listen on a specific IP address by passing --master-ip=<x.x.x.x> to the master node. To be more specific, if you use the container network, it would be --master-ip=10.0.1.2; if you are using the host network, it would be --master-ip=172.30.1.40.
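Concretely, a sketch using the names from your earlier commands (the --volumes-from wikimedia-dataset part is carried over from your single-node setup):

# container/overlay network: make the master listen on its container IP
$ docker run -d --net multi-net --volumes-from wikimedia-dataset --name data-master cloudsuite/data-analytics --master --master-ip=10.0.1.2

# host network: make the master listen on the host's IP
$ docker run -d --net host --volumes-from wikimedia-dataset --name data-master cloudsuite/data-analytics --master --master-ip=172.30.1.40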

minkyuSnow commented 1 year ago

Thank you for your response.

We performed a ping test between pi1 and pi2 and found that the ping succeeds.

The ping test also worked from inside the containers: after running the package update and installing the ping tool, we confirmed that the master container and the worker container can reach each other.

  1. Overlay Network Benchmark

There is an error on the overlay network, as shown below.

(screenshots attached: 2023-04-09 15:44:31 through 15:53:00)

  2. Host Network Benchmark

The host network run seemed to proceed differently from the overlay network run, but it stalled; I waited a day and it never resumed.

(screenshots attached: 2023-04-09 15:55:54 through 15:58:11)

Before the documentation and version changes, we used the overlay network and it ran normally.

xusine commented 1 year ago

Hello,

> We performed a ping test between pi1 and pi2 and found that the ping succeeds.

May I know how you ping pi1 and pi2? My intention is to check whether both pi1 and pi2 can resolve the name pi1 to its correct address, because on some distributions such as Ubuntu, a machine resolves its own hostname to 127.0.0.1, which can cause the server to listen only locally instead of being reachable from other nodes.

Moreover, you are always encouraged to use the host network when possible. I see you are not making forward progress when using the host network; this might be because your worker node does not have enough disk space: Hadoop requires the worker node's disk usage to be below 90%. You can use df to check the disk usage.
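For example, you could check both points on each machine roughly like this (a sketch; pi1 is the master's hostname):

# the hostname should resolve to a reachable address, not 127.0.0.1 / 127.0.1.1
$ getent hosts pi1
$ getent hosts $(hostname)

# every filesystem used by Hadoop on the worker should be below ~90% usage
$ df -h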

Best,

minkyuSnow commented 1 year ago

Hello,

> We performed a ping test between pi1 and pi2 and found that the ping succeeds.
>
> May I know how you ping pi1 and pi2? My intention is to check whether both pi1 and pi2 can resolve the name pi1 to its correct address, because on some distributions such as Ubuntu, a machine resolves its own hostname to 127.0.0.1, which can cause the server to listen only locally instead of being reachable from other nodes.
>
> Moreover, you are always encouraged to use the host network when possible. I see you are not making forward progress when using the host network; this might be because your worker node does not have enough disk space: Hadoop requires the worker node's disk usage to be below 90%. You can use df to check the disk usage.
>
> Best,

Thank you for your response.

For the ping test, I went into the containers, ran "apt-get update && apt-get install -y iputils-ping", and tested pinging each node IP (172.30.1.x).

Do you mean I need to modify my hosts file? That is, modify the hosts file of pi1 as follows?

127.0.0.1 pi1

And the hosts file of pi2 too:

127.0.0.1 pi2

pi1 -> cat /etc/hosts (screenshot: 2023-04-11 13:18:31)

pi2 -> cat /etc/hosts (screenshot: 2023-04-11 13:18:16)

And there is plenty of space on the disks.

pi1 disk usage (screenshot: 2023-04-11 13:22:51)

pi2 disk usage (screenshot: 2023-04-11 13:22:59)

xusine commented 1 year ago

Hello,

Thanks for your reply.

Yes, my intention was to have you check which IP is mapped to the hostname. By default, the data analytics server listens on the IP mapped to the hostname, which in your case is 127.0.0.1. This means the server only accepts clients from the local machine.

The way to fix this is to force the Hadoop server to listen on an active IP address, which in your case is 172.30.1.40 (using the host network). You can pass --master-ip=172.30.1.40 to the server to achieve this, and then retry to see whether there is still a problem.
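As a concrete sketch with the host network (the --volumes-from wikimedia-dataset part is assumed from your single-node setup):

# on pi1: master listening on the host's address
$ docker run -d --net host --volumes-from wikimedia-dataset --name data-master cloudsuite/data-analytics --master --master-ip=172.30.1.40

# on pi2: worker pointing at the same address
$ docker run -d --net host --name data-slave01 cloudsuite/data-analytics --slave --master-ip=172.30.1.40

# on pi1: run the benchmark
$ docker exec data-master benchmark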

Best,

minkyuSnow commented 1 year ago

Hello,

> Thanks for your reply.
>
> Yes, my intention was to have you check which IP is mapped to the hostname. By default, the data analytics server listens on the IP mapped to the hostname, which in your case is 127.0.0.1. This means the server only accepts clients from the local machine.
>
> The way to fix this is to force the Hadoop server to listen on an active IP address, which in your case is 172.30.1.40 (using the host network). You can pass --master-ip=172.30.1.40 to the server to achieve this, and then retry to see whether there is still a problem.
>
> Best,

Thank you for your reply.

You mentioned that Hadoop's hostname-to-IP mapping is not working.

So I searched for Hadoop IP mapping, and it seemed the fix was to put the IPs and hostnames of the nodes I want to cluster into the /etc/hosts file, so I did.

pi1 (node 1) /etc/hosts (screenshot: 2023-04-14 12:49:13)

pi2 (node 2) /etc/hosts (screenshot: 2023-04-14 12:49:01)

The benchmark appears to be working fine.

Master and slave seem to be communicating. To confirm: after modifying the hosts file, are the master and slave now communicating normally, and is the benchmark running properly? Also, whenever I add a node, do I just keep adding the new node's IP and hostname to the hosts file and run it?

In other words: keep adding the IPs and hostnames of the master and slaves to the hosts file on each node, and the contents of the hosts file should be identical on every node. Is that correct?

For example, the hosts file on each node would be set up like this:

pi1 (node 1):
127.0.0.1 localhost
172.30.1.40 pi1
172.30.1.41 pi2
172.30.1.42 pi3 -> add node 3

pi2 (node 2):
127.0.0.1 localhost
172.30.1.40 pi1
172.30.1.41 pi2
172.30.1.42 pi3 -> add node 3

pi3 (node 3):
127.0.0.1 localhost
172.30.1.40 pi1
172.30.1.41 pi2
172.30.1.42 pi3 -> add node 3
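In other words, adding a node would just mean appending the new entry on every machine, for example (a sketch, assuming node 3 is pi3 at 172.30.1.42):

# run on each of pi1, pi2 and pi3 so the hosts files stay identical
$ echo "172.30.1.42 pi3" | sudo tee -a /etc/hosts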

xusine commented 1 year ago

Hello,

Glad to see it works.

It is also pretty strange that overriding the master IP address does not work. For now, I think you have to manually modify the hosts file to make the master aware of the worker nodes. We will check whether we can fix this bug by forcing Hadoop to use IP addresses for communication.

Best,

minkyuSnow commented 1 year ago

Hello,

> Glad to see it works.
>
> It is also pretty strange that overriding the master IP address does not work. For now, I think you have to manually modify the hosts file to make the master aware of the worker nodes. We will check whether we can fix this bug by forcing Hadoop to use IP addresses for communication.
>
> Best,

It now runs normally. Thank you very much for your help.