Closed: whowutwut closed this issue 6 years ago
Can we log on to these 2 hosts to collect some information about the hard disks?
Yes, boston32 is the first one and boston30 is the 2nd one. Under stratton
Hi @whowutwut, for boston32, if no OS has been installed on any disk, we choose the one with the smallest WWN value. (All the disks use the same driver.)
sda
0x5000c500947f6dd7
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:0/0:2:0:0/block/sda
sdb
0x5000c500947f7323
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:1/0:2:1:0/block/sdb
sdc
0x5000c500947f6dfb
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:2/0:2:2:0/block/sdc
sdd
0x5000c500947f787b
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:3/0:2:3:0/block/sdd
sde --------------------*
0x5000c500947f562b
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:4/0:2:4:0/block/sde
sdf
0x5000c500947f7a17
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:5/0:2:5:0/block/sdf
sdg
0x5000c500947f5b5b
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:6/0:2:6:0/block/sdg
sdh
0x5000c500947f5717
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:7/0:2:7:0/block/sdh
sdi
0x5000c500947f79cb
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:8/0:2:8:0/block/sdi
sdj
0x5000c500947f6edb
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:9/0:2:9:0/block/sdj
From the info above, we will choose sde as the install disk.
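The rule stated here (pick the smallest WWN when no disk holds an OS) can be checked against the listing above. The following is only a minimal Python sketch of that rule, not the actual getinstdisk code; the helper name `smallest_wwn_disk` is hypothetical, and the WWN values are copied from the boston32 output.

```python
# Sketch only: mirrors the "smallest WWN wins" rule described in this thread.
# smallest_wwn_disk is a hypothetical helper, not part of xCAT.

# WWN values as posted for boston32.
BOSTON32_WWNS = {
    "sda": "0x5000c500947f6dd7",
    "sdb": "0x5000c500947f7323",
    "sdc": "0x5000c500947f6dfb",
    "sdd": "0x5000c500947f787b",
    "sde": "0x5000c500947f562b",
    "sdf": "0x5000c500947f7a17",
    "sdg": "0x5000c500947f5b5b",
    "sdh": "0x5000c500947f5717",
    "sdi": "0x5000c500947f79cb",
    "sdj": "0x5000c500947f6edb",
}


def smallest_wwn_disk(wwns):
    """Return the device name whose WWN is numerically smallest."""
    if not wwns:
        raise ValueError("no disks given")
    # Compare WWNs as integers rather than as strings.
    return min(wwns, key=lambda name: int(wwns[name], 16))


if __name__ == "__main__":
    print(smallest_wwn_disk(BOSTON32_WWNS))  # sde
```

Comparing the values numerically gives sde, which matches the disk marked in the listing.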
But boston30 has just 2 disks. Is this the 2nd one?
[root@boston30 ~]# ls /dev/sd
sda sda1 sda2 sda3 sda4 sda5 sdb
@xuweibj I did not investigate this much yesterday and was too quick to write up the issue, but I think I counted the drives wrong. I was counting top -> bottom, left -> right, but I think it goes bottom -> top, left -> right.
So the 5th drive is chosen on the 1st server, and the 1st drive is chosen on the 2nd server. The question here is: why isn't the 1st drive chosen on the 1st server?
We should have 10 drives in the 1st server (but I have to check physically..)
Looking at this documentation under SATA drives, https://www.ibm.com/support/knowledgecenter/POWER9/p9eip/p9eip22p_drive_install_details.htm could this be the reason: that the mini-SAS drive connection we chose is the B one, i.e. the first drive in the 2nd connector?
@whowutwut
The logic in getinstdisk is that it chooses the disk with the smallest WWN.
For boston30, sda is the one with the smallest WWN, so sda is chosen.
sda
0x5000c500947f6e47
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:0/0:2:0:0/block/sda
sdb
0x5000c500947f736f
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:1/0:2:1:0/block/sdb
For boston32, sde has the smallest WWN, so it is chosen; sde is the 5th disk.
So, let's close this issue?
@lychen214 Eric, do you have any idea about this behavior above? When selecting a physical disk on a Boston server, we get inconsistent results using the WWN value...
@whowutwut Per the comment quoted below, the tool picks the disk with the smallest WWN, and it looks like it worked as expected on boston30 and boston32. But I have no idea how the WWN value is determined from the OS perspective. Did you see the tool use a disk that does not have the smallest WWN for the installation? Please help clarify. Thanks.
The logic in getinstdisk is that it chooses the disk with the smallest WWN.
For boston30, sda is the one with the smallest WWN, so sda is chosen.
sda
0x5000c500947f6e47
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:0/0:2:0:0/block/sda
sdb
0x5000c500947f736f
/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:1/0:2:1:0/block/sdb
For boston32, sde has the smallest WWN, so it is chosen; sde is the 5th disk.
If we are selecting correctly based on the smallest WWN, then on one Boston node the smallest WWN corresponded to disk 0 of mini-SAS connector 1, while on another Boston node the smallest WWN corresponded to disk 0 of mini-SAS connector 2.
That's what I don't understand: why one Boston node installed onto the bottom-left drive (green circle), while another installed onto the 2nd drive in the second column of the server (red circle).
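One way to see the mismatch described here is to pull the SCSI target index out of the sysfs paths posted earlier in the thread for the two selected disks. The Python sketch below does only that; `target_index` is a hypothetical helper, and whether the target index maps to a physical slot or mini-SAS connector is an assumption the thread has not confirmed.

```python
import re

# Sysfs paths of the disks that were selected, as posted in this thread.
SELECTED = {
    "boston32": "/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:4/0:2:4:0/block/sde",
    "boston30": "/devices/pci0003:00/0003:00:00.0/0003:01:00.0/host0/target0:2:0/0:2:0:0/block/sda",
}


def target_index(sysfs_path):
    """Extract the last number of the 'targetH:C:T' path component."""
    match = re.search(r"/target\d+:\d+:(\d+)/", sysfs_path)
    if match is None:
        raise ValueError("no SCSI target component in path: " + sysfs_path)
    return int(match.group(1))


if __name__ == "__main__":
    for node, path in SELECTED.items():
        # Assumption: the target index reflects enumeration order only.
        print(node, target_index(path))
```

For these two paths the indexes differ (4 on boston32, 0 on boston30), so the smallest WWN does not track the enumeration order of the disks.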
Tag @xuweibj too
@xuweibj above you said:
For boston32, if no OS has been installed on any disk, we will choose the one with the smallest WWN value. (All disks' driver are the same.)
What happens if an OS has already been installed on any of the disks?
We are seeing a similar issue with the Habanero boxes, where there are 12 drives and an OS was previously installed on sda by an unknown provisioning tool, but xCAT installs the OS onto sdb. This is causing problems because now there are 2 bootable OSes and the node is booting the wrong one.
@whowutwut If only one disk has an OS installed on it, that disk is chosen. If more than one does, the one with the smallest WWN value is chosen.
@xuweibj Even if switching RHEL & Ubuntu?
Yes. Whatever the current OS is, we just check whether an OS is installed on the disk.
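The selection rule from the last few comments can be written down as a short Python sketch. This is an illustration of the rule as described in the thread, not the getinstdisk implementation; the `Disk` type, the `has_os` flag, and `choose_install_disk` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str     # e.g. "sda"
    wwn: str      # hex string, e.g. "0x5000c500947f6e47"
    has_os: bool  # hypothetical flag: an installed OS was detected


def choose_install_disk(disks):
    """Prefer disks that already hold an OS; break ties by smallest WWN."""
    if not disks:
        raise ValueError("no disks given")
    with_os = [d for d in disks if d.has_os]
    # If no disk holds an OS, every disk is a candidate.
    candidates = with_os if with_os else disks
    return min(candidates, key=lambda d: int(d.wwn, 16)).name


if __name__ == "__main__":
    # WWNs from boston30; the has_os values are made up for illustration.
    disks = [
        Disk("sda", "0x5000c500947f6e47", has_os=False),
        Disk("sdb", "0x5000c500947f736f", has_os=True),
    ]
    print(choose_install_disk(disks))  # sdb
```

Under this rule a disk with a detected OS wins over a clean disk with a smaller WWN, so an install landing on a different disk would suggest the existing OS was not detected.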
@whowutwut Would it still start from disk 0 of mini-SAS connector 2 if you use clean hard drives?
A possible reason for not choosing the disk that has an OS installed is that its filesystem cannot be mounted in the initrd. In getinstdisk, we try to mount each partition and then check whether there is a vmlinu* file in it; if there is, the hard disk on which that partition is located is treated as the install disk.
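The kernel check described above can be sketched in Python as follows. This is not the getinstdisk code: `has_kernel` is a hypothetical helper that takes an already-mounted directory, and looking in both the partition root and a `boot/` subdirectory is an assumption. If the filesystem cannot be mounted in the initrd, there is nothing to inspect and the disk would be treated as having no OS.

```python
import glob
import os
import tempfile


def has_kernel(mount_point):
    """Return True if a vmlinu* file exists under the mounted partition."""
    base = glob.escape(mount_point)  # protect special characters in the path
    patterns = [
        os.path.join(base, "vmlinu*"),          # assumption: separate /boot partition
        os.path.join(base, "boot", "vmlinu*"),  # assumption: /boot on the root partition
    ]
    return any(glob.glob(pattern) for pattern in patterns)


if __name__ == "__main__":
    # Simulate a mounted partition with a temporary directory.
    with tempfile.TemporaryDirectory() as mnt:
        print(has_kernel(mnt))  # False: empty partition
        open(os.path.join(mnt, "vmlinuz-test"), "w").close()
        print(has_kernel(mnt))  # True: a kernel image is present
```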
Is there any technical justification for choosing the disk with the smallest WWN value as the 1st disk? I checked many Boston/Briggs servers in our environment, and the disk with the smallest WWN is not sda.
For example
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdd 8:48 0 1.8T 0 disk
sdb 8:16 0 1.8T 0 disk
sde 8:64 0 118G 0 disk
|-sde2 8:66 0 512M 0 part /boot
|-sde5 8:69 0 113.5G 0 part
| `-system-root 253:0 0 113.5G 0 lvm /
|-sde3 8:67 0 4G 0 part [SWAP]
|-sde1 8:65 0 8M 0 part
`-sde4 8:68 0 1K 0 part
sdc 8:32 0 1.8T 0 disk
sda 8:0 0 1.8T 0 disk
`-sda1 8:1 0 1.8T 0 part /data
@cxhong and I are loading software on Supermicro Big Data servers (in this case p9 Boston). Doing a generic rinstall of the node with a RHEL 7.5 GA image seems to pick different disks to install on. I noticed this when standing in front of the server.
The first server picks the 5th disk ... because I see the active light flashing ...
boston32 (more than 2... maybe 6 physical disks)
The second server picks the 1st disk... because I see the active light flashing ....
boston30 (2 physical disks)
@xuweibj Do you know what is going on here?