renzhengeek / issues


Multi-node: mount does not succeed on the remote nodes #3

Closed renzhengeek closed 8 years ago

renzhengeek commented 9 years ago

Because of the mount problem described in the title, file operations on remote nodes always fail to find certain files or directories.

Tue Mar 31 14:29:16 CST 2015
multi_mmap
2015/03/31,14:29:16  Mkfs device /dev/mapper/cluster--vg2-big--lv:
2015/03/31,14:29:16  /usr/bin/sudo -u root /sbin/mkfs.ocfs2 --fs-features=sparse,unwritten,inline-data --cluster-stack=pcmk --cluster-name=hacluster                                          -b 512 -C 4096 -L multi-multi_mmap-test -N 3   /dev/mapper/cluster--vg2-big--lv 
mkfs.ocfs2 1.8.2
Cluster stack: pcmk
Cluster name: hacluster
Stack Flags: 0x0
NOTE: Feature extended slot map may be enabled
Overwriting existing ocfs2 partition.
Proceed (y/N): Label: multi-multi_mmap-test
Features: sparse extended-slotmap backup-super unwritten inline-data strict-journal-super xattr indexed-dirs refcount discontig-bg
Block size: 512 (9 bits)
Cluster size: 4096 (12 bits)
Volume size: 161061273600 (39321600 clusters) (314572800 blocks)
Cluster groups: 10972 (tail covers 1536 clusters, rest cover 3584 clusters)
Extent allocator size: 54525952 (52 groups)
Journal size: 33554432
Node slots: 3
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 4 block(s)
Formatting Journals: done
Growing extent allocator: done
Formatting slot map: done
Formatting quota files: done
Writing lost+found: done
mkfs.ocfs2 successful

2015/03/31,14:29:25  Mount volume from all nodes
2015/03/31,14:29:25  /opt/ocfs2-test/bin/remote_mount.py -l multi-multi_mmap-test -m /mnt/shared -n nopen-nd1,nopen-nd2,nopen-nd3 
+ /usr/lib64/mpi/gcc/openmpi/bin/mpirun -mca btl tcp,self -mca orte_rsh_agent ssh:rsh -np 3 --host nopen-nd1,nopen-nd2,nopen-nd3 /opt/ocfs2-test/bin/command.py --mount -l multi-multi_mmap-test -m /mnt/shared
2015/03/31,14:29:25  Run multi_mmap, CMD: /opt/ocfs2-test/bin/run_multi_mmap.py -i 20000 -I eth0 -n nopen-nd1,nopen-nd2,nopen-nd3 -c -b 6000 --hole -f /mnt/shared/multi_mmap_test/multi_mmap_test_file
+ /usr/lib64/mpi/gcc/openmpi/bin/mpirun -mca btl tcp,self -mca orte_rsh_agent ssh:rsh -mca btl_tcp_if_include eth0 -np 3 --host nopen-nd1,nopen-nd2,nopen-nd3 /opt/ocfs2-test/bin/multi_mmap -c -b 6000 -h -i 20000 /mnt/shared/multi_mmap_test/multi_mmap_test_file
+ tee -a /opt/ocfs2-test/log/o2t.log
nopen-nd1: rank: 0, procs: 3, filename "/mnt/shared/multi_mmap_test/multi_mmap_test_file"
nopen-nd2: rank: 1, procs: 3, filename "/mnt/shared/multi_mmap_test/multi_mmap_test_file"
nopen-nd3: rank: 2, procs: 3, filename "/mnt/shared/multi_mmap_test/multi_mmap_test_file"
nopen-nd1 (rank 0): Write  via mmap  block 0 of 'a'
nopen-nd2 (rank 1): Error 2 opening file "/mnt/shared/multi_mmap_test/multi_mmap_test_file": No such file or directory
nopen-nd3 (rank 2): Error 2 opening file "/mnt/shared/multi_mmap_test/multi_mmap_test_file": No such file or directory
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD 
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[nopen-nd1:04066] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[nopen-nd1:04066] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Runtime 10 seconds.
2015/03/31,14:29:26  Umount volume from all nodes.
2015/03/31,14:29:26  /opt/ocfs2-test/bin/remote_umount.py -m /mnt/shared -n nopen-nd1,nopen-nd2,nopen-nd3
+ /usr/lib64/mpi/gcc/openmpi/bin/mpirun -mca btl tcp,self -mca orte_rsh_agent ssh:rsh -np 3 --host nopen-nd1,nopen-nd2,nopen-nd3 /opt/ocfs2-test/bin/command.py --umount -m /mnt/shared
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[49501,1],2]
  Exit code:    1
--------------------------------------------------------------------------
2015/03/31,14:29:26  Remote umount failed
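
The "No such file or directory" errors on nopen-nd2 and nopen-nd3 are what one would expect if /mnt/shared is not actually an ocfs2 mount on those nodes. A minimal sketch for checking that on each node, assuming the usual Linux /proc/mounts layout; the helper name `is_mounted` is hypothetical and not part of ocfs2-test:

```python
def is_mounted(mounts_text, mount_point, fstype="ocfs2"):
    """Return True if mount_point is listed with the given filesystem type.

    mounts_text is the content of /proc/mounts, whose whitespace-separated
    fields are: device, mount point, fs type, options, dump, pass.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point and fields[2] == fstype:
            return True
    return False


# Example input in /proc/mounts format (illustrative, not taken from the logs).
sample = "/dev/mapper/cluster--vg2-big--lv /mnt/shared ocfs2 rw,relatime 0 0\n"
print(is_mounted(sample, "/mnt/shared"))
```

On a real node the text would come from `open("/proc/mounts").read()`; running the check before starting multi_mmap would separate a failed mount from a genuine file system problem.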
renzhengeek commented 8 years ago
Sep 24 22:43:02 m1 sudo[8955]: ocfs2test : TTY=pts/0 ; PWD=/home/ocfs2test/bin/ocfs2/bin ; USER=root ; COMMAND=/sbin/mkfs.ocfs2 --fs-features=xattr -b 512 -C 1048576 -N 2 -L ocfs2-
Sep 24 22:43:02 m1 sudo[8955]: pam_unix(sudo:session): session opened for user root by root(uid=0)
Sep 24 22:43:02 m1 kernel: dlm: Using TCP for communications
Sep 24 22:43:03 m1 dlm_controld[1897]: 191602 cpg_dispatch error 9
Sep 24 22:43:03 m1 systemd-udevd[8947]: remove old symlink, '/dev/disk/by-uuid/a8aba173-88e6-40d7-bf68-f62998af0da3' no longer belonging to '/devices/platform/host2/session1/target
renzhengeek commented 8 years ago
Sep 25 17:02:34 n3 kernel: type=1006 audit(1443171754.851:377): pid=8612 uid=0 old auid=4294967295 new auid=1001 old ses=4294967295 new ses=333 res=1
Sep 25 17:02:34 n3 sshd[8612]: pam_unix(sshd:session): session opened for user ocfs2test by (uid=0)
Sep 25 17:02:34 n3 sshd[8612]: Accepted publickey for ocfs2test from 147.2.208.59 port 53601 ssh2: RSA f2:50:9f:e9:08:d7:af:28:78:04:e1:89:c8:f2:7e:f7 [MD5]
Sep 25 17:00:37 n3 sudo[8525]: pam_unix(sudo:session): session closed for user root
Sep 25 17:00:33 n3 sudo[8525]: pam_unix(sudo:session): session opened for user root by root(uid=0)
Sep 25 17:00:33 n3 sudo[8525]: ocfs2test : TTY=pts/0 ; PWD=/home/ocfs2test/bin/ocfs2/bin ; USER=root ; COMMAND=/usr/bin/vim /etc/sudoers
Sep 25 17:00:04 n3 systemd-udevd[8399]: remove old symlink, '/dev/disk/by-uuid/2876e19a-8ffc-489e-aad2-bc27ea27d365' no longer belonging to '/devices/platform/host2/session1/target
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/platform/i8042/serio1/input/input2/event1
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.7/usb4/4-1/4-1:1.0/input/input5/mouse1
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.7/usb4/4-1/4-1:1.0/input/input5/js0
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.7/usb4/4-1/4-1:1.0/input/input5/event4
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.7/usb4/4-1/4-1:1.0/input/input5
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.7/usb4/4-1/4-1:1.0/0003:0627:0001.0001
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.7/usb4/4-1/4-1:1.0
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.7/usb4/4-1
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.1/usb2/2-0:1.0
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.2/usb3/3-0:1.0
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/platform/i8042/serio1/input/input2/mouse0
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.7/usb4/4-0:1.0
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/platform/pcspkr/input/input4/event3
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.0/usb1/1-0:1.0
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/LNXSYSTM:00/LNXPWRBN:00/input/input3/event2
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.0/usb1
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/platform/i8042/serio0/input/input0/event0
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/platform/i8042/serio1/input/input2
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.7/usb4
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.2/usb3
Sep 25 17:00:04 n3 kernel: intel_rapl: no valid rapl domains found in package 0
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/pci0000:00/0000:00:05.1/usb2
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/platform/i8042/serio0/input/input0
Sep 25 17:00:04 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/platform/pcspkr/input/input4
Sep 25 17:00:03 n3 upowerd[1686]: (upowerd:1686): UPower-Linux-WARNING **: treating change event as add on /sys/devices/virtual/input/mice
Sep 25 17:00:03 n3 sudo[8347]: pam_unix(sudo:session): session closed for user root
Sep 25 17:00:03 n3 sudo[8347]: pam_unix(sudo:session): session opened for user root by root(uid=0)
Sep 25 17:00:03 n3 sudo[8347]: ocfs2test : TTY=pts/0 ; PWD=/home/ocfs2test/bin/ocfs2/bin ; USER=root ; COMMAND=/usr/bin/udevadm trigger
Sep 25 17:00:01 n3 CRON[8348]: pam_unix(crond:session): session closed for user root
Sep 25 17:00:01 n3 kernel: type=1006 audit(1443171601.655:376): pid=8348 uid=0 old auid=4294967295 new auid=0 old ses=4294967295 new ses=332 res=1
Sep 25 17:00:01 n3 cron[8348]: pam_unix(crond:session): session opened for user root by (uid=0)
Sep 25 16:55:54 n3 systemd[8117]: pam_unix(systemd-user:session): session closed for user ocfs2test
Sep 25 16:55:54 n3 sshd[8114]: pam_unix(sshd:session): session closed for user ocfs2test
Sep 25 16:55:54 n3 sshd[8118]: Received disconnect from 147.2.208.59: 11: disconnected by user
Sep 25 16:55:54 n3 systemd[8116]: pam_unix(systemd-user:session): session opened for user ocfs2test by (uid=0)
renzhengeek commented 8 years ago

http://unix.stackexchange.com/questions/21709/where-is-udev-getting-the-id-for-iscsi-devices

udev creates the device files and symlinks on the fly, based on the rules defined in /etc/udev/rules.d. UIDs are generated by udev in a way similar to uuidgen, taking the name, the count of name characters, size, physical attributes and other geometric parameters into consideration.
renzhengeek commented 8 years ago
n3:~ # cd /dev/disk/by-
by-id/    by-label/ by-path/  by-uuid/  
n3:~ # cd /dev/disk/by-label/
n3:/dev/disk/by-label # ls
multi-flock_unit-test
n3:/dev/disk/by-label # ll
total 0
lrwxrwxrwx 1 root root 9 Sep 25 17:00 multi-flock_unit-test -> ../../sda
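
The listing above is what a label lookup through udev sees on n3. A rough sketch of that lookup, assuming it amounts to following the symlink under /dev/disk/by-label; `resolve_label` is a hypothetical helper, not a libblkid API:

```python
import os


def resolve_label(label, by_label_dir="/dev/disk/by-label"):
    """Return the device path that the by-label symlink points to, or None.

    A missing symlink means udev has not (yet) published this label.
    """
    link = os.path.join(by_label_dir, label)
    if not os.path.islink(link):
        return None
    # Follow the relative link (e.g. ../../sda) to an absolute device path.
    return os.path.realpath(link)
```

If the symlink for the label passed to mount is missing or stale on a node, a lookup restricted to this directory cannot find the device, even though the label is present on disk.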
renzhengeek commented 8 years ago

https://doc.opensuse.org/documentation/html/openSUSE_121/opensuse-reference/cha.udev.html

Chapter 8. Dynamic Kernel Device Management with udev

Contents

8.1. The /dev Directory
8.2. Kernel uevents and udev
8.3. Drivers, Kernel Modules and Devices
8.4. Booting and Initial Device Setup
8.5. Monitoring the Running udev Daemon
8.6. Influencing Kernel Device Event Handling with udev Rules
8.7. Persistent Device Naming
8.8. Files used by udev
8.9. For More Information

The kernel can add or remove almost any device in a running system. Changes in the device state (whether a device is plugged in or removed) need to be propagated to userspace. Devices need to be configured as soon as they are plugged in and recognized. Users of a certain device need to be informed about any changes in this device's recognized state. udev provides the needed infrastructure to dynamically maintain the device node files and symbolic links in the /dev directory. udev rules provide a way to plug external tools into the kernel device event processing. This enables you to customize udev device handling by, for example, adding certain scripts to execute as part of kernel device handling, or request and import additional data to evaluate during device handling. 
renzhengeek commented 8 years ago

I've found the root cause. The problem lies in /etc/blkid.conf: mounting works once EVALUATE is set back to its default value.

Quote from man blkid: "EVALUATE=method Defines LABEL and UUID evaluation method(s). Currently, the libblkid library supports "udev" and "scan" methods. More than one methods may be specified in a comma separated list. Default is "udev,scan". The "udev" method uses udev /dev/disk/by-* symlinks and the "scan" method scans all block devices from the /proc/partitions file."

blkid.conf affects libblkid, which is used by "mount".
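
To spot a non-default EVALUATE setting on a node, something like the following could be used. It is only a sketch: it assumes blkid.conf consists of NAME=value lines with `#` comment lines, `evaluate_methods` is a hypothetical helper, and the issue does not record which non-default value was in effect.

```python
DEFAULT_EVALUATE = ["udev", "scan"]  # default according to man blkid


def evaluate_methods(conf_text):
    """Return the EVALUATE methods set in blkid.conf text, or the default."""
    methods = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("EVALUATE="):
            value = line[len("EVALUATE="):]
            methods = [m.strip() for m in value.split(",") if m.strip()]
    return methods or list(DEFAULT_EVALUATE)


# Example: a config that restricts lookups to udev symlinks only.
print(evaluate_methods("# local override\nEVALUATE=udev\n"))
```

Feeding it the content of /etc/blkid.conf from each node and comparing the result with `DEFAULT_EVALUATE` would show which nodes deviate from the default.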

So, drop this patch. Thanks for helping work this out.