opensvc / multipath-tools


Question about mpathpersist? #71

Closed: lixiaokeng closed this issue 1 year ago

lixiaokeng commented 1 year ago

We have run into a problem. When we register a PR key (the dm device has 2 paths), mpathpersist blocks for 5 seconds. This happens because the request on the second path reaches the storage server while it is still handling the request from the first path. The second request is then put into a waiting (5 s) and retry process. If the second request arrives after the first one has finished, it is handled correctly. We noticed that registration is asynchronous, while reservation is synchronous. Why are they different?

mwilck commented 1 year ago

Can you please attach logs generated with verbosity 4 to illustrate what you mean?

lixiaokeng commented 1 year ago

The current problem appears to be a compatibility issue between multipath and the storage service. Some distributed storage services cannot process two registration requests at the same time. If two requests arrive at the primary storage server almost simultaneously, the second request is handed back to the backup storage server. It then waits there for 5 seconds because the primary storage server is still processing the first request. The two requests arrive at the same time because mpath_prout_reg creates two threads to send the I/O. So I want to know: why is registration asynchronous? If we finished the first request before sending the second one, the primary storage server could handle it correctly.
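A minimal sketch of the "one thread per path" dispatch pattern described above follows. It is a generic illustration, not the actual mpath_prout_reg code. The names send_prout_fn, path_job, path_worker, register_paths_parallel, MAX_PATHS and stub_send_ok are all hypothetical, and the stub stands in for the real SG_IO ioctl.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_PATHS 16

/* Hypothetical per-path sender (not a multipath-tools API): issues the
 * PROUT command on one path and returns 0 on success. */
typedef int (*send_prout_fn)(const char *dev);

struct path_job {
	const char *dev;
	send_prout_fn send;
	int status;
};

static void *path_worker(void *arg)
{
	struct path_job *job = arg;

	job->status = job->send(job->dev);
	return NULL;
}

/* Start one thread per path so that all commands are in flight at once,
 * then join them all. Returns the number of paths that succeeded, or -1
 * if there are too many paths or a thread could not be created. */
static int register_paths_parallel(const char *const *devs, size_t n,
				   send_prout_fn send)
{
	pthread_t tids[MAX_PATHS];
	struct path_job jobs[MAX_PATHS];
	size_t started = 0;
	int ok = 0;

	assert(devs != NULL || n == 0);
	if (n > MAX_PATHS)
		return -1;
	for (size_t i = 0; i < n; i++) {
		jobs[i] = (struct path_job){ devs[i], send, -1 };
		if (pthread_create(&tids[i], NULL, path_worker, &jobs[i]) != 0)
			break;
		started++;
	}
	for (size_t i = 0; i < started; i++) {
		pthread_join(tids[i], NULL); /* join makes job->status visible */
		if (jobs[i].status == 0)
			ok++;
	}
	return started == n ? ok : -1;
}

/* Stub sender for illustration only: pretends every path succeeds. */
static int stub_send_ok(const char *dev)
{
	(void)dev;
	return 0;
}
```

With this structure the commands overlap on the wire, which is exactly what the storage described here cannot cope with. The upside is that the total latency is roughly that of the slowest path rather than the sum over all paths.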

mwilck commented 1 year ago

Thanks for the explanation. I'd still like to see some logs; it's difficult to get a clear idea of what's happening from the high-level description alone.

lixiaokeng commented 1 year ago

253132.082847 | THREAD ID [0] INFO]
253132.082852 | rq_servact=6
253132.082855 | rq_scope=0
253132.082859 | rq_type=0
253132.082862 | rkey=
253132.082866 | paramp->sa_flags =00
253132.082870 | noisy=1
253132.082873 | status=-1
253132.082877 | THREAD ID [1] INFO]
253132.082880 | rq_servact=6
253132.082884 | rq_scope=0
253132.082887 | rq_type=0
253132.082891 | rkey=
253132.082894 | paramp->sa_flags =00
253132.082898 | noisy=1
253132.082901 | status=-1
253132.082905 | THREAD ID [2] INFO]
253132.082908 | rq_servact=6
253132.082912 | rq_scope=0
253132.082915 | rq_type=0
253132.082919 | rkey=
253132.082922 | paramp->sa_flags =00
253132.082926 | noisy=1
253132.082929 | status=-1
253132.082934 | 368886030000001380009498563760c9f: sending pr out command to sdih
253132.082980 | 368886030000001380009498563760c9f: sending pr out command to sdii
253132.083015 | 368886030000001380009498563760c9f: sending pr out command to sdij
253132.083036 | 00 00 00 00 00 00 00 00 00 00 60 6f db 7a dc 83
253132.083061 | 00 00 00 00 00 00 00 00 00 00 60 6f db 7a dc 83
253132.083106 | 00 00 00 00 00 00 00 00
253132.083081 | 00 00 00 00 00 00 00 00 00 00 60 6f db 7a dc 83
253132.083075 | 00 00 00 00 00 00 00 00
253132.083145 | 00 00 00 00 00 00 00 00
253132.083971 | sdii: status driver:00 host:00 scsi:00
253132.083981 | sdii: status = 0
253137.338110 | sdij: status driver:00 host:00 scsi:00
253137.338160 | sdij: status = 0
253137.374076 | sdih: status driver:00 host:00 scsi:00
253137.374108 | sdih: status = 0
253137.374186 | 368886030000001380009498563760c9f: pr message=map 368886030000001380009498563760c9f setprstatus
253137.389038 | 368886030000001380009498563760c9f: message=map 368886030000001380009498563760c9f setprstatus reply=ok
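The 16-byte and 8-byte hex lines in this log appear to be the PROUT parameter list sent on each path: 24 bytes, dumped as 16 + 8. Assuming the standard SPC-4 layout, rq_servact=6 is REGISTER AND IGNORE EXISTING KEY. Under that reading, the bytes decode as an all-zero reservation key followed by the service action reservation key 0x606fdb7adc83. The sketch below rebuilds that buffer. PROUT_PARAM_LEN, PROUT_REG_IGNORE_SA, put_be64 and build_prout_params are illustrative names, not multipath-tools functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROUT_PARAM_LEN 24
/* SPC-4 service action code; matches rq_servact=6 in the log. */
#define PROUT_REG_IGNORE_SA 0x06

/* Store a 64-bit value big-endian, as SCSI parameter data requires. */
static void put_be64(uint8_t *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Fill the basic PROUT parameter list (SPC-4 layout):
 *   bytes 0-7   RESERVATION KEY
 *   bytes 8-15  SERVICE ACTION RESERVATION KEY
 *   bytes 16-19 obsolete
 *   byte  20    flags (SPEC_I_PT, ALL_TG_PT, APTPL)
 *   bytes 21-23 reserved / obsolete */
static void build_prout_params(uint8_t *buf, size_t len, uint64_t res_key,
			       uint64_t sa_key, uint8_t flags)
{
	assert(len >= PROUT_PARAM_LEN);
	memset(buf, 0, PROUT_PARAM_LEN);
	put_be64(buf, res_key);
	put_be64(buf + 8, sa_key);
	buf[20] = flags;
}
```

Calling build_prout_params(buf, 24, 0, 0x606fdb7adc83ULL, 0) yields the same bytes as the dump: one "00 ... 60 6f db 7a dc 83" line followed by eight zero bytes. The dump lines from the three threads are interleaved, which is why their timestamps are out of order.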

lixiaokeng commented 1 year ago

The command on sdii finished immediately, but sdij and sdih waited for 5 seconds.
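The per-path latencies can be read directly off the log by subtracting the "sending pr out command" timestamp from the matching "status = 0" timestamp. This small helper does that arithmetic. The timestamps are copied from the log; path_timing and prout_elapsed are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Timestamps (seconds) copied from the verbosity-4 log in this thread. */
struct path_timing {
	const char *dev;
	double sent;      /* "... sending pr out command to <dev>" */
	double completed; /* "<dev>: status = 0" */
};

static const struct path_timing timings[] = {
	{ "sdih", 253132.082934, 253137.374108 },
	{ "sdii", 253132.082980, 253132.083981 },
	{ "sdij", 253132.083015, 253137.338160 },
};

/* Elapsed time of the PROUT command on one path, or -1.0 if unknown. */
static double prout_elapsed(const char *dev)
{
	assert(dev != NULL);
	for (size_t i = 0; i < sizeof(timings) / sizeof(timings[0]); i++)
		if (strcmp(timings[i].dev, dev) == 0)
			return timings[i].completed - timings[i].sent;
	return -1.0;
}
```

prout_elapsed("sdii") comes out at about 1 ms. sdij and sdih come out at roughly 5.26 s and 5.29 s, consistent with a fixed 5 s wait plus some processing.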

mwilck commented 1 year ago

Hm. What exactly could multipathd do differently here?

lixiaokeng commented 1 year ago

According to the storage service staff, this could be solved if mpath_prout_reg sent the ioctl commands one by one instead of creating a pthread for each. So I ask: why is registration asynchronous?
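The vendor's suggestion amounts to replacing the per-path threads with a plain loop that waits for each command before issuing the next. The following is a minimal sketch of that idea, not a proposed patch. send_prout_fn, register_paths_sequential, send_log and stub_send are hypothetical names, and the stub only records call order instead of issuing SG_IO.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-path sender (not a multipath-tools API): issues the
 * PROUT command on one path and returns 0 on success. */
typedef int (*send_prout_fn)(const char *dev, void *ctx);

/* Send the registration on each path in turn; the next command is only
 * issued after the previous one has returned. Returns the number of
 * paths on which the command succeeded. */
static size_t register_paths_sequential(const char *const *devs, size_t n,
					send_prout_fn send, void *ctx)
{
	size_t ok = 0;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (send(devs[i], ctx) == 0)
			ok++;
	return ok;
}

/* Stub sender for illustration: records the order in which paths are used. */
struct send_log {
	const char *order[8];
	size_t count;
};

static int stub_send(const char *dev, void *ctx)
{
	struct send_log *rec = ctx;

	if (rec->count < sizeof(rec->order) / sizeof(rec->order[0]))
		rec->order[rec->count++] = dev;
	return 0;
}
```

The trade-off is that a command that hangs on one path now delays every later path, which is the concern raised in the reply.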

mwilck commented 1 year ago

> So I ask why registration is asynchronous?

That predates my involvement in multipath-tools. I think it has always been implemented this way. Typically, the reason is to avoid multipath being blocked by possibly long-running synchronous SCSI commands. I guess PROUT can hang if the SCSI device itself is unresponsive, and multipathd must be able to deal with this situation.

But TBH I don't understand what multipath's threaded implementation has to do with your issue. Is this storage unable to deal with multiple PROUT commands being sent to different ports? What exactly is the restriction?

lixiaokeng commented 1 year ago

Because registration changes the metadata of the LUN, the primary storage server takes a lock and can't handle two or more requests at the same time. The primary storage has no queue, so it sends the second request to the backup storage server, which enters a waiting mode (5 s). To be honest, I am not familiar with this storage either; this is what the person in charge of the storage told me. I have the same question: why did they design it this way? Maybe this is a problem the storage side needs to solve rather than multipath-tools, because we haven't seen this issue with other storage.
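The behavior described above can be captured in a toy timing model. Everything in it is an assumption for illustration: reg_model, model_submit, model_parallel and model_sequential are hypothetical names, and the 1 ms service time is arbitrary. Only the 5 s back-off comes from this thread. The model does not try to reproduce the exact three-path timings in the log. It only shows why overlapping arrivals pay a full back-off while serialized ones do not.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical): while the primary controller processes one
 * registration it holds a lock and has no queue, so a request arriving in
 * that window is parked for retry_delay seconds and then tried again. */
struct reg_model {
	double service;     /* time to process one registration (s) */
	double retry_delay; /* back-off for a request that hit the lock (s) */
};

/* Submit one request at time `arrival`; *busy_until is the end of the
 * current lock hold. Returns the completion time of this request. */
static double model_submit(const struct reg_model *m, double arrival,
			   double *busy_until)
{
	double t = arrival;

	assert(m->retry_delay > 0.0); /* otherwise the loop never ends */
	while (t < *busy_until)        /* lock held: back off and retry */
		t += m->retry_delay;
	*busy_until = t + m->service;
	return *busy_until;
}

/* One thread per path: every request arrives at t = 0.
 * Returns the time at which the last one completes. */
static double model_parallel(const struct reg_model *m, size_t npaths)
{
	double busy_until = 0.0, last = 0.0;

	for (size_t i = 0; i < npaths; i++) {
		double done = model_submit(m, 0.0, &busy_until);

		if (done > last)
			last = done;
	}
	return last;
}

/* One path at a time: request i+1 is sent when request i completes. */
static double model_sequential(const struct reg_model *m, size_t npaths)
{
	double busy_until = 0.0, t = 0.0;

	for (size_t i = 0; i < npaths; i++)
		t = model_submit(m, t, &busy_until);
	return t;
}
```

With service = 0.001 and retry_delay = 5.0, model_parallel(&m, 2) returns about 5.001 s, while model_sequential(&m, 2) returns about 0.002 s. This matches the two-path case reported at the top of the issue.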

mwilck commented 1 year ago

Well this makes a certain amount of sense, although we've never seen this behavior before.

Can you find out what exactly the restriction is? I.e., can the entire storage unit only process one STPG at a time? Or only one STPG per port / LUN / ..., whatever?

lixiaokeng commented 1 year ago

I can't get more information, sorry. Maybe this issue can be closed.