xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

xe sr-probe does not find NFSv4/NFSv4.1 #135

Open borzel opened 5 years ago

borzel commented 5 years ago

Background:

Steps:

  1. Enable NFSv4 on FreeNAS and reboot it to fully enable it (!)
  2. Create a new SR with XCP-ng Center (https://github.com/xcp-ng/xenadmin/releases/tag/v7.6.3.21) using NFSv4.1 (or with Xen Orchestra; this step just proves NFSv4.1 support)
  3. Detach the SR
  4. Run xe sr-probe type=nfs device-config:server=<some-ip> device-config:serverpath=/mnt/testpool/nfsv41test device-config:probeversion

Output of sr-probe

<?xml version="1.0" ?>
<SRlist>
        <SR>
                <UUID>2131f8f8-49be-0868-2b97-d624c9851584</UUID>
        </SR>
        <SR>
                <UUID>cce30ede-0fd2-b34a-04e5-291b56b9f8fc</UUID>
        </SR>
        <SupportedVersions>
                <Version>3</Version>
        </SupportedVersions>
</SRlist>

Expected result: SupportedVersions lists 3, 4 and 4.1
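To make the gap between actual and expected output concrete, here is a minimal Python sketch that reads the sr-probe XML shown above and reports which of the expected versions are missing. The function names are made up for illustration and are not part of xe or sm; the XML is the output pasted in this report.

```python
import xml.etree.ElementTree as ET

# sr-probe output copied from the report above.
PROBE_OUTPUT = """<?xml version="1.0" ?>
<SRlist>
        <SR>
                <UUID>2131f8f8-49be-0868-2b97-d624c9851584</UUID>
        </SR>
        <SR>
                <UUID>cce30ede-0fd2-b34a-04e5-291b56b9f8fc</UUID>
        </SR>
        <SupportedVersions>
                <Version>3</Version>
        </SupportedVersions>
</SRlist>"""


def supported_versions(probe_xml):
    """Return the NFS versions listed under <SupportedVersions>."""
    root = ET.fromstring(probe_xml)
    return [v.text.strip() for v in root.findall("./SupportedVersions/Version")]


def missing_versions(probe_xml, expected=("3", "4", "4.1")):
    """Return the expected versions that the probe did not report."""
    found = set(supported_versions(probe_xml))
    return [v for v in expected if v not in found]
```

With the output from this report, supported_versions returns only "3", so missing_versions reports "4" and "4.1".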

thctlo commented 5 years ago

Hm, getting the following. Host: Xen XCP 7.6.3. NFSv4-enabled server: Debian 9. The exports file contains:

/srv/nfs4 192.168.xxx.0/24(rw,sync,fsid=0,crossmnt,no_subtree_check,sec=sys:krb5:krb5i:krb5p)
/srv/nfs4/xcphosts 192.168.xxx.0/24(rw,sync,no_subtree_check,sec=sys:krb5:krb5i:krb5p)

This current setup works for v3/v4/v4.1 with and without kerberos mounts.

xe sr-probe type=nfs device-config:server=IP_HERE device-config:serverpath=/xen/nfs-stor device-config:probeversion

Error code: SR_BACKEND_FAILURE_73
Error parameters: , NFS mount error [opterr=mount failed with return code 32],

thctlo commented 5 years ago

running : xe sr-probe type=nfs device-config:server=IP_HERE device-config:probeversion

Error code: SR_BACKEND_FAILURE_101
Error parameters: , The request is missing the serverpath parameter, <?xml version="1.0" ?>
<nfs-exports>
<Export>
<Target>192.168.xxx.xxx</Target>
<Path>/srv/nfs4/xcphosts</Path>
<Accesslist>192.168.xxx.0/24</Accesslist>
</Export>
<Export>
<Target>192.168.xxx.xxx</Target>
<Path>/srv/nfs4</Path>
<Accesslist>192.168.xxx.0/24</Accesslist>
</Export>
</nfs-exports>

stormi commented 5 years ago

@borzel Is that a problem that occurs for any NFS share that supports NFS 4 or above, or only with specific servers?

NormHenderson commented 4 years ago

Not sure if it is related to the same root cause however: when xe sr-create specifies only type=nfs, it defaults to NFSv3 and will not negotiate NFSv4/NFSv4.1 share without adding device-config:nfsversion=4.1 (which isn't documented as far as I can tell). IMHO it should be starting with NFSv4.1 and negotiating downwards. (XCP-ng 8.1 connecting to nfs-kernel-server on Ubuntu 20.04)
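As an illustration of the workaround described in this comment, the following Python sketch assembles an xe sr-create command line that pins the NFS version through device-config:nfsversion. The helper name, the SR label and the example server and path are hypothetical; the script only builds the argument list and does not run xe.

```python
def build_sr_create_args(server, serverpath, name_label, nfsversion=None):
    """Build the argv for `xe sr-create` for an NFS SR (nothing is executed)."""
    args = [
        "xe", "sr-create",
        "type=nfs",
        "content-type=user",
        "shared=true",
        "name-label=" + name_label,
        "device-config:server=" + server,
        "device-config:serverpath=" + serverpath,
    ]
    # Without this key the comment above reports that NFSv3 is used.
    if nfsversion is not None:
        args.append("device-config:nfsversion=" + nfsversion)
    return args


# Hypothetical server and export path, for illustration only.
example = build_sr_create_args("192.0.2.10", "/export/xcp", "nfs41-sr", nfsversion="4.1")
print(" ".join(example))
```

The resulting list can be passed to subprocess.run on the host once the values have been checked.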

ondraknezour commented 3 years ago

I had a similar problem with a FreeBSD NFS server set up with a minimum NFS version of 4 (vfs.nfsd.server_min_nfsvers=4 in the /etc/sysctl.conf file) [1].

I was told [2] that NFSv4 doesn't use RPC, so if support for older protocol versions isn't needed, nfsd does not register with rpcbind. This makes the function check_server_service in /opt/xensource/sm/nfs.py unreliable, because it checks for a condition (an nfs service in the rpcinfo -s output) which is not always present.

[1] https://lists.freebsd.org/pipermail/freebsd-net/2021-January/057371.html [2] https://lists.freebsd.org/pipermail/freebsd-net/2021-January/057372.html
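The following Python sketch shows why a check based on rpcinfo -s output fails for a v4-only server. It is not the code of check_server_service in nfs.py; the function name and both sample outputs are assumptions written for illustration, following the usual column layout of rpcinfo -s (program, versions, netids, service, owner).

```python
# Illustrative rpcinfo -s output for a server that also serves NFSv3.
SAMPLE_V3_AND_V4 = """   program version(s) netid(s)                         service     owner
    100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
    100003  4,3       udp,tcp,udp6,tcp6                nfs         superuser
    100005  3,2,1     tcp6,udp6,tcp,udp                mountd      superuser
"""

# Illustrative output for a v4-only server whose nfsd did not register.
SAMPLE_V4_ONLY = """   program version(s) netid(s)                         service     owner
    100000  2,3,4     local,udp,tcp,udp6,tcp6          portmapper  superuser
"""


def nfs_versions_from_rpcinfo(text):
    """Return the versions registered for the 'nfs' service, or [] if absent."""
    for line in text.splitlines():
        fields = line.split()
        # Column 4 is the service name; column 2 is a comma-separated version list.
        if len(fields) >= 4 and fields[3] == "nfs":
            return sorted(fields[1].split(","))
    return []
```

An empty result for the second sample does not mean the server is down, only that nfsd is not registered with rpcbind, which is the situation described above.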

jcharaoui commented 3 years ago

I've hit this bug with an NFSv4+ only server (i.e. NFSv3 is disabled). XCP-ng is unable to add the SR because it depends on the presence of NFSv3 services.

olivierlambert commented 3 years ago

Then I think it should be reported upstream ASAP :)

https://github.com/xapi-project/sm/issues

TristisOris commented 3 years ago

Still relevant for Huawei storage.

olivierlambert commented 3 years ago

IIRC, this was fixed in a recent upstream SMAPI patch (but likely not yet available in XCP-ng). @stormi can you take a look when you are around? Thanks!

stormi commented 3 years ago

I don't remember commits that would address this, and the issue on the upstream repository got no answers from the devs.

Recent commits that are about NFS in sm are: https://github.com/xapi-project/sm/commit/e1218647f0920e3d489c7155b823f45ca21715ea and https://github.com/xapi-project/sm/commit/6fbff68f74343c54c19b74c6bd3e66625d955495 but I don't think they are related to this issue here.

emanzx commented 10 months ago

Any update on this? I have checked that upstream has already pushed a fix for this issue, but when I update my XCP-ng installation the files are still not updated. So I went my own way and replaced the driver files NFSSR.py and nfs.py with the updated versions, but it is still not working and XCP-ng Center just gives me this error.

image

I really need NFSv4 to work, as the PetaSAN instance only supports NFSv4 and above for its NFS exports. Thanks.

benjamreis commented 10 months ago

Hi!

The fix wasn't available in XCP-ng 8.2.1, it'll be released soon but if you want to test it in advance: yum update sm sm-rawhba --enablerepo=xcp-ng-ci

Bear in mind it is a test build so not safe to run in production. Regards :)

emanzx commented 10 months ago

Thanks for the update. I will try it on my test server. May I know when the next update containing the fix will be released?

benjamreis commented 10 months ago

It's currently in the CI phase of our pipeline, so if everything goes smoothly I'd say a couple weeks. More if we find issues.

viniciusferrao commented 9 months ago

Any news on this one? I was actually surprised to see that NFSv4 only servers are an issue because XCP-ng manual states that NFS is preferred instead of iSCSI.

So I started the planning to move away from iSCSI and stumbled upon this issue.

NormHenderson commented 9 months ago

@viniciusferrao I had a very bad experience with XCP-ng storage on iSCSI. For the last 3 years I have been using nfs 4.1 without any real difficulties (some performance concerns when VMs boot and until they stabilize, but I was never able to narrow down the cause). I am on XCP-ng v.8.2 which has the option to select nfs v3 v4 or v4.1 when creating a new nfs SR. I also use option "hard" which has pros and cons, there are other threads here on that subject.

viniciusferrao commented 9 months ago

But is there any workaround today? Because I tried to mount the volume and was affected by the issue on this ticket.

My XCP-ng dates back to 2013 when I originally installed XenServer 6.2. I've been updating it since then. The same for the storage system that's FreeNAS (at the time) and now TrueNAS. The disk pool was created in early 2014. Since the beginning, this pool is iSCSI and I had very expensive workloads on it, like Exchange 2010 and later 2013 with 700 user accounts, more than a TB of iSCSI mailboxes on top of XenServer virtual disks.

And now I was moving to NFS, due to the cited recommendation and I'm unable to.

How do I mount the NFS share? What's the workaround? TrueNAS does not enable NFSv3 and v4 at the same time.

Thanks.

NormHenderson commented 9 months ago

For me, it was just in Xen Orchestra: select the pool SR - create a new SR.

  Select storage type: NFS
  Settings:
    Server (your NFS path)
    NFS version: 4.1
    NFS options: hard (in my case; study the implications)

Similar process in XCP-ng Center.

However, your question makes me wonder if you are even talking about an XCP-ng storage repository, or possibly about connecting to an NFS server from a VM? I do that too; from Linux at least it's a standard mount -t nfs4, no magic.

viniciusferrao commented 9 months ago

Yeah, this does not work. I'm affected by the bug on this thread. I thought you had a workaround for it. Your NFS server probably supports NFSv3 and v4 at the same time, which isn't my case.

stormi commented 9 months ago

Yes, the issue arises when the NFS server doesn't advertise what it supports through rpcbind, which is a v3-only thing (rpcbind can also report v4.x protocol versions, although it is not obliged to, which explains why some users can select v4.x protocols when their server also supports v3).

We do have a fix for this. It is built and currently sits in a pre-release repository before it can be released to all users.

On XCP-ng 8.2, you can try it with:

yum update sm sm-rawhba --enablerepo=xcp-ng-ci,xcp-ng-testing,xcp-ng-candidates

Internal CI tests already ran successfully.

On XCP-ng 8.3, it should be already supported.
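One way to detect an NFS server without relying on rpcbind is to test whether its TCP port accepts connections. The Python sketch below does only that; it is an assumption about a possible approach, not a description of what the sm fix does. The function name is hypothetical, and port 2049 is the standard NFS port that also appears in the mount command logged later in this thread.

```python
import socket


def nfs_tcp_port_reachable(host, port=2049, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This only shows that something listens on the port; it does not
    negotiate an NFS version or verify that an export is mountable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

A True result for a server that is absent from the rpcinfo output would point to a v4-only setup like the ones reported above.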

viniciusferrao commented 9 months ago

Thank you @stormi. But may I ask if there's any timeline for it to land on the stable channels, on 8.2.1 or 8.3?

stormi commented 9 months ago

On 8.2, it will go with the next train of updates, which is not scheduled yet. A few weeks maybe. It's already in XCP-ng 8.3, but 8.3 itself is still a (rather stable) beta.

prilly-dev commented 1 week ago

This bug still exists in xcp-ng 8.3

stormi commented 1 week ago

Please elaborate, as it's actually fixed from our point of view. It's likely you have a different albeit similar issue.

prilly-dev commented 1 week ago

What to say: attaching an NFS share with v4 or v4.1 only works when the NFS share has v3 enabled. What you wrote earlier perfectly sums this issue up:

Yes, the issue is when the NFS server doesn't advertise what it supports through rpcbind, which is a v3-only thing (rpcbind can report, although is not obligated to do so, also v4.x protocol versions, which explains why some users can select v4.x protocols when their server also supports v3).

Also worth noting: this issue also occurs when attaching storage in XO, with or without Kerberos.

stormi commented 1 week ago

We have automated tests which precisely test a server which only has v4+ and no v3, so it's likely there's something else in the picture. @benjamreis how to debug this?

benjamreis commented 1 week ago

Probably sharing the error gotten while trying to probe or create the SR would be a good start -- even better the corresponding logs in xensource.log and SMlog :+1:

prilly-dev commented 1 week ago

Give me some time, I will post the logs later today.

prilly-dev commented 1 week ago

This is the log from XO storage with a share that has only v4 and v4.1 enabled:

remote.test
{
  "id": "a74654a5-509b-4d6b-8a42-06e5713ed882"
}
{
  "shortMessage": "Command failed with exit code 32: mount -o port=2049 -t nfs 172.16.10.10:/nfs/backup /run/xo-server/mounts/a74654a5-509b-4d6b-8a42-06e5713ed882",
  "command": "mount -o port=2049 -t nfs 172.16.10.10:/nfs/backup /run/xo-server/mounts/a74654a5-509b-4d6b-8a42-06e5713ed882",
  "escapedCommand": "mount -o \"port=2049\" -t nfs \"172.16.10.10:/nfs/backup\" \"/run/xo-server/mounts/a74654a5-509b-4d6b-8a42-06e5713ed882\"",
  "exitCode": 32,
  "stdout": "",
  "stderr": "mount.nfs: Protocol not supported",
  "failed": true,
  "timedOut": false,
  "isCanceled": false,
  "killed": false,
  "message": "Command failed with exit code 32: mount -o port=2049 -t nfs 172.16.10.10:/nfs/backup /run/xo-server/mounts/a74654a5-509b-4d6b-8a42-06e5713ed882 mount.nfs: Protocol not supported",
  "name": "Error",
  "stack": "Error: Command failed with exit code 32: mount -o port=2049 -t nfs 172.16.10.10:/nfs/backup /run/xo-server/mounts/a74654a5-509b-4d6b-8a42-06e5713ed882 mount.nfs: Protocol not supported at makeError (/etc/xen-orchestra/node_modules/execa/lib/error.js:60:11) at handlePromise (/etc/xen-orchestra/node_modules/execa/index.js:118:26) at NfsHandler._sync (/etc/xen-orchestra/@xen-orchestra/fs/src/_mount.js:68:7)"
}

This is the log from an NFS SR that was attached with v3, v4 and v4.1 enabled; I then disabled v3 and rescanned the SR:

sr.scan
{
  "id": "7a89bd71-8635-173f-54de-19684d061d4f"
}
{
  "code": "SR_BACKEND_FAILURE_47",
  "params": [
    "",
    "The SR is not available [opterr=no such directory /var/run/sr-mount/7a89bd71-8635-173f-54de-19684d061d4f]",
    ""
  ],
  "task": {
    "uuid": "28f35853-1149-45ed-ca17-ad7ae65a8082",
    "name_label": "Async.SR.scan",
    "name_description": "",
    "allowed_operations": [],
    "current_operations": {},
    "created": "20241120T18:21:50Z",
    "finished": "20241120T18:21:50Z",
    "status": "failure",
    "resident_on": "OpaqueRef:a2ff60a3-d6ce-465b-874c-be3d797ba33a",
    "progress": 1,
    "type": "",
    "result": "",
    "error_info": [
      "SR_BACKEND_FAILURE_47",
      "",
      "The SR is not available [opterr=no such directory /var/run/sr-mount/7a89bd71-8635-173f-54de-19684d061d4f]",
      ""
    ],
    "other_config": {},
    "subtask_of": "OpaqueRef:NULL",
    "subtasks": [],
    "backtrace": "(((process xapi)(filename lib/backtrace.ml)(line 210))((process xapi)(filename ocaml/xapi/storage_access.ml)(line 36))((process xapi)(filename ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml)(line 39))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 143))((process xapi)(filename ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml)(line 39))((process xapi)(filename ocaml/xapi/rbac.ml)(line 191))((process xapi)(filename ocaml/xapi/rbac.ml)(line 200))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 75)))"
  },
  "message": "SR_BACKEND_FAILURE_47(, The SR is not available [opterr=no such directory /var/run/sr-mount/7a89bd71-8635-173f-54de-19684d061d4f], )",
  "name": "XapiError",
  "stack": "XapiError: SR_BACKEND_FAILURE_47(, The SR is not available [opterr=no such directory /var/run/sr-mount/7a89bd71-8635-173f-54de-19684d061d4f], ) at Function.wrap (file:///etc/xen-orchestra/packages/xen-api/_XapiError.mjs:16:12) at default (file:///etc/xen-orchestra/packages/xen-api/_getTaskResult.mjs:13:29) at Xapi._addRecordToCache (file:///etc/xen-orchestra/packages/xen-api/index.mjs:1047:24) at file:///etc/xen-orchestra/packages/xen-api/index.mjs:1081:14 at Array.forEach () at Xapi._processEvents (file:///etc/xen-orchestra/packages/xen-api/index.mjs:1071:12) at Xapi._watchEvents (file:///etc/xen-orchestra/packages/xen-api/index.mjs:1244:14)"
}

There is a possibility that this error is caused by an issue in QNAP QTS version 5.2.1. This is unconfirmed, but some googling indicates QNAP is crap as usual. I will test this with a Dell PowerStore to see if this is storage related, which I really suspect now after testing.

benjamreis commented 1 week ago

Hi,

Thx for the logs - unfortunately XO doesn't provide all the necessary info about the error, as it's only a client of the XAPI. What I asked for was the output of the xe sr-probe and sr-create calls and the logs in /var/log/xensource.log and /var/log/SMlog corresponding to those calls.

The error does seem to indicate the mount is attempted with NFSv3 for some reason... While you gather the logs I asked for, I'll take a look at the code again, but as mentioned by @stormi, our CI does have NFS4+ only tests that run successfully.
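For collecting the requested log excerpts, a small filter like the Python sketch below can pull out the lines that mention a given SR UUID or keyword. The function name and the keyword list are assumptions chosen for illustration; only the log paths come from the comment above.

```python
def matching_log_lines(lines, needles):
    """Return the lines that contain any of the needles (case-insensitive)."""
    lowered = [n.lower() for n in needles]
    return [
        line.rstrip("\n")
        for line in lines
        if any(n in line.lower() for n in lowered)
    ]


def collect(path, needles):
    """Read one log file and keep only the matching lines."""
    with open(path, errors="replace") as handle:
        return matching_log_lines(handle, needles)


# Example use on the host (paths from the comment above, keywords assumed):
#   for path in ("/var/log/xensource.log", "/var/log/SMlog"):
#       for line in collect(path, ["7a89bd71-8635-173f-54de-19684d061d4f", "nfs"]):
#           print(path, line)
```

Running the xe sr-probe or sr-create call right before collecting keeps the relevant entries near the end of both files.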