sahlberg / libnfs

NFS client library
Other
521 stars 201 forks source link

XBMC Gotham Beta1 NFS accessibility issue on android #60

Closed firemeteorxx closed 10 years ago

firemeteorxx commented 10 years ago

Please see the forum post for details about the setup: http://forum.xbmc.org/showthread.php?tid=188811

The error log upon failed access: 23:08:32 T:2123676280 ERROR: Failed to open(//Photos/) opendir call failed with "zdr_replymsg failed in portmap_getport_reply" 23:08:32 T:1994413704 ERROR: static bool XFILE::CDirectory::GetDirectory(const CStdString&, CFileItemList&, const XFILE::CDirectory::CHints&, bool) - Error getting nfs://192.168.1.9//mnt/data0/Media/Photos/ 23:08:32 T:1994413704 ERROR: CGUIDialogFileBrowser::GetDirectory(nfs://192.168.1.9//mnt/data0/Media/Photos/) failed

I was directed by google to this location: https://github.com/sahlberg/libnfs/blob/master/lib/pdu.c In this function: static int rpc_process_reply(struct rpc_context rpc, struct rpc_pdu pdu, ZDR *zdr)

The issue is easy to reproduce here, I think it'll be easy to debug if a debug version is provided with more verbose error message.

sahlberg commented 10 years ago

Hi,

The error message itself is a little misleading but it occurs at two different places. The first place it returns is where a malloc() fails. This is unlikely since if you ran out of memory you would see other failures as well. This is not to rule out that there could be a memory leak however, but even if so, it should fail always, not just for specific directories.

The second place where the error occurs is due to a failure to unmarshall and decode the reply packet that comes back from the server. Which suggests that there is something in the reply that libnfs can not parse.

Can you collect a wireshark trace that contain a failure and send me so I can see what the server response packet looks like ?

regards ronnie sahlberg

On Mon, Mar 10, 2014 at 8:52 PM, firemeteorxx notifications@github.comwrote:

Please see the forum post for details about the setup: http://forum.xbmc.org/showthread.php?tid=188811

The error log upon failed access: 23:08:32 T:2123676280 ERROR: Failed to open(//Photos/) opendir call failed with "zdr_replymsg failed in portmap_getport_reply" 23:08:32 T:1994413704 ERROR: static bool XFILE::CDirectory::GetDirectory(const CStdString&, CFileItemList&, const XFILE::CDirectory::CHints&, bool) - Error getting nfs:// 192.168.1.9//mnt/data0/Media/Photos/ 23:08:32 T:1994413704 ERROR: CGUIDialogFileBrowser::GetDirectory(nfs:// 192.168.1.9//mnt/data0/Media/Photos/) failed

I was directed by google to this location: https://github.com/sahlberg/libnfs/blob/master/lib/pdu.c In this function: static int rpc_process_reply(struct rpc_context rpc, struct rpc_pdu pdu, ZDR *zdr)

The issue is easy to reproduce here, I think it'll be easy to debug if a debug version is provided with more verbose error message.

Reply to this email directly or view it on GitHubhttps://github.com/sahlberg/libnfs/issues/60 .

sahlberg commented 10 years ago

Additionally, Android is a really important usecase.

Do you know if this is broken in general for all android users, or do you think it is something specific to your environment ?

regards ronnie sahlberg

On Tue, Mar 11, 2014 at 8:56 AM, ronnie sahlberg ronniesahlberg@gmail.comwrote:

Hi,

The error message itself is a little misleading but it occurs at two different places. The first place it returns is where a malloc() fails. This is unlikely since if you ran out of memory you would see other failures as well. This is not to rule out that there could be a memory leak however, but even if so, it should fail always, not just for specific directories.

The second place where the error occurs is due to a failure to unmarshall and decode the reply packet that comes back from the server. Which suggests that there is something in the reply that libnfs can not parse.

Can you collect a wireshark trace that contain a failure and send me so I can see what the server response packet looks like ?

regards ronnie sahlberg

On Mon, Mar 10, 2014 at 8:52 PM, firemeteorxx notifications@github.comwrote:

Please see the forum post for details about the setup: http://forum.xbmc.org/showthread.php?tid=188811

The error log upon failed access: 23:08:32 T:2123676280 ERROR: Failed to open(//Photos/) opendir call failed with "zdr_replymsg failed in portmap_getport_reply" 23:08:32 T:1994413704 ERROR: static bool XFILE::CDirectory::GetDirectory(const CStdString&, CFileItemList&, const XFILE::CDirectory::CHints&, bool) - Error getting nfs:// 192.168.1.9//mnt/data0/Media/Photos/ 23:08:32 T:1994413704 ERROR: CGUIDialogFileBrowser::GetDirectory(nfs:// 192.168.1.9//mnt/data0/Media/Photos/) failed

I was directed by google to this location: https://github.com/sahlberg/libnfs/blob/master/lib/pdu.c In this function: static int rpc_process_reply(struct rpc_context rpc, struct rpc_pdu pdu, ZDR *zdr)

The issue is easy to reproduce here, I think it'll be easy to debug if a debug version is provided with more verbose error message.

Reply to this email directly or view it on GitHubhttps://github.com/sahlberg/libnfs/issues/60 .

firemeteorxx commented 10 years ago

Here is the requested traffic capture: http://pastebin.com/HTZHPVtu I narrow it down to the failed session on 'Videos' sub-dir. It turns out that this has something to do with the permission setup. And I can't rule out the suspicion on both the server && client. Maybe this is specific to my environment, but I won't say for sure so far.

I find one pattern about this issue -- for directories that is inaccessible, there appears to be at least one entry that server denied based on permission setup. Here comes the demonstration:

For accessible directories, all entries inside are at least always readable.

[root@freenas] /mnt/data0/Media# ls -l Musics/
total 80
drwxrwsr-x  2 firemeteor  linux-staff    3 Jun 11  2013 $RECYCLE.BIN/
drwxrwsr-x+ 7 root        linux-staff    8 Jul  3  2013 ./
drwxr-sr-x+ 7 root        linux-staff    7 Feb 16 21:15 ../
drwxrwxr-x  2 root        linux-staff   24 Apr  3  2009 Le Peuple migrateur/
drwxrwsr-x  2 firemeteor  linux-staff   13 Jun 12  2013 Sample Music/
drwxrwxr-x  7 root        linux-staff    7 Dec 24  2005 Secret Garden/
-rw-rw-r--  1 firemeteor  linux-staff  380 Jun 12  2013 desktop.ini
drwxrwxr-x  2 root        linux-staff    5 May 15  2011 ????????? - ???????????????/
[root@freenas] /mnt/data0/Media# ls -l Comics/
total 50
drwxr-sr-x+ 4 firemeteor  linux-staff   4 Feb 16 21:38 ./
drwxr-sr-x+ 7 root        linux-staff   7 Feb 16 21:15 ../
drwxrwxr-x  2 firemeteor  linux-staff  12 Jul 31  2010 3x3 eyes/
drwxrwxr-x  2 firemeteor  linux-staff   9 Aug  4  2009 ????????????????????????/

For failed ones, at least some of the cases are not readable.

[root@freenas] /mnt/data0/Media# ls -l Videos/
total 125
drwxrwsr-x  2 firemeteor  linux-staff      3 Jun 11  2013 $RECYCLE.BIN/
drwxrwsr-x+ 7 root        linux-staff      9 Feb 10 23:31 ./
drwxr-sr-x+ 7 root        linux-staff      7 Feb 16 21:15 ../
-rw-rw-r--  1 firemeteor  linux-staff  10240 Jun 11  2013 Thumbs.db
-rw-rw-r--  1 firemeteor  linux-staff     78 Jun 11  2013 desktop.ini
drwxrwsr-x+ 4 root        linux-staff      4 Feb 25 21:05 movies/
drwxrwsr-x+ 8 root        linux-staff      8 Mar  9 14:58 series/
drwxrwxr-x  2 firemeteor  linux-staff      3 Sep 16 20:20 ??????/
d---rws---+ 2 nobody      linux-staff     44 Feb 16 20:52 ?????????????????????/
[root@freenas] /mnt/data0/Media# ls -l Photos/
total 64
drwxrwsr-x   2 firemeteor  linux-staff    3 Jun 11  2013 $RECYCLE.BIN/
drwxrwsr-x+  7 root        linux-staff    8 Jan 12 21:09 ./
drwxr-sr-x+  7 root        linux-staff    7 Feb 16 21:15 ../
drwxrwsr-x  13 nobody      linux-staff   13 Nov  2 23:22 2013/
drwxrwsr-x   2 nobody      linux-staff  116 Mar  3 22:01 2013-9-29??????/
d---rws---+  2 nobody      linux-staff  321 Feb 24 20:20 2014-1-12iphone/
-rw-rw-r--   1 firemeteor  linux-staff  380 Jun 12  2013 desktop.ini
d---rws---+  3 nobody      linux-staff    3 Jan 12 21:28 dour iphone sync/

I don't expect the entire directory been blocked for a single inaccessible entry inside. This appears to be fault for libnfs, but I can't rule-out the server yet. This is because I spotted one malformed reply to one READDIRPLUS request. I'm not sure if this is an whireshark decoding issue. I have the raw data, but my current net environment has blocked most of file hosting sites.

I can find a way to upload the raw data later, if you think it's important.

firemeteorxx commented 10 years ago

My permission setup is to ensure group access through nfsv4ACL to any newly created entry and use sticky bit to enforce the owning group.

This works great until I introduced the android NFS client. I'm not sure what UID / GID is used in this case?

sahlberg commented 10 years ago

On Tue, Mar 11, 2014 at 10:08 PM, firemeteorxx notifications@github.comwrote:

Here is the requested traffic capture: http://pastebin.com/HTZHPVtu I narrow it down to the failed session on 'Videos' sub-dir. It turns out that this has something to do with the permission setup. And I can't rule out the suspicion on both the server && client. Maybe this is specific to my environment, but I won't say for sure so far.

I find one pattern about this issue -- for directories that is inaccessible, there appears to be at least one entry that server denied based on permission setup. Here comes the demonstration:

For accessible directories, all entries inside are at least always readable.

[root@freenas] /mnt/data0/Media# ls -l Musics/ total 80 drwxrwsr-x 2 firemeteor linux-staff 3 Jun 11 2013 $RECYCLE.BIN/ drwxrwsr-x+ 7 root linux-staff 8 Jul 3 2013 ./ drwxr-sr-x+ 7 root linux-staff 7 Feb 16 21:15 ../ drwxrwxr-x 2 root linux-staff 24 Apr 3 2009 Le Peuple migrateur/ drwxrwsr-x 2 firemeteor linux-staff 13 Jun 12 2013 Sample Music/ drwxrwxr-x 7 root linux-staff 7 Dec 24 2005 Secret Garden/ -rw-rw-r-- 1 firemeteor linux-staff 380 Jun 12 2013 desktop.ini drwxrwxr-x 2 root linux-staff 5 May 15 2011 ????????? - ???????????????/ [root@freenas] /mnt/data0/Media# ls -l Comics/ total 50 drwxr-sr-x+ 4 firemeteor linux-staff 4 Feb 16 21:38 ./ drwxr-sr-x+ 7 root linux-staff 7 Feb 16 21:15 ../ drwxrwxr-x 2 firemeteor linux-staff 12 Jul 31 2010 3x3 eyes/ drwxrwxr-x 2 firemeteor linux-staff 9 Aug 4 2009 ????????????????????????/

For failed ones, at least some of the cases are not readable.

[root@freenas] /mnt/data0/Media# ls -l Videos/ total 125 drwxrwsr-x 2 firemeteor linux-staff 3 Jun 11 2013 $RECYCLE.BIN/ drwxrwsr-x+ 7 root linux-staff 9 Feb 10 23:31 ./ drwxr-sr-x+ 7 root linux-staff 7 Feb 16 21:15 ../ -rw-rw-r-- 1 firemeteor linux-staff 10240 Jun 11 2013 Thumbs.db -rw-rw-r-- 1 firemeteor linux-staff 78 Jun 11 2013 desktop.ini drwxrwsr-x+ 4 root linux-staff 4 Feb 25 21:05 movies/ drwxrwsr-x+ 8 root linux-staff 8 Mar 9 14:58 series/ drwxrwxr-x 2 firemeteor linux-staff 3 Sep 16 20:20 ??????/ d---rws---+ 2 nobody linux-staff 44 Feb 16 20:52 ?????????????????????/ [root@freenas] /mnt/data0/Media# ls -l Photos/ total 64 drwxrwsr-x 2 firemeteor linux-staff 3 Jun 11 2013 $RECYCLE.BIN/ drwxrwsr-x+ 7 root linux-staff 8 Jan 12 21:09 ./ drwxr-sr-x+ 7 root linux-staff 7 Feb 16 21:15 ../ drwxrwsr-x 13 nobody linux-staff 13 Nov 2 23:22 2013/ drwxrwsr-x 2 nobody linux-staff 116 Mar 3 22:01 2013-9-29??????/ d---rws---+ 2 nobody linux-staff 321 Feb 24 20:20 2014-1-12iphone/ -rw-rw-r-- 1 firemeteor linux-staff 380 Jun 12 2013 desktop.ini d---rws---+ 3 nobody linux-staff 3 Jan 12 21:28 dour iphone sync/

I don't expect the entire directory been blocked for a single inaccessible entry inside. This appears to be fault for libnfs, but I can't rule-out the server yet. This is because I spotted one malformed reply to one READDIRPLUS request. I'm not sure if this is an whireshark decoding issue. I have the raw data, but my current net environment has blocked most of file hosting sites.

I can find a way to upload the raw data later, if you think it's important.

Please upload the whole binary/raw trace so I can look at it, it is a bit painful to read the tshark aascii output. Or you can email it straight to me if you prefer. The malformed frame I bet is a wireshark bug, but if I get the binary/raw trace I can fix it (I did most of the NFS stuff in wireshark too so it is still my bug :-) )

It looks like a permissions issue on your server as well as a libnfs issue.

There are some hints about unusualness in your trace. The READDIRPLUS call does fail with access denied but then instead if failing the READDIRPLUS outright, libnfs then falls back to trying to do READDIR followed by a loop to do LOOKUP for each individual object to find the file attributes.

This fallback mode is originally intended for the case where the server does not implement READDIRPLUS at all. Some userspace/embedded server skip this procedure and there I have to fallback to the simpler READDIR followed by a LOOKUP loop.

I have not tested it for the case where READDIRPLUS fails due to a permissions issue. Since READDIRPLUS on the server will internally perform a LOOKUP/GETATTR (==stat()) on each directory entry becasue it is needed in the reply, this means that if the user is not allowed to stat() one or more entries in that directory, then the whole READDIRPLUS operation will fail with an error. (It is 'weird' yes. reading the directory fails if you can not stat() the content of the directory, but that is READDIRPLUS for ya)

Then when I fall back to READDIR, which internally ONLY scans the names of the objects int he directory, the READDIR is successful but since we still to stat() the files, in order to see what is files, what is subdirectories etc, the READDIR is followed by a LOOKUP loop inside libnfs. This invokes LOOKUP for each object and which internally on the server invokes stat() so one or more of them would fail.

I am pretty certain this is what happens, but I would really need to see the trace to confirm.

For the question about which credentials/uid/gid does xbmc/libnfs use ? Under unix, libnfs will default to use the same uid/gid of the current user/process. Android does not have uid/gid in the traditional sense so I dont know exactly what it would send. My best guess is it would send uid==65524 gid==65534. You can check this though in your trace if you take one of the requests that android sends and expand the RPC layer and the Credentials subexpansion. The Credentials subexpansion will contain the uid/gid that is it uses.

There is an API in libnfs to set the credentials to whatever uid/gid you want, but I dont know if XBMC uses this.

Next steps would be, I think :

regards ronnie sahlberg

sahlberg commented 10 years ago

Android does not use uid/gid by default.

Libnfs defaults to using setting the credentials to the same uid/gid as the current process. On android I dont know what happens sicne uid/gid are random for each process.

There are APIs in libnfs to set which/whatever credentials you want for libnfs, but I am not sure whether xbmc on android uses it or not.

Open the credentials expansion under the RPC layer for any of the requests that android sent to the server. This subexpansion will show what uid/gid it is using.

I bet it is using nobody/nobody 65534/65534

On Tue, Mar 11, 2014 at 10:16 PM, firemeteorxx notifications@github.comwrote:

My permission setup is to ensure group access through nfsv4ACL to any newly created entry and use sticky bit to enforce the owning group.

This works great until I introduced the android NFS client. I'm not sure what UID / GID is used in this case?

Reply to this email directly or view it on GitHubhttps://github.com/sahlberg/libnfs/issues/60#issuecomment-37376378 .

sahlberg commented 10 years ago

On Wed, Mar 12, 2014 at 6:36 AM, ronnie sahlberg ronniesahlberg@gmail.com wrote:

On Tue, Mar 11, 2014 at 10:08 PM, firemeteorxx notifications@github.com wrote:

Here is the requested traffic capture: http://pastebin.com/HTZHPVtu I narrow it down to the failed session on 'Videos' sub-dir. It turns out that this has something to do with the permission setup. And I can't rule out the suspicion on both the server && client. Maybe this is specific to my environment, but I won't say for sure so far.

I find one pattern about this issue -- for directories that is inaccessible, there appears to be at least one entry that server denied based on permission setup. Here comes the demonstration:

For accessible directories, all entries inside are at least always readable.

[root@freenas] /mnt/data0/Media# ls -l Musics/ total 80 drwxrwsr-x 2 firemeteor linux-staff 3 Jun 11 2013 $RECYCLE.BIN/ drwxrwsr-x+ 7 root linux-staff 8 Jul 3 2013 ./ drwxr-sr-x+ 7 root linux-staff 7 Feb 16 21:15 ../ drwxrwxr-x 2 root linux-staff 24 Apr 3 2009 Le Peuple migrateur/ drwxrwsr-x 2 firemeteor linux-staff 13 Jun 12 2013 Sample Music/ drwxrwxr-x 7 root linux-staff 7 Dec 24 2005 Secret Garden/ -rw-rw-r-- 1 firemeteor linux-staff 380 Jun 12 2013 desktop.ini drwxrwxr-x 2 root linux-staff 5 May 15 2011 ????????? - ???????????????/ [root@freenas] /mnt/data0/Media# ls -l Comics/ total 50 drwxr-sr-x+ 4 firemeteor linux-staff 4 Feb 16 21:38 ./ drwxr-sr-x+ 7 root linux-staff 7 Feb 16 21:15 ../ drwxrwxr-x 2 firemeteor linux-staff 12 Jul 31 2010 3x3 eyes/ drwxrwxr-x 2 firemeteor linux-staff 9 Aug 4 2009 ????????????????????????/

For failed ones, at least some of the cases are not readable.

[root@freenas] /mnt/data0/Media# ls -l Videos/ total 125 drwxrwsr-x 2 firemeteor linux-staff 3 Jun 11 2013 $RECYCLE.BIN/ drwxrwsr-x+ 7 root linux-staff 9 Feb 10 23:31 ./ drwxr-sr-x+ 7 root linux-staff 7 Feb 16 21:15 ../ -rw-rw-r-- 1 firemeteor linux-staff 10240 Jun 11 2013 Thumbs.db -rw-rw-r-- 1 firemeteor linux-staff 78 Jun 11 2013 desktop.ini drwxrwsr-x+ 4 root linux-staff 4 Feb 25 21:05 movies/ drwxrwsr-x+ 8 root linux-staff 8 Mar 9 14:58 series/ drwxrwxr-x 2 firemeteor linux-staff 3 Sep 16 20:20 ??????/ d---rws---+ 2 nobody linux-staff 44 Feb 16 20:52 ?????????????????????/ [root@freenas] /mnt/data0/Media# ls -l Photos/ total 64 drwxrwsr-x 2 firemeteor linux-staff 3 Jun 11 2013 $RECYCLE.BIN/ drwxrwsr-x+ 7 root linux-staff 8 Jan 12 21:09 ./ drwxr-sr-x+ 7 root linux-staff 7 Feb 16 21:15 ../ drwxrwsr-x 13 nobody linux-staff 13 Nov 2 23:22 2013/ drwxrwsr-x 2 nobody linux-staff 116 Mar 3 22:01 2013-9-29??????/ d---rws---+ 2 nobody linux-staff 321 Feb 24 20:20 2014-1-12iphone/ -rw-rw-r-- 1 firemeteor linux-staff 380 Jun 12 2013 desktop.ini d---rws---+ 3 nobody linux-staff 3 Jan 12 21:28 dour iphone sync/

I don't expect the entire directory been blocked for a single inaccessible entry inside. This appears to be fault for libnfs, but I can't rule-out the server yet. This is because I spotted one malformed reply to one READDIRPLUS request. I'm not sure if this is an whireshark decoding issue. I have the raw data, but my current net environment has blocked most of file hosting sites.

I can find a way to upload the raw data later, if you think it's important.

Please upload the whole binary/raw trace so I can look at it, it is a bit painful to read the tshark aascii output. Or you can email it straight to me if you prefer. The malformed frame I bet is a wireshark bug, but if I get the binary/raw trace I can fix it (I did most of the NFS stuff in wireshark too so it is still my bug :-) )

It looks like a permissions issue on your server as well as a libnfs issue.

There are some hints about unusualness in your trace. The READDIRPLUS call does fail with access denied but then instead if failing the READDIRPLUS outright, libnfs then falls back to trying to do READDIR followed by a loop to do LOOKUP for each individual object to find the file attributes.

This fallback mode is originally intended for the case where the server does not implement READDIRPLUS at all. Some userspace/embedded server skip this procedure and there I have to fallback to the simpler READDIR followed by a LOOKUP loop.

I have not tested it for the case where READDIRPLUS fails due to a permissions issue. Since READDIRPLUS on the server will internally perform a LOOKUP/GETATTR (==stat()) on each directory entry becasue it is needed in the reply, this means that if the user is not allowed to stat() one or more entries in that directory, then the whole READDIRPLUS operation will fail with an error. (It is 'weird' yes. reading the directory fails if you can not stat() the content of the directory, but that is READDIRPLUS for ya)

Then when I fall back to READDIR, which internally ONLY scans the names of the objects int he directory, the READDIR is successful but since we still to stat() the files, in order to see what is files, what is subdirectories etc, the READDIR is followed by a LOOKUP loop inside libnfs. This invokes LOOKUP for each object and which internally on the server invokes stat() so one or more of them would fail.

I am pretty certain this is what happens, but I would really need to see the trace to confirm.

For the question about which credentials/uid/gid does xbmc/libnfs use ? Under unix, libnfs will default to use the same uid/gid of the current user/process. Android does not have uid/gid in the traditional sense so I dont know exactly what it would send. My best guess is it would send uid==65524 gid==65534. You can check this though in your trace if you take one of the requests that android sends and expand the RPC layer and the Credentials subexpansion. The Credentials subexpansion will contain the uid/gid that is it uses.

There is an API in libnfs to set the credentials to whatever uid/gid you want, but I dont know if XBMC uses this.

Next steps would be, I think :

  • I would like the binary trace to confirm.
  • Check what uid/gid you are using and see if tweaking the permissions on the server will resolve the issue (I am certain it will)
  • Libnfs READDIRPLUS emulation needs to be fixed. If the READDIR + LOOKUP loop fails the lookup for one specific component, we should not fail the READDIR operation. Just let the READDIR complete but leave the components that can not be stat()ed with some empty stat data.

regards ronnie sahlberg

Also need:

firemeteorxx commented 10 years ago

I've send you the raw trace through mail. And the UID/GID used in android xbmc appears to be the per-app allocated UID. So this will be dynamic and it appears that I need to make it world readable otherwise it won't reliably work.

As the behavior of READDIRPLUS, I think it's also questionable. Maybe the server should do something similar to your proposed fix internally.

sahlberg commented 10 years ago

I have checked in a change so that we will not fail the opendir() if this happens. This means you should be able to browse the directory from XBMC once this change is available in the android build for libnfs. I have also changed libnfs to set/default uid/gid to 65534 for android, which means you will get a stable uid/gid instead of having to open up the permissions for everyone. Again, this is unknown when it will be available in the android build, so it is not a solution for you right now :-( . But you should be able to workaround it by permission changes on your server.

For now you probably need to open up the permissions for all users. If you use NFSv4 ACLs, it is possible that you are denying READ-ATTRIBUTES for these files? It might be sufficient if you go through all your ACLs and make sure you yiou are not denying READ-ATTRIBUTES for any file.

As for the behaviour of READDIRPLUS in this regard, I agree. It is definitely 'unintuitive' that permissions on the files in a directory should cause the normal NFS read directory command to fail.

sahlberg commented 10 years ago

Closing issue.