sahlberg / libnfs

NFS client library
Other
524 stars 201 forks source link

Unable to open large files over NFS #46

Closed DeJuanNOnley closed 10 years ago

DeJuanNOnley commented 10 years ago

The client is Windows 8.1 Pro 64-bit. The issues occur with XBMC (latest nightly builds using libnfs 1.6 I believe, which is 32-bit only under Windows if that helps). The users there told me to check here as they are unable to track down the problem. Host system is Ubuntu saucy 64-bit (though I have tried other distros as well). I am able to replicate the problem nearly 100% of the time. Smaller files (around 2GB or less) open fine. But larger ones generate the following error:

ERROR: XFILE::CNFSFile::Read - Error( -22, pread call failed with "NFS: Read failed with NFS3ERR_INVAL(-22)" )

Other users have reported the same problem under similar configurations. I was told to post the Wireshark logs here, which is impossible due to file size and type limitations. Please let me know how I can get them to you if they would be useful.

In short, Wireshark shows a bunch of lines as follows:

192.168.1.10 192.168.1.100 TCP 1514 [TCP segment of a reassembled PDU] 192.168.1.10 192.168.1.100 TCP 1514 [TCP segment of a reassembled PDU] 192.168.1.100 192.168.1.10 TCP 54 811 > nfs [ACK] Seq=4557 Ack=857773 Win=97792 Len=0 192.168.1.10 192.168.1.100 NFS 1482 V3 READ Reply (Call In 1060) Len: 65536

With about one of these per second:

192.168.1.10 192.168.1.100 TCP 1482 [TCP Retransmission] nfs > 811 [PSH, ACK] Seq=857773 Ack=4557 Win=38912 Len=1428[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]

Where 192.168.1.10 is the NFS server (and non-Windows systems not using libnfs are able to open these files fine) and 192.168.1.100 is the Windows 8.1 client.

Memphiz commented 10 years ago

I wanted to add that it works with the same version on Windows 7 with same server based on what DeJuanNOnley wrote in our forum ;)

sahlberg commented 10 years ago

...NFS3ERR_INVAL(-22)...

So this looks like the server returning an error back to libnfs, meaning there would be something wrong with the request we sent to the server.

Can you send a trimmed down wireshark log so I can see exactly what is sent to the server?

I don't know how familiar you are with wireshark, so I will provide some pointers on how you can identify the area of the area we are interested in as well as how to cut out and save a subset of the trace.

First, we nee to find the packets where the server returns this error. You should be able to find this by applying this filter to the trace (just type it in in the filter box and click apply)

nfs.status !=0

Which will list all NFS errors. There should be very few of these and one at the end of the trace should show something like V3 READ Reply ... Error: NFS3ERR_INVALID or something like that.

Make a note of the packet number for that packet. And change the display filter to instead be something like

frame.number > foo && frame.number < bar

where foo is 1000 less than the frame number of the error that happened and bar is 100 higher than the error. This then will show us the 1000 packets leading up to the error as well as the next 100 packets after the error. Then go to "File/ExportSpecificPackets" and make save. On the dialog box, make sure that "DISPLAYED" is selected, NOT "CAPTURED". This will save a new file containing onlty the packets you selected in the display filter above, and this should be a reasonably small capture file, only containing 1100 packets, which you should be able to post.

regards ronnie sahlberg

On Thu, Jan 9, 2014 at 6:25 AM, Memphiz notifications@github.com wrote:

I wanted to add that it works with the same version on Windows 7 with same server based on what DeJuanNOnley wrote in our forum ;)

— Reply to this email directly or view it on GitHubhttps://github.com/sahlberg/libnfs/issues/46#issuecomment-31935407 .

DeJuanNOnley commented 10 years ago

I got the info you requested. I uploaded to my Google Drive: https://drive.google.com/file/d/0B4vT5vOd7VOKV1c0Tnk0eExSNXM/edit?usp=sharing

Not sure exactly what is causing this, but I have learned that certain options (specifically caching) in XBMC are causing the error to occur far more often. So it may or may not be a libnfs issue.

sahlberg commented 10 years ago

Ok,

The problem is packet 1043 where we issue a read for offset 0xffffffff9c248201 which results in invalid.

This looks like a sign conversion error, like expanding a signed int32 to a int64 while the offset should always be unsigned.

You are calling the functions nfs_read() or nfs_pread() from xbmc? (I assume nfs_pread() since the offsets are not in sequence)

I think these functions should be uint64 call the way from these entry points to what goes on the wire, but I need to double check.

Can you ask the xbmc folks if it could be possible that the offset argument from xmbc when it calls nfs_pread() whether it would be possible that an integer expansion from a singed 32 bit integer to a 64 bit integer could happen? I.e. just checking the place where nfs_pread() is called from the wrapper.

At this stage, it looks like there is a integer expansion bug, either in xbmc or in libnfs. I will check libnfs more thoroughly tonight.

On Fri, Jan 10, 2014 at 9:38 AM, DeJuanNOnley notifications@github.comwrote:

I got the info you requested. I uploaded to my Google Drive:

https://drive.google.com/file/d/0B4vT5vOd7VOKV1c0Tnk0eExSNXM/edit?usp=sharing

Not sure exactly what is causing this, but I have learned that certain options (specifically caching) in XBMC are causing the error to occur far more often. So it may or may not be a libnfs issue.

— Reply to this email directly or view it on GitHubhttps://github.com/sahlberg/libnfs/issues/46#issuecomment-32048264 .

Memphiz commented 10 years ago

I am reading you ronnie :)

We are calling nfs_read in 2 places. One is the keep alive (which we could remove at some point i guess - not sure why i left that code in though) and in the generic read method.

see

https://github.com/xbmc/xbmc/blob/master/xbmc/filesystem/NFSFile.cpp#L429

and

https://github.com/xbmc/xbmc/blob/master/xbmc/filesystem/NFSFile.cpp#L639

sahlberg commented 10 years ago

Ah, so you are using nfs_read() + nfs_lseek() instead of nfs_pread()

That means it has to be related to nfs_lseek() and in particular, where it is called from line 663 https://github.com/xbmc/xbmc/blob/master/xbmc/filesystem/NFSFile.cpp#L663

I will check nfs_lseek() in more detail tonight.

Memphiz commented 10 years ago

https://github.com/xbmc/xbmc/blob/master/xbmc/filesystem/NFSFile.cpp#L654 - DOH ... we are passing a signed int64 here ... this is a api design thingy in XBMC and i stumbled over this already ... might be XBMC ... but then again we would have the problem on other VFS implementations (samba) aswell maybe (cause the interface has int64 there for all of them).

Yeah take your time :) - appreciate your quick response everytime I screw something up ;)

sahlberg commented 10 years ago

On Fri, Jan 10, 2014 at 10:24 AM, Memphiz notifications@github.com wrote:

https://github.com/xbmc/xbmc/blob/master/xbmc/filesystem/NFSFile.cpp#L654- DOH ... we are passing a signed int64 here ... this is a api design thingy in XBMC and i stumbled over this already ... might be XBMC ... but then again we would have the problem on other VFS implementations (samba) aswell maybe (cause the interface has int64 there for all of them).

An int64 itself is not a problem, but if there is a place where a int32 is converted to a int64 (and thus extending the top/sign bit of the int32 so that all the top 32 bits of the int64 becomes one too)

So this could happen if this function: int64_t CNFSFile::Seek(int64_t iFilePosition, int iWhence)

is called and you pass it a int32 as the first argument. Then the int32 value 0x9c248201 would be expanded to become the int64 value 0xffffffff9c248201 that we see on the wire.

I.e. is there a place where you call CNFSFile::Seek and pass a 32 bit integer as the first argument (and depend on the compiler to expand it to int64) ?

Yeah take your time :) - appreciate your quick response everytime I screw something up ;)

— Reply to this email directly or view it on GitHubhttps://github.com/sahlberg/libnfs/issues/46#issuecomment-32051946 .

sahlberg commented 10 years ago

One additional datapoint.

The actual value we are trying to read from seems related to the file size. The file size is 0x000000029c248202

but the read is from 0xffffffff9c248201

Which mean that if we are treating the file size as a uint32 and then try to read the very last byte of the file it would expand to the value we see on the wire. I.e. the offset we read from seems to be dependent on the size of the file rather than any bytes in the content of the file.

I will check lseek handling as well as all places where libnfs is using/returning file size to applications.

On Fri, Jan 10, 2014 at 10:32 AM, ronnie sahlberg ronniesahlberg@gmail.comwrote:

On Fri, Jan 10, 2014 at 10:24 AM, Memphiz notifications@github.comwrote:

https://github.com/xbmc/xbmc/blob/master/xbmc/filesystem/NFSFile.cpp#L654- DOH ... we are passing a signed int64 here ... this is a api design thingy in XBMC and i stumbled over this already ... might be XBMC ... but then again we would have the problem on other VFS implementations (samba) aswell maybe (cause the interface has int64 there for all of them).

An int64 itself is not a problem, but if there is a place where a int32 is converted to a int64 (and thus extending the top/sign bit of the int32 so that all the top 32 bits of the int64 becomes one too)

So this could happen if this function: int64_t CNFSFile::Seek(int64_t iFilePosition, int iWhence)

is called and you pass it a int32 as the first argument. Then the int32 value 0x9c248201 would be expanded to become the int64 value 0xffffffff9c248201 that we see on the wire.

I.e. is there a place where you call CNFSFile::Seek and pass a 32 bit integer as the first argument (and depend on the compiler to expand it to int64) ?

Yeah take your time :) - appreciate your quick response everytime I screw something up ;)

— Reply to this email directly or view it on GitHubhttps://github.com/sahlberg/libnfs/issues/46#issuecomment-32051946 .

sahlberg commented 10 years ago

On Fri, Jan 10, 2014 at 10:24 AM, Memphiz notifications@github.com wrote:

... but then again we would have the problem on other VFS implementations (samba) aswell maybe (cause the interface has int64 there for all of them).

Not necessarily. NFS is differs a little in this case.

In posix, and most filesystems, reading from end of file is not an error. It will just return that read() returned 0 bytes read.

In NFS reading beyond the end of file IS considered an error and the nfs server will return an error.

That is proably why you only see the issue with libnfs.

sahlberg commented 10 years ago

Thinking more about this, this is what could be happening.

Notice that the file size is 0x000000029c248202 and we try to read from 0xffffffff9c248201

The 32 low order are all the same exscept from we read from one byte less. So this looks like some sort of "read the last byte of the file" going wrong.

Do you know what your off_t is ?

This is what I think could explain this : You have some code path where it tries to read the last byte of the file.

The file size is 0x000000029c248202 but if off_t is only 32 bits, then you will end up with a stat structure where st_size will be just 0x9c248202

Then somewhere later you take st.st_size, subtract 1 to get "last byte of the file" and end up with st.st_size - 1 == 0x9c248201

Then this value is passed in to the ::Seek function, but since Seek takes a int64, the value 0x9c248201 is sign extended and becomes 0xffffffff9c248201 and which is what we see on the wire.

This is my current hypothesis. Read from last byte of the file ring any bells ? off_t might be 32 bits ring any bells ?

On Fri, Jan 10, 2014 at 11:59 AM, ronnie sahlberg ronniesahlberg@gmail.com wrote:

On Fri, Jan 10, 2014 at 10:24 AM, Memphiz notifications@github.com wrote:

... but then again we would have the problem on other VFS implementations (samba) aswell maybe (cause the interface has int64 there for all of them).

Not necessarily. NFS is differs a little in this case.

In posix, and most filesystems, reading from end of file is not an error. It will just return that read() returned 0 bytes read.

In NFS reading beyond the end of file IS considered an error and the nfs server will return an error.

That is proably why you only see the issue with libnfs.

sahlberg commented 10 years ago

From what I can find about windows, on windows you use stat for 32bit data but you use stat64 for 64 bit data. (File.cpp uses a Stat() that takes a stat64 pointer.

So if this is the case, I think what you need is something like a :

1, In libnfs.c / nfs_stat_1_cb change struct stat st to struct __stat64 st and do the same change inside an ifdef in libnfs-sync.c

2, in xbmc NFSFile.cpp change the Stat function to use a __stat64 instead of a stat structure.

I can not compile so I can not test this hypothesis. But it could explain this.

On Fri, Jan 10, 2014 at 3:30 PM, ronnie sahlberg ronniesahlberg@gmail.com wrote:

Thinking more about this, this is what could be happening.

Notice that the file size is 0x000000029c248202 and we try to read from 0xffffffff9c248201

The 32 low order are all the same exscept from we read from one byte less. So this looks like some sort of "read the last byte of the file" going wrong.

Do you know what your off_t is ?

This is what I think could explain this : You have some code path where it tries to read the last byte of the file.

The file size is 0x000000029c248202 but if off_t is only 32 bits, then you will end up with a stat structure where st_size will be just 0x9c248202

Then somewhere later you take st.st_size, subtract 1 to get "last byte of the file" and end up with st.st_size - 1 == 0x9c248201

Then this value is passed in to the ::Seek function, but since Seek takes a int64, the value 0x9c248201 is sign extended and becomes 0xffffffff9c248201 and which is what we see on the wire.

This is my current hypothesis. Read from last byte of the file ring any bells ? off_t might be 32 bits ring any bells ?

On Fri, Jan 10, 2014 at 11:59 AM, ronnie sahlberg ronniesahlberg@gmail.com wrote:

On Fri, Jan 10, 2014 at 10:24 AM, Memphiz notifications@github.com wrote:

... but then again we would have the problem on other VFS implementations (samba) aswell maybe (cause the interface has int64 there for all of them).

Not necessarily. NFS is differs a little in this case.

In posix, and most filesystems, reading from end of file is not an error. It will just return that read() returned 0 bytes read.

In NFS reading beyond the end of file IS considered an error and the nfs server will return an error.

That is proably why you only see the issue with libnfs.

Memphiz commented 10 years ago

I just wonder why this happens only on windows 8 and not windows 7 ... i try those changes (but not sure if i got you right in complete :) ).

Memphiz commented 10 years ago

do you mean something like that?

http://pastebin.com/BM4gTBBB

sahlberg commented 10 years ago

On Sat, Jan 11, 2014 at 11:48 AM, Memphiz notifications@github.com wrote:

do you mean something like that?

http://pastebin.com/BM4gTBBB

Yeah, something like that. (and with an ifdef for the change in libnfs.c too)

This should make things better. But there is no guarantee it will fix this issue.

I am not a windows developer, but what I can read by googling, it looks like on windows you have two different structures and two different functions for 32 bit stat size and 64 bit. (as opposed to on posix systems where you control it via _FILE_OFFSET_BITS=64 but the function name and the structure name remains the same).

Reply to this email directly or view it on GitHubhttps://github.com/sahlberg/libnfs/issues/46#issuecomment-32105128 .

Memphiz commented 10 years ago

@DeJuanNOnley please try this version

http://mirrors.xbmc.org/test-builds/win32/XBMCSetup-20140111-a19b581-win32nfsstat64.exe

and after installation exchange libnfs in system\libnfs.dll (or system\nfs\libnfs.dll - not sure) with this one here:

https://dl.dropboxusercontent.com/u/30371861/libnfs-stat64.dll

sahlberg commented 10 years ago

Also, do you have a different file that also fails? And if you could try with it too, and collect the same type of wireshark trace?

Would be interesting to see if that too fails at trying to read at offset "last byte of the file, and top 32 bits all set to 1".

On Sat, Jan 11, 2014 at 1:31 PM, Memphiz notifications@github.com wrote:

@DeJuanNOnley https://github.com/DeJuanNOnley please try this version

http://mirrors.xbmc.org/test-builds/win32/XBMCSetup-20140111-a19b581-win32nfsstat64.exe

and after installation exchange libnfs in system\libnfs.dll (or system\nfs\libnfs.dll - not sure) with this one here:

https://dl.dropboxusercontent.com/u/30371861/libnfs-stat64.dll

— Reply to this email directly or view it on GitHubhttps://github.com/sahlberg/libnfs/issues/46#issuecomment-32107760 .

DeJuanNOnley commented 10 years ago

@Memphiz I can't browse my NFS shares with that version. With either DLL. I get the following in the XBMC log:

ERROR: XFILE::CFileCache::Open - failed to open source <C:\Users\Dejuan\AppData\Roaming\XBMC\cache\10025-909842f7.fi> ERROR: XFILE::CFileCache::Open - failed to open source <special://temp/10025-909842f7.fi> ERROR: Unhandled exception in CApplication::FrameMove() : Access violation at 0x012950cb: Writing location 0x00000000

I can add them, but if I try to open them I get those errors. And I double-checked that the old version is still able to open them.

@sahlberg Yes, it happens with multiple files. Would you like me to get a trace from the old version or wait until we get this new one working?

Memphiz commented 10 years ago

@DeJuanNOnley could you try with this libnfs please - i have fatfingered something in there:

https://dl.dropboxusercontent.com/u/30371861/libnfs-stat64-2.dll

DeJuanNOnley commented 10 years ago

Am I doing something wrong? I still can't browse NFS shares with that updated DLL:

NOTICE: Thread LanguageInvoker start, auto delete: false NOTICE: -->Python Interpreter Initialized<-- NOTICE: Thread FileCache start, auto delete: false ERROR: XFILE::CFileCache::Open - failed to open source <C:\Users\Dejuan\AppData\Roaming\XBMC\userdata\addon_data\service.xbmc.versioncheck\settings.xml> ERROR: XFILE::CFileCache::Open - failed to open source <special://profile/addon_data/service.xbmc.versioncheck/settings.xml> ERROR: XFILE::CFileCache::Open - failed to open source <C:\Users\Dejuan\AppData\Roaming\XBMC\cache\10025-909842f7.fi> ERROR: XFILE::CFileCache::Open - failed to open source <special://temp/10025-909842f7.fi> ERROR: Unhandled exception in CApplication::FrameMove() : Access violation at 0x00f150cb: Writing location 0x00000000

I'm renaming these to XBMC\system\libnfs.dll and ensuring none of the other libnfs.dll's you've provided are in the XBMC directory (or sub-directories) at all.

Memphiz commented 10 years ago

There is not even access to nfs in your log snippet and i tried this version on windows7 64 and don't get those printouts. Seems you trigger a whole different bug here on windows8 ...

Memphiz commented 10 years ago

Can you move away your xbmc dir for a quick test? for ensuring the system is clean? Move C:\Users\Dejuan\AppData\Roaming\XBMC to C:\Users\Dejuan\AppData\Roaming\XBMC.orig or something

DeJuanNOnley commented 10 years ago

I had the same thought. You beat me to it. And yes, a new %appdata%\XBMC directory did fix that problem. Odd.

And YES! The new DLL has definitely fixed the problem. I had both the alpha 11 release and the one you linked to here installed in different directories just so I could test them side-by-side on the same system. And the old one continues to fail while the new one just worked on the same videos ten times in a row.

Memphiz commented 10 years ago

So you are saying for old XBMC builds the new lib also fixes the problem? Or do you need both - new lib and my test xbmc build?

DeJuanNOnley commented 10 years ago

Good question... testing now...

No. The old version of XBMC with the updated DLL CTDs every time I try to play a video over NFS.

Memphiz commented 10 years ago

Ok that makes sense. Thx for the testing - will fix this in mainline soon.

@sahlberg thx a bunch. My problem is now - if the changes in XBMC are platform independend (e.x. if using __stat64 there will break on other platforms - maybe i better ifdef those in XBMC aswell...)

sahlberg commented 10 years ago

It works, Good stuff.

On Sun, Jan 12, 2014 at 5:31 AM, Memphiz notifications@github.com wrote:

Ok that makes sense. Thx for the testing - will fix this in mainline soon.

@sahlberg https://github.com/sahlberg thx a bunch. My problem is now - if the changes in XBMC are platform independend (e.x. if using __stat64 there will break on other platforms - maybe i better ifdef those in XBMC aswell...)

That is a really good question. I think only windows use __stat64 since unixens control the size of st_size through the file offset bits macro.

That said, it does look like most of the other file handlers, like File.cpp already use stat64 unconditionally, so I guess there is already some kind of "remap stat64 to stat" for all non-windows compiles in xbmc. Probably best to check with someone that knows windows really well here.

Somewhat related, it looks like a few of the other file handlers also still use stat instead of __stat64. Maybe these are also vulnerable to the same issue. AFPFile.cpp for example.

— Reply to this email directly or view it on GitHubhttps://github.com/sahlberg/libnfs/issues/46#issuecomment-32122506 .

Memphiz commented 10 years ago

Yeah afpfile is a c&p from nfsfile - and it will be phased out soon. Its not available for windows anyway (afp-ng doesn't compile on windows and was never ported). Since we are in feature freeze i did it with ifdefery in XBMC for beeing sure to not influence other platforms - will remove that after release ;)

sahlberg commented 10 years ago

On Sun, Jan 12, 2014 at 8:27 AM, Memphiz notifications@github.com wrote:

Yeah afpfile is a c&p from nfsfile - and it will be phased out soon. Its not available for windows anyway (afp-ng doesn't compile on windows and was never ported). Since we are in feature freeze i did it with ifdefery in XBMC for beeing sure to not influence other platforms - will remove that after release ;)

Awesome.

I merged your patch to libnfs and will release a new version within the next few days.

Reply to this email directly or view it on GitHubhttps://github.com/sahlberg/libnfs/issues/46#issuecomment-32126261 .

sahlberg commented 10 years ago

And to DeJuanNOnley, thanks a lot!

Your wireshark trace showing the offsets for the reads that caused the failures really helped nail down exactly what the issue was. Good stuff!

regards ronnie sahlberg

On Sun, Jan 12, 2014 at 3:51 PM, ronnie sahlberg ronniesahlberg@gmail.comwrote:

On Sun, Jan 12, 2014 at 8:27 AM, Memphiz notifications@github.com wrote:

Yeah afpfile is a c&p from nfsfile - and it will be phased out soon. Its not available for windows anyway (afp-ng doesn't compile on windows and was never ported). Since we are in feature freeze i did it with ifdefery in XBMC for beeing sure to not influence other platforms - will remove that after release ;)

Awesome.

I merged your patch to libnfs and will release a new version within the next few days.

Reply to this email directly or view it on GitHubhttps://github.com/sahlberg/libnfs/issues/46#issuecomment-32126261 .

Memphiz commented 10 years ago

Hold off the Release for now if possible - iirc current Master does not compile for Windows - i will Hope to have some time to figure that out and do another pr.

For xbmc we still use 1.6 with 2 Patches on Top - do no hurry from our side ;)

DeJuanNOnley commented 10 years ago

@sahlberg You're welcome. Thanks for being patient with me.

@Memphiz Probably unrelated, but I just realized that in this release SSH/SFTP causes CTD. Just an FYI.

Memphiz commented 10 years ago

@DeJuanNOnley we should move back to our forum now. Please try with the todays nightly build and if issues are there - post a new thread into windows section (thanks to god ssh/sftp is not my area ;) ).