sahlberg / libnfs

NFS client library

Support UDP and larger block sizes #108

Closed. popcornmix closed this 3 years ago.

popcornmix commented 9 years ago

With a limited-performance device like the Raspberry Pi, where playing a raw Blu-ray is only just possible, it's important that NFS throughput is as high as possible.

There is a lot of evidence that using an OS NFS mount makes a lot more streams playable on the Pi compared to libnfs. E.g. http://forum.kodi.tv/showthread.php?tid=154279 http://forum.osmc.tv/showthread.php?tid=6825

Something like:

mount 192.168.4.9:/Public -o nfsvers=3,rw,intr,noatime,rsize=32768,wsize=32768,nolock,async,proto=udp /storage/mnt

works much better. I think UDP is the most significant change, but being able to configure larger block sizes also helps.

Would this be possible?

sahlberg commented 9 years ago


I am not certain. UDP had a somewhat noticeable impact in the SUN/3 days since it allowed disabling checksumming, which improved performance a little in the days of 14MHz CPUs, but the RPi is not THAT slow :-) The big problem with UDP is that it cannot handle network packet loss nearly as well as TCP, meaning that if the network is not perfect, TCP might slow down but UDP quickly becomes unusable. So I would really like to see real hard numbers on this first...

I think the reason XBMC/KODI is slow with libnfs is something different. KODI/XBMC seems to do I/O in a single-threaded, synchronous fashion. This will never perform well unless you have significant readahead buffers, so that you can serve the data out of the client readahead cache instead of reading it off the server every time. The kernel NFS client has lots of readahead cache; basically all unused memory can be used for this.

Libnfs itself is fast. It is relatively easy to saturate a GbE link with READs from a single thread using the async API. But that is a completely different programming model, and it might be infeasible to rewrite the XBMC block layer to be asynchronous :-( so we need a different solution.
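
For what it's worth, here is a rough sketch of what "many READs in flight from one thread" looks like with the async API. The server address, export and path are placeholders, error handling is omitted, and the call signatures are written from memory against the libnfs releases of that era, so treat it as an illustration rather than reference code:

#include <poll.h>
#include <fcntl.h>
#include <stdint.h>
#include <nfsc/libnfs.h>

#define QUEUE_DEPTH 8              /* keep several READs outstanding at once */
#define CHUNK (128 * 1024)

static uint64_t next_offset;
static int outstanding;

/* Completion callback: on success err is the number of bytes read and
 * data points at them; note that reads may complete out of order. */
static void read_cb(int err, struct nfs_context *nfs, void *data, void *private_data)
{
    struct nfsfh *fh = private_data;
    (void)data;                    /* a real player would consume the bytes here */

    outstanding--;
    if (err > 0) {                 /* not EOF and not an error: queue the next chunk */
        nfs_pread_async(nfs, fh, next_offset, CHUNK, read_cb, fh);
        next_offset += CHUNK;
        outstanding++;
    }
}

int main(void)
{
    struct nfs_context *nfs = nfs_init_context();
    struct nfsfh *fh;

    nfs_mount(nfs, "192.168.4.9", "/Public");
    nfs_open(nfs, "/movie.mkv", O_RDONLY, &fh);

    /* Prime the pipeline so the server always has work queued. */
    for (int i = 0; i < QUEUE_DEPTH; i++, next_offset += CHUNK, outstanding++)
        nfs_pread_async(nfs, fh, next_offset, CHUNK, read_cb, fh);

    /* One thread, one poll loop, many requests in flight. */
    while (outstanding > 0) {
        struct pollfd pfd = { .fd = nfs_get_fd(nfs), .events = nfs_which_events(nfs) };
        if (poll(&pfd, 1, -1) > 0)
            nfs_service(nfs, pfd.revents);
    }

    nfs_close(nfs, fh);
    nfs_destroy_context(nfs);
    return 0;
}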

Please see the screenshot I attached; it shows the single-threaded behaviour. This is what XBMC does when streaming a movie.

The IO-Graphs window shows the "queue depth" of the NFS calls. From this we see that XBMC never does more than one I/O at a time, and it does it in a single-threaded fashion.

Since this is simple sequential reading, it should be fairly trivial to fix. By default, libnfs will not do much/any readahead or caching. You can change this in code by calling:

rpc_set_readahead(nfs_get_rpc_context(nfs), "some-large-value-in-bytes");

before you start streaming from the file. It tells libnfs to allow up to that many bytes of readahead buffer for each open file descriptor. Since xbmc only reads a few files at a time, maybe you can set this to 1MByte, maybe 10MByte? Try and see how it goes.
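
In application terms the suggestion looks roughly like the sketch below. The server, export, path and the 10MB figure are placeholders, the rpc_set_readahead() value is assumed to be a plain byte count declared in the raw header, and the argument order of nfs_read() follows the libnfs releases of that time:

#include <fcntl.h>
#include <nfsc/libnfs.h>
#include <nfsc/libnfs-raw.h>       /* rpc_* helpers */

void stream_file(void)
{
    struct nfs_context *nfs = nfs_init_context();
    struct nfsfh *fh;
    static char buf[128 * 1024];
    int count;

    nfs_mount(nfs, "192.168.4.9", "/Public");

    /* Allow up to ~10MB of readahead per open file descriptor. */
    rpc_set_readahead(nfs_get_rpc_context(nfs), 10 * 1024 * 1024);

    nfs_open(nfs, "/movie.mkv", O_RDONLY, &fh);

    /* The synchronous reads themselves are unchanged; libnfs can now
     * satisfy many of them from its readahead buffer. */
    while ((count = nfs_read(nfs, fh, sizeof(buf), buf)) > 0) {
        /* hand `count` bytes from buf to the demuxer/player */
    }

    nfs_close(nfs, fh);
    nfs_destroy_context(nfs);
}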

(This was added for the qemu/kvm folks. They really want to be able to stream enormous image files at wire rate from a single thread)

regards ronnie sahlberg


sahlberg commented 9 years ago

Here is a better screenshot of the IO graph to show the issue:


popcornmix commented 9 years ago

Screenshot attachments aren't handled by github - do you have a link?

sahlberg commented 9 years ago

https://drive.google.com/?tab=mo&authuser=0#folders/0BxsNu_mgLFNEQktiRGgtTTBGTjg https://drive.google.com/file/d/0BxsNu_mgLFNEZkJrbmFhQVpBTGs/view?usp=sharing https://drive.google.com/file/d/0BxsNu_mgLFNEaU1XdmN3VndVdnM/view?usp=sharing


sahlberg commented 9 years ago

It looks like things are basically latency-bound, since we only read 32kb at a time, single-threaded and synchronously. Combined with the "not great" Ethernet sitting behind the slow, high-latency USB link that the RPi has, that is likely the culprit in my opinion so far.

The good news is that this should not be too hard to do something about. The RPC call to enable readahead should help a lot, I think, if you can add it.

Let me guess, you have libsmbclient support too and that is also slow? Probably the same reason, but harder to fix.


popcornmix commented 9 years ago

Is there a reason why OS NFS mounts work better? (@fritsch experienced this yesterday, with stuttering playback from libnfs and perfect playback from an OS mount.) Does kodi treat it differently, or does the OS mount effectively do the readahead itself?

popcornmix commented 9 years ago

Oh yes, libsmbclient is even worse...

sahlberg commented 9 years ago

The OS mount will use readahead.

The Linux kernel will basically use any/all available memory as cache, and that cache will be used for readahead on NFS files. Libnfs will by default not do readahead caching.


fritsch commented 9 years ago

@sahlberg: readbufferfactor via advancedsettings only increases the spikes, right?

I am wondering if forcing the cache method also for local files will change that behaviour.

e.g.

<advancedsettings>
<network>
<readbufferfactor>4.0</readbufferfactor>
<buffermode>1</buffermode>
</network>
</advancedsettings>

sahlberg commented 9 years ago

Looking through the libnfs readahead code: this code is aimed at mostly-sequential readers, but also at not impacting non-sequential/random readers. It works well for the average kvm/qemu use case, where you want to benefit the occasional sequence of sequential reads but do not want to hurt the common random reads. It is also aimed at optimizing for throughput of the sequential sequences, not latency (for example, even while serving from the readahead cache, it will not continuously replenish the cache, but instead waits until the whole cache is consumed before replenishing it).

For KODI this is perhaps not ideal, and we might need a different strategy. I think if KODI activates the existing readahead in libnfs, things will probably improve, but there will likely still be latency spikes that could affect playback on high-latency devices like the Pi.

We need a slightly different strategy for how to replenish the cache here. For the pure sequential read case, we need a strategy where, at the same time as we serve data from the head of the readahead cache, we also try to replenish the tail.

This is not too hard to implement in libnfs but will take a bit of time. I will probably not have time to implement a pure sequential cache until this weekend, but it should be doable and it should fix all the problems on the Pi.
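
To make the head/tail idea concrete, here is one possible shape of such a policy. This is not the libnfs implementation: the struct and the issue_async_refill() helper are made up for illustration, and the completion side (landing finished reads at the tail and decrementing the in-flight count) is only described in comments.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct stream_cache {
    uint64_t file_offset;   /* file offset corresponding to buf[0] */
    size_t valid;           /* bytes currently valid in buf */
    size_t inflight;        /* bytes requested but not yet arrived */
    size_t capacity;        /* total window size, e.g. a few MB */
    char *buf;
};

/* Hypothetical helper: would call nfs_pread_async() for [off, off+n) and,
 * in its completion callback, append the data at buf[valid] and reduce
 * `inflight`. Stubbed out so the sketch stands alone. */
static void issue_async_refill(struct stream_cache *c, uint64_t off, size_t n)
{
    (void)c; (void)off; (void)n;
}

/* Serve an application read from the head of the window and, at the same
 * time, top up the tail so the window stays (nearly) full instead of being
 * drained completely before it is replenished. Returns -1 on a cache miss. */
static int serve_and_refill(struct stream_cache *c, uint64_t offset,
                            size_t count, char *out)
{
    if (offset < c->file_offset || offset + count > c->file_offset + c->valid)
        return -1;                               /* seek/random read: miss */

    memcpy(out, c->buf + (offset - c->file_offset), count);

    /* Slide the window past everything up to the end of this read. */
    size_t consumed = (size_t)(offset + count - c->file_offset);
    memmove(c->buf, c->buf + consumed, c->valid - consumed);
    c->file_offset += consumed;
    c->valid -= consumed;

    /* Replenish the tail in the background, bounded by the window size. */
    size_t missing = c->capacity - c->valid - c->inflight;
    if (missing > 0)
        issue_async_refill(c, c->file_offset + c->valid + c->inflight, missing);

    return (int)count;
}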

I will probably not make the sequential read strategy default in libnfs because there will be tradeoffs, such as making random reads possibly more expensive, so once I finish this new feature, we will need to make a small code change in kodi to activate it.

On the kodi side it will be a trivial code change, something like calling nfs_set_sequential_mode(nfsfh) on the file immediately after the nfs_open() call you do in your vfs module.
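
In plain libnfs terms the hook would be as small as the sketch below. nfs_set_sequential_mode() is only the name proposed in this comment and does not exist in any released libnfs; kodi's vfs module would go through its own connection wrapper rather than calling libnfs directly:

#include <fcntl.h>
#include <nfsc/libnfs.h>

/* Open a file for streaming and immediately opt it into the proposed
 * sequential-read mode so libnfs keeps its readahead window filled. */
static struct nfsfh *open_for_streaming(struct nfs_context *nfs, const char *path)
{
    struct nfsfh *fh = NULL;

    if (nfs_open(nfs, path, O_RDONLY, &fh) != 0)
        return NULL;

    nfs_set_sequential_mode(fh);    /* proposed API, see above */
    return fh;
}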

(I would need someone who can build kodi to help test and confirm that this fixes the issue once I have updated libnfs.)

regards ronnie sahlberg


popcornmix commented 9 years ago

@sahlberg I can test patches and confirm whether they help (e.g. by playing test files from http://jell.yfish.us/)

Memphiz commented 9 years ago

@popcornmix could you try this:

https://github.com/Memphiz/xbmc/commit/6f7b81b495831e0f4b8030730bc47d63c2cf3c6c

Memphiz commented 9 years ago

Sorry - updated once more (had an issue with a 32-bit truncation in those added advancedsettings variables): https://github.com/Memphiz/xbmc/commit/c9841fb9bf00cc9162fb25da19caac49ed785002

Memphiz commented 9 years ago

@sahlberg odd - I just tried my own patch on OSX with libnfs 1.9.6 - without readahead I get about 3.4MB/s in the kodi filemanager (I just tried to copy a movie from my NAS to my local hdd from within kodi). OSX tells me 3.6MB/s inbound traffic - matches roughly.

When I enable 1MB readahead, Kodi gives me around 480KB/s (!). OSX tells me 5.4MB/s inbound traffic. So while the inbound speed is indeed increased, it doesn't seem to give any benefit from within kodi. No clue what happens here.

Memphiz commented 9 years ago

the reads are done 128k each ... maybe each read invalidates the readahead cache or so?

sahlberg commented 9 years ago

Could well be broken for this use case. The current readahead implementation was contributed and written pretty much for a specific qemu use case.

Let me create a different readahead implementation over the weekend for the pure sequential use case that kodi has, and we can see how it performs.


fritsch commented 9 years ago

In the end, this helped a whole lot and basically made my use case work on the PI2: https://github.com/fritsch/xbmc/commit/79d7edd27fb8e36509d44fb8c9def590d3964952 on the kodi side - but @Memphiz and I are still discussing what we break or not in the kodi code :-)

popcornmix commented 9 years ago

@fritsch Using jellyfish samples and omxplayer (using pause/resume appearance in log as failure condition):

with kodi nfs mount: 60Mbit/s plays
with kodi nfs mount and fritsch patch: 70Mbit/s plays
with OS udp nfs mount: 80Mbit/s plays

popcornmix commented 9 years ago

With Memphiz' patch, videos play with corruption and exit early. Errors in the log: http://paste.ubuntu.com/10061453/

fritsch commented 9 years ago

@popcornmix Nice, so we "won" 10 Mbit/s - does increasing the size to 2MB or 4MB improve the situation further?

popcornmix commented 9 years ago

It's a bit inconclusive. I've tried 1/4MB, 1/2MB, 1MB, 2MB and 4MB, and 70Mbit/s often plays without a stutter, but still stutters occasionally (with all settings), making it hard to determine which is best. The OS mount seems much more consistent. I'll also try at home, where I go through a wireless bridge; the extra latency may make it more obvious.

sahlberg commented 9 years ago

I have an early prototype of sequential read optimization here: https://github.com/sahlberg/libnfs/tree/seq-read-example

This branch has a feature whereby, when you use nfs_[p]read and sequential streaming is activated, it will try to service the request from a buffer. It attempts to keep this buffer full at all times, covering all the bytes from the current offset you are reading from plus the next "buffersize" bytes. As you read from the buffer, it will issue asynchronous requests in the background to try to refill it.

Could you try this version of libnfs with your RPi? To make it easier to test, I have a temporary line in this patch to force libnfs to always use this mode. This means you don't need to modify kodi for the testing.

This is the line, which is set for all files that are opened: nfs_set_streaming_mode(nfsfh, 200 * NFS_STREAM_BUF_SIZE);

If this works out well, I will remove this line from libnfs and you will have to call this function from the kodi vfs module to activate it. But for testing ...


popcornmix commented 9 years ago

Cool. Will test tomorrow.

sahlberg commented 9 years ago

I did some tests on a RPi model B connected via GbE using the streaming test tool.

10000000 bytes per second

nfs-stream nfs://10.10.10.11/data/SNAP-1/TV/Farscape/S02E01.mkv 10000000

Read one 32kb chunk every 3278 us ...

At this speed we need to read one 32kb blob every 3.278ms. Occasionally a read takes 1-2ms, but the vast majority are serviced in 70-100us. We always serve out of the cache, and the cache never drops from being almost completely full. We run at ~12% CPU and can easily keep up. Easy peasy, not even sweating.

15000000 bytes per second

Read one 32kb chunk every 2188 us. A lot of read latencies now take 100-500us, and occasional latencies take 2-4ms, i.e. > 2188us, but the average is well below the 2188us we need to maintain this rate. We always serve out of the cache, and the cache never drops from being almost completely full. CPU at ~16%. We can keep up with this rate without trouble.

20000000 bytes per second

Read one 32kb chunk every 1639 us. We can now service perhaps every second request within 1639us, and a lot of them take 2-3ms or more. CPU at 23%. We cannot keep up with this rate.

So at <20% CPU we can go at at least 15000000 bytes per second, and probably a little more. 15000000 bytes per second is 120Mbit/second, so we should be able to stream at 120Mbit/s without any trouble. It does cost ~16% CPU, but the RPi is, as I understand it, quite CPU-expensive when doing I/O.

(Note: at these speeds it still takes 1-2 seconds until the cache is warm and we reach the steady state. Before that, while we are initially populating the cache, the latencies fluctuate a little more.)

Is 120Mbit/s enough even for very demanding files? I ask out of ignorance here.
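
For reference, the chunk intervals and the Mbit/s figure above follow directly from the target byte rates; a quick check (not part of libnfs, just the arithmetic):

#include <stdio.h>

int main(void)
{
    const double chunk = 32768.0;                      /* one 32kb READ */
    const double rates[] = { 10e6, 15e6, 20e6 };       /* bytes per second */

    for (int i = 0; i < 3; i++) {
        double interval_us = chunk / rates[i] * 1e6;   /* time budget per chunk */
        double mbit = rates[i] * 8.0 / 1e6;            /* wire rate required */
        printf("%.0f B/s -> one 32kb chunk every ~%.0f us, ~%.0f Mbit/s\n",
               rates[i], interval_us, mbit);
    }
    /* Prints roughly: 3277 us / 80 Mbit/s, 2185 us / 120 Mbit/s,
     * 1638 us / 160 Mbit/s - matching, to rounding, the budgets quoted above. */
    return 0;
}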


popcornmix commented 9 years ago

120Mbit/s is more than needed (actually the Pi's Ethernet is only 100Mbit/s). Raw Blu-Ray video is limited to 40Mbit/s. Add a bit for audio/subs and 50Mbit/s is the most you are likely to see outside of stress test files.

sahlberg commented 9 years ago

OK, I messed up my measurements: the test tool was only reading 8kb chunks and not 32kbyte chunks, so I need to re-run them.

I re-ran it at 10,000,000 bytes per second and it can still keep up without any problem when reading actual 32kb blobs rather than 8kb ones.

(I got a bit suspicious something was wrong once I realized that 15,000,000 bytes per second should actually be impossible: that is 120Mbit/s, more than the Pi's 100Mbit/s Ethernet can carry.) Sorry about that, but 10,000,000 bytes per second should definitely not be a problem, and we should be able to go even a little above this.


popcornmix commented 9 years ago

At home I run through an 802.11g wireless bridge, so network throughput is limited.

With an OS UDP NFS mount I can play the 10Mbit/s jellyfish clip (fails with the 15Mbit/s clip)
With default libnfs I cannot play the 10Mbit/s jellyfish clip without buffering
With default libnfs + fritsch patch I cannot play the 10Mbit/s jellyfish clip without buffering
With the seq-read-example libnfs tree I can play the 10Mbit/s jellyfish clip (fails with the 15Mbit/s clip)

So there is something positive here.

I also tried using the filemanager to copy from nfs to a usb stick, using the 107MB 15Mbit/s jellyfish clip.

With an OS UDP NFS mount it took 49s (17.5Mbit/s)
With default libnfs it took 75s (11.4Mbit/s)
With default libnfs + fritsch patch it took 70s (12.2Mbit/s)
With the seq-read-example libnfs tree it took 68s (12.5Mbit/s)

I was running "echo 3 | sudo tee /proc/sys/vm/drop_caches" between each test to avoid disk caching affecting results.

sahlberg commented 9 years ago


Nice!

Thanks for testing. So basically, for playing media, we are about on par with OS mounts with this branch, right? And with this patch the performance problems seen on the RPi should be eliminated, or at least things will be much better than with the old libnfs, right?

So how do you want to proceed from here? I am about to release a new version of libnfs in a few days, so that I am sure we have everything in there in a good state for when Nautilus gets nfs support.

This patch to add the streaming support is low risk, since without the call to nfs_set_streaming_mode() the whole patch is just a NOOP. As such I will merge the patch into master before the release.

I will remove the call to nfs_set_streaming_mode() I currently do for all open files, so you will have to add a call to this function from the kodi vfs module to activate it. How much buffering should you set in this call? No idea; I had 200 * 128kb hardcoded, but that is probably crazy overkill. I can't say for sure, but I suspect you will be fine with just a few MB.

If I merge this into master and then release a new version in a day or two, does that sound reasonable to you?

There are some internal changes I would like to do to this support too, such as seeing if I can reduce the cpu utilization a little or better buffer alignment to eliminate some memcpy() calls. But that is nothing that affects the API or stability so I can do that later and it will be transparent to the application.

On the filemanager copy from nfs to a usb stick: OK, only a tiny improvement. It would be nice to optimize here too; I suspect the I/O pattern there is slightly different, so my streaming-read strategy might need a few tweaks. But this is not as urgent, I assume? I don't think I will be able to start looking into it for a few days - next weekend.


popcornmix commented 9 years ago

My current test is a bit crude. For my current wireless bridge setup:

I know that libnfs < 10Mbit/s
I know that 10Mbit/s < OS NFS < 15Mbit/s
I know that 10Mbit/s < libnfs+seqread < 15Mbit/s

So there is an improvement there. I'll try the same test at work, where the nfs server's latency is much better, but hopefully I'll still be able to measure an improvement.

I guess the first step is to add support to libnfs; then I'll add a patch to my kodi test branch (newclock4) to enable it and we'll get this into @MilhouseVH's test builds. If we get some positive responses, then bumping libnfs in kodi and adding the enabling patch is the next step.

Memphiz commented 9 years ago

+1 on that plan :)

MilhouseVH commented 9 years ago

Sounds good.

popcornmix commented 9 years ago

Testing again on faster work network:

Using jellyfish samples and omxplayer (using pause/resume appearance in log as failure condition):

with kodi nfs mount: 60Mbit/s plays
with kodi nfs mount and fritsch patch: 70Mbit/s plays (sometimes)
with libnfs+seqread mount: 70Mbit/s plays (reliably)
with OS udp nfs mount: 80Mbit/s plays (reliably)

So seqread is still beneficial (and better than the fritsch patch), but the OS mount is still winning.


popcornmix commented 9 years ago

Well, I added the code to my newclock4 kodi branch: https://github.com/popcornmix/xbmc/commit/28c7dc62104fce6a7dab059ff5864acb08aed7fb. But it seems to be making it worse (which is why I've commented out the nfs_set_streaming_mode call). Am I calling it in the wrong place? It seems to be causing stutters at the very start of the file (possibly while the cache is filling).

sahlberg commented 9 years ago

Hmm, I think you call it from the right place. Maybe it is a bit too aggressive when trying to re-fill the cache. Let me update the patch to make it less aggressive when it fills the cache initially. Currently it is exponential, in that it tries to re-fill several blocks every time you call nfs_read(). I will change it so that it is not exponential and only tries to re-fill a set maximum number of blocks at a time.

I will have that available probably tomorrow. Until then, maybe you can try it with a smaller amount of cached data. Change it to say 10 * 32768 * 4 and see if it helps?

gNfsConnection.GetImpl()->nfs_set_streaming_mode(m_pFileHandle, 200 * 32768 * 4);

We are getting there; a few small nags remain, but I am confident we should be able to tweak this so you get reliable, good performance with libnfs.

regards ronnie sahlberg


Memphiz commented 9 years ago

@popcornmix the place where you are calling it from is sane.

sahlberg commented 9 years ago

Please try the current tip of seq-read-example. It will now be a lot less aggressive when initially filling the buffer and will only have requests for 256kb in flight at a time. That should make it a lot less disruptive during the initial start than the previous exponential fill.

Also try setting the buffer size to something smallish; I think 5MByte should be OK:

gNfsConnection.GetImpl()->nfs_set_streaming_mode(m_pFileHandle, 5 * 1024 * 1024);


popcornmix commented 9 years ago

Sorry, I'm not seeing any benefit now. With either the previous or the latest version of the code I'm getting significantly reduced performance:

with kodi nfs mount: 70Mbit/s plays (often)
with libnfs+seqread mount: 30Mbit/s plays (reliably)
with OS udp nfs mount: 80Mbit/s plays (reliably)

I can only assume my previous positive result was invalid (I suspect I wasn't rebuilding libnfs correctly and just saw some random variation that looked like an improvement).

This is what I'm currently testing: https://github.com/popcornmix/xbmc/commit/b413c83df623038571ef6df624eb9e13063ea712 (but with the nfs_set_streaming_mode call uncommented)

Even with a very small buffer:

gNfsConnection.GetImpl()->nfs_set_streaming_mode(m_pFileHandle, 256 * 1024);

I'm still getting a stall at the start of the 40Mbit/s file that makes us buffer (and so gets treated as a failure).

(smaller numbers like 32 * 1024 cause seg-faults)