rolffokkens / bdsync

a fast block device synchronizing tool
GNU General Public License v2.0

Use POSIX_FADV_DONTNEED after pread to reduce cache pollution #12

Closed by shodanshok 8 years ago

shodanshok commented 8 years ago

Reading a large file (for comparison) uselessly pollutes the buffer/cache memory. This patch greatly decreases cache pollution by calling POSIX_FADV_DONTNEED after each pread (in the vpread function).

This is a better solution than opening the file/device with the O_DIRECT flag, because direct access has alignment restrictions on both the source (file/device) and the destination (application buffer). Moreover, it is quite portable and should have no problems with filesystems lacking O_DIRECT semantics.
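
For reference, a minimal sketch of what the change looks like, assuming a vpread-style wrapper around pread (the actual bdsync function differs in its details):

```c
#define _XOPEN_SOURCE 600   /* for posix_fadvise */
#include <fcntl.h>
#include <unistd.h>

/* Sketch of a pread wrapper with the proposed addition: after the data
 * has been read, tell the kernel the pages are no longer needed so a
 * full scan of a large device does not evict more useful cached data. */
static ssize_t vpread_sketch (int fd, void *buf, size_t len, off_t pos)
{
    ssize_t ret = pread (fd, buf, len, pos);

    if (ret > 0) {
        /* Errors are deliberately ignored; the advice is best-effort. */
        (void) posix_fadvise (fd, pos, ret, POSIX_FADV_DONTNEED);
    }
    return ret;
}
```

Since the pages were only read (not written) they are clean, so the kernel can drop them immediately.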

rolffokkens commented 8 years ago

Seems like a good suggestion indeed. Just wondering though: in --twopass mode bdsync may read disk blocks twice, in which case POSIX_FADV_DONTNEED may hurt. In the first pass a block may be read as part of calculating checksums on large chunks. When checksums differ, the specific chunks are read in small sub-chunks to do checksum matching again and identify the actual differences.

shodanshok commented 8 years ago

If I understand it correctly, when used with --twopass the file is first completely read in large chunks, then any differing chunks are split and compared in smaller chunks.

If this is true, then the second pass can benefit from the cache only if the file is completely cached in the page cache; otherwise it will cause quite severe cache thrashing. Considering that this patch should prevent exactly this cache-thrashing scenario, I think it is OK to clean the cache after each vpread.

If I misunderstood something, please let me know.

rolffokkens commented 8 years ago

It's a little more complex. With --twopass both client and server read a set of large chunks, and then compare checksums. For any differing checksum, a new set is built consisting of sub-chunks (likely 4k each) of the mismatching chunk, and the client and server compare the checksums of these smaller sub-chunks. After this the next set of large chunks is processed. So the timespan between reading the large chunks and the small chunks is pretty short.
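
In rough terms, the logic looks like the sketch below. This is only the logical structure: the real bdsync code batches requests and works asynchronously over the client/server protocol, and CHUNK_SIZE, SUB_SIZE, local_hash, remote_hash and mark_dirty are hypothetical stand-ins, not bdsync identifiers.

```c
#include <stdint.h>
#include <sys/types.h>

#define CHUNK_SIZE (1 << 20)   /* large first-pass chunks (size assumed)  */
#define SUB_SIZE   (4 << 10)   /* small second-pass sub-chunks, ~4 KiB    */

uint64_t local_hash  (off_t pos, size_t len);   /* checksum of local data */
uint64_t remote_hash (off_t pos, size_t len);   /* checksum from the peer */
void     mark_dirty  (off_t pos, size_t len);   /* schedule for transfer  */

void twopass_scan (off_t dev_size)
{
    for (off_t pos = 0; pos < dev_size; pos += CHUNK_SIZE) {
        /* Pass 1: compare checksums of a large chunk. */
        if (local_hash (pos, CHUNK_SIZE) == remote_hash (pos, CHUNK_SIZE))
            continue;                             /* identical, move on */

        /* Pass 2: re-read the mismatching chunk in small sub-chunks to
         * pinpoint the actual differences (boundary handling omitted). */
        for (off_t sub = pos; sub < pos + CHUNK_SIZE; sub += SUB_SIZE)
            if (local_hash (sub, SUB_SIZE) != remote_hash (sub, SUB_SIZE))
                mark_dirty (sub, SUB_SIZE);
    }
}
```

The re-read in pass 2 is why an unconditional POSIX_FADV_DONTNEED after every read can discard pages that are about to be needed again.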

The client should be able to do POSIX_FADV_DONTNEED in a straightforward fashion though, because it knows when no further analysis of (sub-)chunks is needed.

The problem is the server though: it just computes checksums for whatever chunks, in whatever sizes, the client requests. This might be solved by having the client send a kind of hint to the server when a certain set of chunks is done and will definitely no longer be needed by the server, after which the server can do a POSIX_FADV_DONTNEED.

This is probably quite straightforward, but it is a protocol change that breaks clients and servers of different versions - unless some backward compatibility is built in. Hm.

shodanshok commented 8 years ago

I understand. In my opinion the simplest path should be preferred, at least in the short term. I advise against a protocol change; rather, let the user choose between:

The proposed pull has a very small impact on the existing code base (it is one line!), yet it should be quite useful to anyone trying to avoid unnecessary cache pollution.

rolffokkens commented 8 years ago

I did some testing: there's quite a delay between the two reads (actually there may be 3 reads, when a difference is identified) in the --twopass situation, due to extensive buffering / asynchronicity. So using that as an argument to postpone the POSIX_FADV_DONTNEED would still not lift the burden from the buffer cache.

So I agree with your approach, but I'll implement it as a command line option; that should be easy to do.

Looked into the protocol change as well. On second thought there's no need: misusing the mechanism that requests hashes for a number of chunks may do the trick. Just request hashes for 0 chunks, and process that as a hint.
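
On the server side that could look something like the sketch below. This is a sketch under assumptions: the request structure and field names are hypothetical, not the actual bdsync wire format.

```c
#define _XOPEN_SOURCE 600   /* for posix_fadvise */
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical hash-request message; the real bdsync protocol differs. */
struct hash_request {
    int   nchunks;   /* number of chunks whose checksums are requested */
    off_t start;     /* start of the range the client is finished with */
    off_t len;       /* length of that range                           */
};

static void handle_hash_request (int devfd, const struct hash_request *req)
{
    if (req->nchunks == 0) {
        /* A request for zero chunks carries no real work: treat it as a
         * hint that the given range will not be read again, and drop it
         * from the page cache. Older clients simply never send such a
         * request, so nothing breaks for them. */
        (void) posix_fadvise (devfd, req->start, req->len,
                              POSIX_FADV_DONTNEED);
        return;
    }

    /* ... otherwise compute and send the requested checksums ... */
}
```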

rolffokkens commented 8 years ago

The branch fadv_dontneed now has a command line option --flushcache that does POSIX_FADV_DONTNEED like you proposed. Without the option POSIX_FADV_DONTNEED is still done, but only once bdsync is certain that it won't be reading the data again itself.
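
A rough sketch of the two behaviors, with hypothetical names (opt_flushcache, read_block, range_done) standing in for whatever the branch actually uses:

```c
#define _XOPEN_SOURCE 600   /* for posix_fadvise */
#include <fcntl.h>
#include <unistd.h>

extern int opt_flushcache;   /* set when --flushcache is given */

/* Eager flush: with --flushcache, drop pages right after each read,
 * as in the original proposal. */
static ssize_t read_block (int fd, void *buf, size_t len, off_t pos)
{
    ssize_t ret = pread (fd, buf, len, pos);

    if (opt_flushcache && ret > 0)
        (void) posix_fadvise (fd, pos, ret, POSIX_FADV_DONTNEED);
    return ret;
}

/* Lazy flush: without the option, called only once bdsync knows it is
 * done with a range and won't read it again. */
static void range_done (int fd, off_t pos, off_t len)
{
    (void) posix_fadvise (fd, pos, len, POSIX_FADV_DONTNEED);
}
```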

The code passes all tests, and POSIX_FADV_DONTNEED doesn't change the core execution, so I think it should all work fine. Some further testing is needed though.

shodanshok commented 8 years ago

Great, thank you for the new branch. I'll do some tests in the following days.

However, until the new branch is thoroughly tested, I suggest pulling the proposed one-line change into the master branch: it should be "bug free" (it is a simple posix_fadvise call), yet it brings a substantial reduction in cache pollution with little or no performance impact.

rolffokkens commented 8 years ago

Did some testing on a large 160GB image; it worked OK.

shodanshok commented 8 years ago

Very good. My preliminary tests passed correctly as well. I'll continue to use the new branch and report back on any problems. If all is OK this is going to become the new master branch, right?

rolffokkens commented 8 years ago

Thanks for your feedback! If all is OK this will be merged into master indeed!

rolffokkens commented 8 years ago

Merged it into master today, and used it in a high-volume production situation. I did indeed observe less memory pressure due to the POSIX_FADV_DONTNEED. Excellent suggestion!