Performance fix for delta generation

therealmik / pyrdiff

Python rdiff/rsync implementation (for experimental/learning purposes only)

MIT License

12 stars 2 forks source link

Performance fix for delta generation #2

Open pavel-odintsov opened 10 years ago

pavel-odintsov commented 10 years ago

Hello!

I plans to test delta generation performance but found this in your code:

268: buf = changedfd.read() # Just read the whole damn file into memory

Could you fix this code to real file reading instead putting whole file to memory?

Thank you so much :)

therealmik commented 10 years ago

Ha, forgot about that.

I did have some horrible buffering algorithm before, but I was trying to make readable code so I just did that :)

I've committed a change that doesn't do that anymore - but it does issue lots of small reads. I also added support for buffering when opening a file (just the 3rd argument to open), so hopefully that's not too slow.

pavel-odintsov commented 10 years ago

Thank you so much :) You are very kindly!

therealmik commented 10 years ago

Just a warning - you might want to write some test cases for this if you actually use it on real data. I tried a few, but who knows what crazy stuff I've done...

vps2fast commented 10 years ago

Hello, Michael!

I'm compiled (oh, I need for it about 8 hours!) and installed pypy-3 for testing mainline version of pyrdiff.

I run it for 4Gb file and it eat almost all my memory :)

695218 root      20   0 8380m 6.6g  12m R 100.0  1.7   3:47.99 pypy-c

Because of this:

    def _generate_delta(signatures, changedfd):
        """Given a file object and signatures from another file,
           generate a set of deltas (LiteralChange / CopyChange)"""
        buf = changedfd.read() # Just read the whole damn file into memory

Could you fix it in Python-3 branch?

vps2fast commented 10 years ago

And this code:

buf = buf[offset+blocksize:]

Looks like memory allocator killer.. But I'm not sure about underlying implementation.

therealmik commented 10 years ago

Those two are related. I don't have any time to do this work at the moment, but feel free to send a pull request if you make the change.

vps2fast commented 10 years ago

Hello!

Did it :) Please check this: https://github.com/therealmik/pyrdiff/pull/3 But please review patch very thoroughly because I did not execute tests for enough amount of test cases.