rndusr / torf

Python module to create, parse and edit torrent files and magnet links
GNU General Public License v3.0

Fix extremely inefficient `in` check #17

Closed mon closed 4 years ago

mon commented 4 years ago

I have a torrent with several thousand files, and the `in` check in `filter_func` calls `__eq__` on the `Filepath` objects. Because `__eq__` doesn't use the cached hash, it performs over 30 million `lstat` calls, bringing the disk to its knees.

Comparing the cached hashes instead makes creating my torrent fast.
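Here is a minimal sketch of the problem and the fix. It uses hypothetical classes (`SlowPath`, `FastPath`, and the helper `count_resolutions`; none of these are torf's real `Filepath`) whose equality resolves paths with `os.path.realpath`, which may `lstat` path components. If the files are checked against a list, each `in` check compares the candidate with every element until it finds a match, so n lookups cost up to about n²/2 comparisons. When every comparison resolves both sides, the filesystem work grows quadratically. Comparing cached hashes means each object is resolved at most once.

```python
import os


class SlowPath:
    """Hypothetical path type: equality resolves both paths on every call."""

    resolve_calls = 0  # per-class counter of expensive resolutions

    def __init__(self, path):
        self.path = os.fspath(path)
        self._hash = None

    def __fspath__(self):
        return self.path

    def _resolve(self):
        # os.path.realpath may lstat() path components on each call
        type(self).resolve_calls += 1
        return os.path.realpath(self.path)

    def __hash__(self):
        if self._hash is None:  # resolve once, then reuse the cached hash
            self._hash = hash(self._resolve())
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, SlowPath):
            return NotImplemented
        # Ignores the cached hash: two resolutions per comparison
        return self._resolve() == other._resolve()


class FastPath(SlowPath):
    """Same type, but equality compares the cached hashes."""

    resolve_calls = 0

    def __eq__(self, other):
        if not isinstance(other, FastPath):
            return NotImplemented
        # Each object resolves at most once over its lifetime
        return hash(self) == hash(other)

    # Defining __eq__ resets __hash__ to None, so restore it explicitly
    __hash__ = SlowPath.__hash__


def count_resolutions(cls, n=100):
    """Look up n freshly built paths in a list of n paths with `in`."""
    paths = [cls(f"dir/file{i}") for i in range(n)]
    cls.resolve_calls = 0
    for i in range(n):
        # A fresh object per lookup, so `in` can't shortcut on identity
        if cls(f"dir/file{i}") not in paths:
            raise LookupError(f"dir/file{i} not found")
    return cls.resolve_calls


print(count_resolutions(SlowPath))  # 10100
print(count_resolutions(FastPath))  # 200
```

Equal hashes do not strictly imply equal paths. Caching the resolved path string and comparing that instead would rule out hash collisions at the same cost.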

rndusr commented 4 years ago

This breaks test_Filepath_is_equal_to_relative_path.

This seems to work, but it's inefficient again:

hash(self) == hash(type(self)(other))

Maybe adding an instance cache to the Filepath class would work?
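One possible reading of the instance-cache idea is sketched below. It assumes the constructor returns one shared instance per path string; `InstanceCachedPath` is a hypothetical name and not part of torf. Because `type(self)(other)` then returns an object whose hash is already computed, `hash(self) == hash(type(self)(other))` resolves each distinct path only once.

```python
import os


class InstanceCachedPath:
    """Hypothetical path type that reuses one instance per path string."""

    _instances = {}    # path string -> instance (never evicted)
    resolve_calls = 0  # counts os.path.realpath calls

    def __new__(cls, path):
        key = os.fspath(path)  # raises TypeError for non-path-like values
        inst = cls._instances.get(key)
        if inst is None:
            inst = super().__new__(cls)
            inst.path = key
            inst._hash = None
            cls._instances[key] = inst
        return inst

    def __fspath__(self):
        return self.path

    def __hash__(self):
        if self._hash is None:  # resolve only on first use
            type(self).resolve_calls += 1
            self._hash = hash(os.path.realpath(self.path))
        return self._hash

    def __eq__(self, other):
        try:
            # Converting `other` hits the instance cache, so its hash is reused
            return hash(self) == hash(type(self)(other))
        except TypeError:  # other is not path-like
            return NotImplemented


a = InstanceCachedPath(os.path.abspath("dir/file"))
print(a == "dir/file")                                     # True
print(InstanceCachedPath("x") is InstanceCachedPath("x"))  # True
```

The trade-off is that the plain dict keeps every instance alive for the life of the process. A bounded or weak-reference cache would limit memory use, at the cost of more cache misses.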

mon commented 4 years ago

Ah shoot, I totally forgot the tests. I'll try to implement an instance cache later this evening and make sure the tests pass!

rndusr commented 4 years ago

The instance cache was just an idea. Feel free to think of a better solution.

mon commented 4 years ago

Now that I'm running the tests, I see the problem: `Filepath` objects are being compared to `PosixPath` objects.

Would you be opposed to "upgrading" any `pathlib.Path`-style objects by replacing their `__class__` attribute with `Filepath`, so they gain hash caching without any changes on the caller's side? It feels a bit hacky, but it would work quite well.

mon commented 4 years ago

Actually no, considering the existing tests this is a horrible idea. I'll think of something better.

mon commented 4 years ago

Alright! I think this fits much better. Since torrent creation always compares `Filepath`s to `Filepath`s, a simple `isinstance` check lets us take the fast path there without breaking the equality semantics the tests expect.
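The approach above could look roughly like the following sketch. `HybridPath` is a hypothetical stand-in, and the actual patch in torf may differ. Comparisons between two instances of the class use only the cached hashes. Comparisons with anything else, such as a `pathlib` path or a relative `str` like the ones in the tests, fall back to resolving both sides, so the existing equality semantics are kept.

```python
import os
import pathlib


class HybridPath:
    """Hypothetical path type: cheap equality between instances of the
    same type, full resolution when compared with anything else."""

    resolve_calls = 0  # counts os.path.realpath calls made by this class

    def __init__(self, path):
        self.path = os.fspath(path)
        self._hash = None

    def __fspath__(self):
        return self.path

    @classmethod
    def _resolve(cls, path):
        resolved = os.path.realpath(os.fspath(path))  # may lstat components
        cls.resolve_calls += 1
        return resolved

    def __hash__(self):
        if self._hash is None:  # resolve once, then reuse
            self._hash = hash(self._resolve(self.path))
        return self._hash

    def __eq__(self, other):
        if isinstance(other, type(self)):
            # Fast path (torrent creation): cached hashes only
            return hash(self) == hash(other)
        try:
            # Slow path for str / pathlib objects: resolve both sides
            return self._resolve(self.path) == self._resolve(other)
        except TypeError:  # other is not path-like
            return NotImplemented


a = HybridPath("dir/file")
b = HybridPath(os.path.abspath("dir/file"))
print(a == b)                             # True
print(a == pathlib.PurePath("dir/file"))  # True
```

The design keeps the expensive comparison available for mixed types while the common case, comparing instances of the same class, does no repeated filesystem work.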

rndusr commented 4 years ago

Looks great. Very simple. Thanks!

I made a bugfix release.