Closed ghost closed 4 years ago
Thanks for the analysis. Indeed, the current `files.json` logic was the simplest thing that would work, and it's not surprising that it runs into problems with large libraries.
If you want to look into changing the logic, there are a few possible routes:
`files.json` gets persisted only if the flag is set.

The first two options must include conversion logic that makes upgrading MuWire seamless.
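For the flag-based option, a minimal sketch of what "persisted only if the flag is set" could look like. `DirtyFlagPersister`, `markDirty` and `persistIfDirty` are hypothetical names for illustration, not existing MuWire code, and the `Runnable` stands in for whatever currently writes `files.json`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of MuWire: skips the periodic write
// when nothing has changed since the last successful persist.
final class DirtyFlagPersister {
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final Runnable writer; // stand-in for the code that writes files.json

    DirtyFlagPersister(Runnable writer) {
        this.writer = writer;
    }

    /** Call whenever a shared file is added, removed or modified. */
    void markDirty() {
        dirty.set(true);
    }

    /** Call from the periodic timer; returns true if a write happened. */
    boolean persistIfDirty() {
        // Clear the flag before writing so that a change arriving during
        // the write sets it again and triggers another write next time.
        if (dirty.compareAndSet(true, false)) {
            writer.run();
            return true;
        }
        return false;
    }
}
```

This only avoids redundant writes. A single change in a large library would still rewrite the whole file.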
Thanks for the consideration.
I had a quick look at the embedded DBs based on benchmarks from 2012:
They all would at least double the size of the plugin :slightly_frowning_face:
I'll have to think about the folder structure more in depth too.
In principle the conversion logic should be as easy as reading the old DB and passing the hashlists to the new DB writer. But there might be surprises in the code.
Btw, I didn't see an issue on https://github.com/zlatinb/muwire/blob/master/doc/infohash-upgrade.md . Has that already been implemented? In the code I still see that the hashlist is expected to have 32 hashes https://github.com/zlatinb/muwire/blob/27831b488b3726c1921a38d20e3239f9b405fb5d/core/src/main/java/com/muwire/core/SharedFile.java#L44
> They all would at least double the size of the plugin
The plugin is compressed with pack200 followed by zip. The uncompressed plugin is around 5-6 MB otherwise.
Re: infohash upgrade - yes, implemented quite some time ago. What you're seeing in the code is splitting the large single byte array hashlist into smaller byte arrays, 32 bytes each. (In retrospect that turned out to be completely unnecessary.)
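To illustrate the splitting described above, here is a rough sketch of cutting one contiguous hashlist into 32-byte arrays. `HashListSplitter` is a hypothetical name and this is not the actual `SharedFile` code. The length check is an added assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the actual SharedFile implementation.
final class HashListSplitter {
    static final int HASH_SIZE = 32; // bytes per hash, as stated in the comment above

    /** Splits one contiguous hashlist into HASH_SIZE-byte arrays. */
    static List<byte[]> split(byte[] hashList) {
        // Assumption: a valid hashlist is a whole number of hashes.
        if (hashList.length % HASH_SIZE != 0) {
            throw new IllegalArgumentException(
                "hashlist length must be a multiple of " + HASH_SIZE);
        }
        List<byte[]> hashes = new ArrayList<>(hashList.length / HASH_SIZE);
        for (int offset = 0; offset < hashList.length; offset += HASH_SIZE) {
            hashes.add(Arrays.copyOfRange(hashList, offset, offset + HASH_SIZE));
        }
        return hashes;
    }
}
```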
Closing the issue; will re-open if/when necessary.
As mentioned in #35, `files.json` can grow to very large sizes when sharing large folders (hundreds of GB or more) and subsequently takes a very long time to write.

After adding logs to see just how long it takes (https://github.com/LoveIsGrief/muwire/commit/9d4b365e63c74a6863d1c8d562c9e1487bda2c04), `files.json` had grown to 360MB and took ~15s to write.

It might be better to either:
- `files/<filepath hash 0-5>/<filepath hash 6-31>.json`
- `files/<filepath hash>.json`
- `files/<share root>/<path to file>/<filename>.json`

or find a solution that doesn't involve writing all existing hashlists every minute.
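As a rough sketch of the first layout, assuming the "filepath hash" is a 32-character hex digest of the shared file's path: MD5 is used here only because it yields exactly 32 hex characters, and the actual hash choice is open. `ShardedHashListPath` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of files/<hash 0-5>/<hash 6-31>.json.
final class ShardedHashListPath {

    /** Hex digest of the file path; MD5 is an assumption (32 hex chars). */
    static String pathHash(String filePath) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(filePath.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /** Maps a shared file's path to its per-file hashlist location. */
    static Path resolve(Path filesDir, String filePath) {
        String hash = pathHash(filePath);
        // Characters 0-5 name the directory, characters 6-31 name the file.
        return filesDir.resolve(hash.substring(0, 6))
                       .resolve(hash.substring(6) + ".json");
    }
}
```

With one file per shared file, only the hashlists that actually changed need to be rewritten, and the 6-character prefix keeps any single directory from holding every entry.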
If you like, we can agree on a solution and I can work on it. I have about 2 weeks of holidays left.