zlatinb / muwire

MuWire file sharing client for I2P
GNU General Public License v3.0

files.json takes very long to write #36

Closed ghost closed 4 years ago

ghost commented 4 years ago

As mentioned in #35, files.json can grow to very large sizes when sharing large folders (hundreds of GB or more) and subsequently takes a very long time to write.

After adding logs to see just how long it takes (https://github.com/LoveIsGrief/muwire/commit/9d4b365e63c74a6863d1c8d562c9e1487bda2c04), files.json had grown to 360MB and took ~15s to write:

2020-01-19 21:33:47.309 SEVERE org.codehaus.groovy.vmplugin.v7.IndyInterface selectMethod ===Time(ms) to write tmp files.json: 34243
2020-01-19 21:33:47.697 SEVERE org.codehaus.groovy.vmplugin.v7.IndyInterface selectMethod ===Time(ms) to copy tmp files.json: 387
2020-01-19 21:34:25.999 SEVERE com.muwire.core.files.PersisterService$_persistFiles_closure8 doCall ===Time(ms) to write tmp files.json: 12953
2020-01-19 21:34:26.314 SEVERE com.muwire.core.files.PersisterService$_persistFiles_closure8 doCall ===Time(ms) to copy tmp files.json: 315
2020-01-19 21:35:28.794 SEVERE com.muwire.core.files.PersisterService$_persistFiles_closure8 doCall ===Time(ms) to write tmp files.json: 15747
2020-01-19 21:35:29.134 SEVERE com.muwire.core.files.PersisterService$_persistFiles_closure8 doCall ===Time(ms) to copy tmp files.json: 339
2020-01-19 21:36:28.874 SEVERE com.muwire.core.files.PersisterService$_persistFiles_closure8 doCall ===Time(ms) to write tmp files.json: 15827
2020-01-19 21:36:29.209 SEVERE com.muwire.core.files.PersisterService$_persistFiles_closure8 doCall ===Time(ms) to copy tmp files.json: 335
2020-01-19 21:37:28.009 SEVERE com.muwire.core.files.PersisterService$_persistFiles_closure8 doCall ===Time(ms) to write tmp files.json: 14784
2020-01-19 21:37:28.347 SEVERE com.muwire.core.files.PersisterService$_persistFiles_closure8 doCall ===Time(ms) to copy tmp files.json: 338
2020-01-19 21:38:28.482 SEVERE com.muwire.core.files.PersisterService$_persistFiles_closure8 doCall ===Time(ms) to write tmp files.json: 15258
2020-01-19 21:38:28.814 SEVERE com.muwire.core.files.PersisterService$_persistFiles_closure8 doCall ===Time(ms) to copy tmp files.json: 332
2020-01-19 21:39:27.974 SEVERE com.muwire.core.files.PersisterService$_persistFiles_closure8 doCall ===Time(ms) to write tmp files.json: 14749
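
For readers who want to take similar measurements, here is a minimal sketch of timing the two phases that appear in the log (writing a temporary file, then copying it over the target). It is not the code from the linked commit. The class name `TimedPersist`, the `.tmp` suffix, and the one-record-per-line layout are assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper, not part of MuWire.
final class TimedPersist {

    /** Writes all records to a temp file, copies it over the target, and returns {writeMillis, copyMillis}. */
    static long[] persist(List<String> records, Path target) throws IOException {
        // Assumption: the temp file lives next to the target with a ".tmp" suffix.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");

        long start = System.nanoTime();
        try (BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            for (String record : records) {
                writer.write(record);
                writer.newLine();
            }
        }
        long afterWrite = System.nanoTime();

        Files.copy(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        long afterCopy = System.nanoTime();

        return new long[] {
            (afterWrite - start) / 1_000_000,
            (afterCopy - afterWrite) / 1_000_000
        };
    }
}
```

Because every call rewrites every record, the write time grows with the total size of the library rather than with the number of changes since the last pass, which matches the pattern in the log above.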

It might be better to either:

or find a solution that doesn't involve writing all existing hashlists every minute.
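
One possible shape for "not writing all existing hashlists every minute" is to keep one small file per shared file and write it only once, so each pass touches only new entries. The sketch below is an illustration of that idea, not an agreed design. `IncrementalStore`, the `.json` suffix, and the use of a filesystem-safe string key (for example a hex-encoded hash) are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical store, not part of MuWire.
final class IncrementalStore {
    private final Path dir;

    IncrementalStore(Path dir) throws IOException {
        // Creates the directory (and parents) if it does not exist yet.
        this.dir = Files.createDirectories(dir);
    }

    /**
     * Writes the record only if no file exists for this key yet.
     * Assumption: key is safe to use as a file name.
     * Returns true if a file was written, false if it was already present.
     */
    boolean persist(String key, String record) throws IOException {
        Path file = dir.resolve(key + ".json");
        if (Files.exists(file)) {
            return false; // already persisted, nothing to rewrite
        }
        Files.write(file, record.getBytes(StandardCharsets.UTF_8));
        return true;
    }
}
```

With this layout the cost of a periodic pass depends on how many entries are new, not on the total library size. The trade-off is many small files on disk instead of one large one.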

If you like, we can agree on a solution and I can work on it. I have about 2 weeks of holidays left.

zlatinb commented 4 years ago

Thanks for the analysis. Indeed the current files.json logic was the simplest thing that would work and it's not surprising that it runs into problems with large libraries.

If you want to look into changing the logic, there are a few possible routes:

The first two options must include conversion logic that makes upgrading MuWire seamless.

ghost commented 4 years ago

Thanks for the consideration.

I had a quick look at the embedded DBs based on benchmarks from 2012:

They all would at least double the size of the plugin :slightly_frowning_face:

I'll have to think about the folder structure more in depth too.

In principle the conversion logic should be as easy as reading the old DB and passing the hashlists to the new DB writer. But there might be surprises in the code.
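
A rough sketch of that read-old, write-new conversion might look like the following. It assumes the old file holds one record per line and treats each line as an opaque string. `FilesJsonMigration` and `RecordWriter` are hypothetical names standing in for whatever the new storage backend would expose.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical migration helper, not part of MuWire.
final class FilesJsonMigration {

    /** Stand-in for the new DB writer. */
    interface RecordWriter {
        void write(String record) throws IOException;
    }

    /**
     * Streams the old file line by line and hands each non-blank line to the writer.
     * Assumption: one record per line. Returns the number of records migrated.
     */
    static int migrate(Path oldFile, RecordWriter writer) throws IOException {
        if (!Files.exists(oldFile)) {
            return 0; // fresh install, nothing to convert
        }
        int count = 0;
        try (BufferedReader reader = Files.newBufferedReader(oldFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                writer.write(line);
                count++;
            }
        }
        return count;
    }
}
```

Reading line by line keeps memory use flat even for a file of several hundred MB. Any surprises would most likely be in how the real records need to be parsed and validated, which this sketch deliberately leaves out.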

Btw, I didn't see an issue on https://github.com/zlatinb/muwire/blob/master/doc/infohash-upgrade.md . Has that already been implemented? In the code I still see that the hashlist is expected to have 32 hashes https://github.com/zlatinb/muwire/blob/27831b488b3726c1921a38d20e3239f9b405fb5d/core/src/main/java/com/muwire/core/SharedFile.java#L44

zlatinb commented 4 years ago

> They all would at least double the size of the plugin

The plugin is compressed with pack200 followed by zip. Uncompressed, it is around 5-6 MB.

Re: infohash upgrade - yes, implemented quite some time ago. What you're seeing in the code is splitting the large single byte array hashlist into smaller byte arrays, 32 bytes each. (In retrospect that turned out to be completely unnecessary.)
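
To illustrate the splitting described here, a minimal sketch follows. It is not the code in `SharedFile.java`. `HashListSplitter` is a hypothetical name, and rejecting inputs whose length is not a multiple of 32 is an assumption about how malformed input should be handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of MuWire.
final class HashListSplitter {
    static final int HASH_SIZE = 32; // bytes per hash, as stated in the comment above

    /** Splits one concatenated hashlist into consecutive 32-byte arrays. */
    static List<byte[]> split(byte[] hashList) {
        if (hashList.length % HASH_SIZE != 0) {
            throw new IllegalArgumentException(
                "hashlist length must be a multiple of " + HASH_SIZE);
        }
        List<byte[]> hashes = new ArrayList<>(hashList.length / HASH_SIZE);
        for (int offset = 0; offset < hashList.length; offset += HASH_SIZE) {
            // copyOfRange copies bytes [offset, offset + HASH_SIZE)
            hashes.add(Arrays.copyOfRange(hashList, offset, offset + HASH_SIZE));
        }
        return hashes;
    }
}
```

Each chunk is a copy, so the split representation at least doubles the memory held while both forms are alive. That may be part of why keeping the single array would have been enough.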

zlatinb commented 4 years ago

Closing issue, will re-open if/when necessary