i2p / i2p.i2p-bote

I2P-Bote is a serverless, encrypted e-mail application.
https://i2pbote.xyz

File System Abstraction (Trac #1316) #21

Open str4d opened 7 years ago

str4d commented 7 years ago

I noticed that the I2P router does a lot of disk I/O during startup, which seems to slow things down. I believe this is because there are a lot of small files in the folder that contains my I2P application data.

As an example: My I2P-Bote plugin alone maintains 2,898 files(!), averaging only about 1,716 bytes each, which amounts to less than 5 MB in total.

Obviously, accessing lots of small files hurts performance. Wouldn't it be nice if the I2P router offered its core modules and its plugins their own file system abstraction, where the module/plugin "sees" what appears to be a file system, while there is in fact just one large file ("container") on the disk that is dynamically resized as required? Each module/plugin would have its own container. This would allow the operating system to cache files much better and increase overall performance, especially on router startup.

This approach could also work around potential problems caused by platform differences between operating systems that handle paths case-sensitively and those that handle them case-insensitively. It could also "sandbox" file system access to prevent accidental file I/O outside a module's folder on the actual file system. Another benefit would be that certain file accesses could be handled entirely in user mode and would thus be faster than making expensive syscalls all the time. You could even allow a module to transparently encrypt its container with a password.

I hope you get the idea.
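
To make the "one container file per plugin" idea above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Container` class, its record layout, and the file names are hypothetical and are not an existing I2P or I2P-Bote API. It only shows the core mechanism (many named blobs stored in one file, found again through an in-memory index) and leaves out compaction, locking, encryption, and any recovery from a corrupted container.

```python
import os
import struct


class Container:
    """Toy single-file container: named blobs appended to one file.

    Hypothetical layout (an assumption, not an I2P format). Each record is
    [name_len: u16][data_len: u32][name bytes][data bytes], big-endian.
    Writing an existing name appends a new record; the latest record wins.
    """

    _HEADER = struct.Struct(">HI")

    def __init__(self, path):
        self.path = path
        self.index = {}  # name -> (offset of data, length of data)
        open(path, "ab").close()  # create the container if it is missing
        self._scan()

    def _scan(self):
        """Rebuild the in-memory index with one sequential pass over the file."""
        with open(self.path, "rb") as f:
            while True:
                header = f.read(self._HEADER.size)
                if len(header) < self._HEADER.size:
                    break  # end of file
                name_len, data_len = self._HEADER.unpack(header)
                name = f.read(name_len).decode("utf-8")
                self.index[name] = (f.tell(), data_len)
                f.seek(data_len, os.SEEK_CUR)  # skip the payload

    def write(self, name, data):
        """Append a record for name and point the index at it."""
        encoded = name.encode("utf-8")
        with open(self.path, "r+b") as f:
            f.seek(0, os.SEEK_END)
            f.write(self._HEADER.pack(len(encoded), len(data)))
            f.write(encoded)
            offset = f.tell()
            f.write(data)
        self.index[name] = (offset, len(data))

    def read(self, name):
        """Return the most recently written bytes for name."""
        offset, length = self.index[name]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)
```

With this layout, startup needs one sequential read of a single file instead of opening thousands of small ones. The trade-offs are that overwritten records waste space until the file is rewritten, and that damage to the one file can affect every entry stored in it.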

Migrated from https://trac.i2p2.de/ticket/1316

{
    "status": "assigned", 
    "changetime": "2016-08-05T15:06:29", 
    "description": "I noticed that the I2P router does a lot of disk io during startup, which seems to slow things down. I believe it's due to the fact that there are a lot of small files in the folder that contains my I2P application data.\n\nAs an example: My I2PBote plugin alone maintains 2,898 files(!), each only about 1,716 bytes large on average, amounting to less than 5 MB in total.\n\nObviously accessing lots of small files hurts performance. Wouldn't it be nice if the I2P router offered its core modules as well as its plugins some kind of own file system abstraction, where the module/plugin \"sees\" what appears to be a file system, while there is in fact just one large file (\"container\") on the disk, and where that one file is being dynamically resized as required? Each module/plugin would have its own container. This would allow the operating system to cache files much better and increase overall performance, especially on router startup.\n\nThis approach could also work around potential problems that are related to platform differences between operating systems that handle paths case-sensitive and those that handle them case-insensitive. Plus it could \"sandbox\" file system access to prevent accidental file io outside a module's folder on the actual file system. Another benefit would be that certain file access could be handled entirely in user-mode and would thus be faster than having to do expensive syscalls all the time. You could even allow a module to transparently encrypt its container with a password.\n\nI hope you get the idea.", 
    "reporter": "ExtraBattery", 
    "cc": "HungryHobo, str4d", 
    "resolution": "", 
    "_ts": "1470409589334842", 
    "component": "apps/plugins", 
    "summary": "I2P-Bote: File System Abstraction", 
    "priority": "maintenance", 
    "keywords": "I2P-Bote performance", 
    "version": "0.9.13", 
    "parents": "", 
    "time": "2014-06-21T08:38:14", 
    "milestone": "", 
    "owner": "str4d", 
    "type": "task"
}
str4d commented 7 years ago

Trac update at 20140621T12:13:18:

The majority are RouterInfo files (one per peer) and second are peer profiles (one per peer).

  • We could combine the two (tricky because the two subsystems that use the two files are independent).

  • We could zip them all together (OK for profiles, which are only read at startup and written at shutdown, but routerinfos are written periodically, so it wouldn't work for those). Zipping also means that corruption of the archive loses all the files.

  • The profiles are gzipped now. That's why they are small.

  • Startup and shutdown are very slow on the Raspberry Pi. This is a possible cause. However, as you say, a large number of files only "seems" to slow things down. We don't know.

  • All the above is for the router. I don't know how many files Bote uses or how it does so. One per email is a good guess though.

  • It's very tough to sandbox plugins or abstract/trap/prevent direct file system access. We explicitly reject a sandbox security model for plugins; it's way, way too hard.

The way forward is profiling, logging, and measurement to identify the true bottlenecks, and then proposing and experimenting with improvements. You may wish to start a discussion with the Bote developers. Sometimes a simple fix like a BufferedInputStream can work wonders. But we have to identify and measure the root cause first.
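
As a starting point for that kind of measurement, the following Python sketch counts the files under a directory, sums their sizes, and times one full read of all of them. The function names `survey` and `time_full_read` are hypothetical helpers written for this example, not part of I2P. The timing depends heavily on whether the operating system has already cached the files, so a meaningful comparison needs one run on a cold cache (for example right after a reboot) and one on a warm cache.

```python
import os
import time


def survey(root):
    """Return (file_count, total_bytes) for all regular files under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                count += 1
                total += os.path.getsize(path)
    return count, total


def time_full_read(root):
    """Read every file under root once and return the elapsed seconds."""
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                f.read()
    return time.perf_counter() - start


# Example use (the path is a placeholder for your own data directory):
# count, total = survey("/path/to/i2p/data")
# print(count, "files,", total, "bytes, average", total / max(count, 1))
# print("full read took", time_full_read("/path/to/i2p/data"), "seconds")
```

If the cold read of a few megabytes takes far longer than the warm read, that points toward per-file overhead rather than data volume. It does not by itself show which component opens the files during startup.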

str4d commented 7 years ago

Trac update at 20140708T09:15:43: ExtraBattery commented:

  • I'm running on x86-64 PCs. The router shutdown is not slow at all, just the start. I notice that the CPU isn't under much load while the router is starting, but the hard drive is very busy, so I assumed file I/O is the bottleneck in my case. The slowness goes away after the OS has cached the necessary files: if I shut down and start again, the second start is much faster.

  • Usually I would attribute this to lots of individual files being accessed, since I don't think the total amount of file content being read is large.

  • It could also be that the slow start is not due to the router, but due to the initialization of Java. It's hard to tell.

  • The majority of files in my I2P application data folder belong to I2P-Bote (currently over 3,200 for I2P-Bote alone). The majority of files that belong to the router itself are in the folders "netDb" and "peerProfiles" (together about 1,500 files). The rest amounts to only about a hundred files.

  • I don't know what I2P-Bote uses all the files for. I have maybe a hundred mails, still thousands of files. I don't know if I2P-Bote really accesses them frequently.

  • I didn't mean a sandbox that protects from malice, but merely from accidents.
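
A guard against accidental access outside a module's folder, as opposed to a security sandbox, can be fairly small. The sketch below is an illustration in Python under that assumption. The helper name `resolve_inside` is hypothetical, and the check offers no protection against deliberately malicious code, which could simply bypass it.

```python
import os


def resolve_inside(base_dir, relative_path):
    """Return the absolute path for relative_path, or raise if it leaves base_dir.

    Symlinks and ".." segments are resolved first, so a path that wanders
    out of base_dir by accident is caught before any file is opened.
    """
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, relative_path))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"{relative_path!r} resolves outside {base_dir!r}")
    return target
```

A module would route every file name through such a helper before opening it, so a bug that builds a path like `../other-plugin/data` fails loudly instead of touching another module's files.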

str4d commented 7 years ago

Trac update at 20140928T15:23:23:

My comments above were regarding the files the router uses.

Since comment 2 above and the original post both identify I2P-Bote as the owner of the majority of the files, assigning to HungryHobo.

str4d commented 7 years ago

Trac update at 20150110T04:16:32: str4d changed keywords from "" to "I2P-Bote performance"

str4d commented 7 years ago

Trac update at 20160805T15:06:29: