mekberg / boar

Automatically exported from code.google.com/p/boar

Add additional dir level to blobs/ #39

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
As it stands, the blobs/ dir is subdivided into 256 folders. For use cases 
involving very large datasets (i.e. 1M+ files), having directories with 3000+ 
files in them gets unwieldy and can affect performance. What are your thoughts 
on allowing an upgrade path to a layout like /blobs/12/34/1234567890abcdef?

This would allow for virtually any size of dataset (two levels of subdirectory 
nesting is what you often see in the URLs of file and image hosts that store 
files by hash). If you want to keep backward compatibility, you could specify a 
"repo version" property either in the main repo dir or in the session file.
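
To make the proposal concrete, here is a minimal Python sketch of how a blob path could be derived from a "repo version". It is not boar's actual code: the `LEVELS_BY_REPO_VERSION` mapping, the version numbers and the `blob_path` function are all hypothetical, and I'm assuming the current layout uses the first two hex characters of the blob id as the folder name.

```python
import os

# Hypothetical mapping from a "repo version" property to the number of
# two-character directory levels under blobs/. Not boar's real format.
LEVELS_BY_REPO_VERSION = {1: 1, 2: 2}


def blob_path(blobs_dir, blob_id, repo_version=1):
    """Return the on-disk path of a blob for the given (assumed) repo version."""
    levels = LEVELS_BY_REPO_VERSION[repo_version]
    if len(blob_id) <= 2 * levels:
        raise ValueError("blob id is too short for this nesting depth")
    # Version 1: blobs/12/1234...   Version 2: blobs/12/34/1234...
    prefixes = [blob_id[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(blobs_dir, *prefixes, blob_id)
```

With 1M blobs, one level (256 folders) gives roughly 3,900 files per folder, while two levels (65,536 folders) gives roughly 15.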

Anyway, really liking boar. I need to brush up on Python a bit, but I'd like to 
submit patches sooner rather than later, and not just endless issues/requests :P

Original issue reported on code.google.com by cryptob...@gmail.com on 21 Dec 2011 at 5:57

GoogleCodeExporter commented 9 years ago
I agree that a repository can quickly become large enough to make it 
inconvenient to browse around the blobs manually. But then again, hundreds of 
thousands of files named something like "650ab14dd0caba8f71d2db9b4a3abb90" 
aren't very user friendly to begin with. I'm more concerned with the performance 
part. Do you have any numbers or examples backing up the performance problem 
claim? Boar performs these operations often:

* Checking the existence of a blob
* Opening a blob for reading
* Listing all the blobs
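
One rough way to collect such numbers would be a script along these lines. It is only a sketch and not part of boar: `blob_path` and `benchmark` are hypothetical names, MD5 hex digests stand in for blob ids, and the files are empty and created in a temporary directory, so results depend heavily on the filesystem and on caching.

```python
import hashlib
import os
import tempfile
import time


def blob_path(blobs_dir, blob_id, levels):
    """Path of a blob under `levels` two-character directory levels (assumed layout)."""
    prefixes = [blob_id[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(blobs_dir, *prefixes, blob_id)


def benchmark(n_blobs, levels):
    """Create n_blobs empty files in a temp dir and time the three operations."""
    # MD5 hex digests stand in for blob ids (32 hex characters each).
    ids = [hashlib.md5(str(i).encode()).hexdigest() for i in range(n_blobs)]
    timings = {}
    with tempfile.TemporaryDirectory() as blobs_dir:
        for blob_id in ids:
            path = blob_path(blobs_dir, blob_id, levels)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "wb").close()

        # 1. Checking the existence of a blob
        start = time.perf_counter()
        existing = sum(os.path.exists(blob_path(blobs_dir, b, levels)) for b in ids)
        timings["exists"] = time.perf_counter() - start

        # 2. Opening a blob for reading
        start = time.perf_counter()
        for b in ids:
            with open(blob_path(blobs_dir, b, levels), "rb") as f:
                f.read(1)
        timings["open"] = time.perf_counter() - start

        # 3. Listing all the blobs
        start = time.perf_counter()
        listed = [name for _, _, files in os.walk(blobs_dir) for name in files]
        timings["list"] = time.perf_counter() - start

    return existing, len(listed), timings
```

Comparing `benchmark(n, 1)` against `benchmark(n, 2)` for a large `n` on the filesystem in question would show whether the extra level helps or hurts each operation.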

However, another problem is that FAT32 only supports about 20000 file entries 
per directory (for 16-character filenames), which, with 256 directories, allows 
a maximum of about 5 million files in a repository on FAT32. Not good. The 
easiest way out is of course to say that boar doesn't support FAT32...
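
The estimate above can be reproduced with a small calculation. This is a sketch under assumptions that are not stated in this thread: that a FAT32 directory holds at most 65,536 32-byte entries, and that each file with a long name uses one short-name entry plus one long-name entry per 13 characters. The "." and ".." entries are ignored, and the function names are made up for illustration.

```python
import math

# Assumed FAT32 limits (not taken from this thread).
MAX_DIR_ENTRIES = 65536      # 32-byte entries per directory
CHARS_PER_LFN_ENTRY = 13     # characters stored per long-filename entry


def max_files_per_dir(name_length):
    """Approximate number of files with names of this length in one directory."""
    # One short (8.3) entry plus enough long-name entries to hold the name.
    entries_per_file = 1 + math.ceil(name_length / CHARS_PER_LFN_ENTRY)
    return MAX_DIR_ENTRIES // entries_per_file


def max_files_in_repo(name_length, n_dirs=256):
    """Approximate repository-wide limit with n_dirs blob directories."""
    return n_dirs * max_files_per_dir(name_length)
```

Under these assumptions, 16-character names give 21,845 files per directory and about 5.6 million in total, while 32-character names like the example above give 16,384 per directory and about 4.2 million in total.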

Original comment by ekb...@gmail.com on 29 Dec 2011 at 2:06

GoogleCodeExporter commented 9 years ago
Closing this one for the time being, until someone can show in numbers that 
this is a real problem in some situations.

Original comment by ekb...@gmail.com on 29 Feb 2012 at 12:51