mar-file-system / marfs

MarFS provides a scalable near-POSIX file system by using one or more POSIX file systems as a scalable metadata component and one or more data stores (object, file, etc) as a scalable data component.

Packed Object Threshold Value Needed #102

Closed atorrez closed 8 years ago

atorrez commented 8 years ago

The configuration file currently contains chunk_size and pack_size. The packer will use these parameters to determine the maximum size of a packed file (chunk_size) and the maximum size of an object to pack (pack_size). However, I believe a value should also exist to determine the minimum size for a packed file. This covers the case where we cannot reach the maximum but objects exist that meet the criteria for packing. As an example, if the chunk size is 2GB and the pack size is 1MB, let's say we have 20 1MB objects. Would 20MB be worth packing?
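To put a number on that example, here is a minimal Python sketch of how full the packed file would be. The function name `fill_fraction` is made up for illustration, and binary units (1MB = 2^20 bytes, 1GB = 2^30 bytes) are an assumption; none of this is packer code.

```python
MB = 2**20  # assumption: binary megabytes
GB = 2**30  # assumption: binary gigabytes


def fill_fraction(object_count, object_size, chunk_size):
    """Fraction of a chunk_size-sized packed file that the objects would fill."""
    return (object_count * object_size) / chunk_size


# 20 objects of 1MB each against a 2GB chunk size
fraction = fill_fraction(20, 1 * MB, 2 * GB)
print(f"{fraction:.2%}")  # just under 1% of the maximum packed size
```

A minimum threshold would let the packer skip runs like this one, where the result fills under 1% of the chunk.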

brettkettering commented 8 years ago

We decided that we need min and max # objects to pack, and a parameter to say don't pack any object bigger than "X" bytes or smaller than "Y" bytes. Time for implementation.

jti-lanl commented 8 years ago

The consensus seems to be that we need the following new per-repo config options:

min_pack_file_size [don't bother packing individual files smaller than this]
max_pack_file_size [don't pack individual files larger than this]
min_pack_file_count [final packed object should have at least this many files]
max_pack_file_count [don't pack more than this many files into an object]

I think the point of constraining file-count is to avoid e.g. packing just two small files that remained at the end of a packing run, and to avoid inviting fragmentation by packing 1000 files into one object.

We could allow all these options to have defaults. I'd guess min_pack_file_size would default to 1 (after we stop creating objects for 0-length files).

We could also allow values of -1 to imply "unconstrained", or some sensible computed defaults, e.g. based on repo.chunksize.
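As a rough illustration of how the four proposed options might interact, here is a minimal Python sketch. It is not the packer's implementation: `PackLimits`, `within`, and `plan_packs` are hypothetical names, the greedy grouping is an assumption, and treating a max_pack_file_count of 0 as "pack nothing" is just one way to handle that value.

```python
from dataclasses import dataclass

UNLIMITED = -1  # assumed sentinel for "unconstrained"


@dataclass
class PackLimits:
    """Hypothetical container for the four proposed per-repo options."""
    min_pack_file_size: int = 1
    max_pack_file_size: int = UNLIMITED
    min_pack_file_count: int = UNLIMITED
    max_pack_file_count: int = UNLIMITED


def within(value, low, high):
    """True if low <= value <= high, where UNLIMITED means no bound."""
    if low != UNLIMITED and value < low:
        return False
    if high != UNLIMITED and value > high:
        return False
    return True


def plan_packs(file_sizes, limits, chunk_size):
    """Greedily group file sizes into packs; return (packs, unpacked)."""
    if limits.max_pack_file_count == 0:
        # A count limit of 0 leaves nothing to pack.
        return [], list(file_sizes)

    # Per-file size limits decide which files are candidates at all.
    eligible, unpacked = [], []
    for size in file_sizes:
        ok = within(size, limits.min_pack_file_size, limits.max_pack_file_size)
        (eligible if ok else unpacked).append(size)

    # Close the current pack when it hits the file-count or byte limit.
    packs, current = [], []
    for size in eligible:
        count_full = (limits.max_pack_file_count != UNLIMITED
                      and len(current) >= limits.max_pack_file_count)
        bytes_full = sum(current) + size > chunk_size
        if current and (count_full or bytes_full):
            packs.append(current)
            current = []
        current.append(size)
    if current:
        packs.append(current)

    # Packs with too few files (e.g. stragglers at the end) are not packed.
    kept = []
    for pack in packs:
        if within(len(pack), limits.min_pack_file_count, UNLIMITED):
            kept.append(pack)
        else:
            unpacked.extend(pack)
    return kept, unpacked


limits = PackLimits(min_pack_file_size=1, max_pack_file_size=1024,
                    min_pack_file_count=3, max_pack_file_count=3)
packs, unpacked = plan_packs([100, 200, 300, 400, 5000, 0, 500], limits, 10_000)
print(packs)     # [[100, 200, 300]]
print(unpacked)  # [5000, 0, 400, 500]
```

In this run the 5000-byte and 0-byte files fail the per-file size limits, and the two stragglers (400 and 500) are left unpacked because they fall short of min_pack_file_count.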

jti-lanl commented 8 years ago

Added the new configuration parameters.

Removed old Repo.pack_size. (Old configs with this parm will get an error.)

For the new parms, value -1 means "unlimited".

Disable packing by setting max_pack_file_count = 0. If you do this, you don't need to provide the other packing-related parms (they will default to -1, anyhow). It's an error to provide none of the packing parms.

There are examples in marfs_cctest.cfg
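The rules described above can be sketched as a small validation step. This is a hedged Python illustration only: the real configuration is not a Python dict, `resolve_packing_parms` is a made-up name, and this is not the contents of marfs_cctest.cfg. It reads "an error not to provide any" as "an error if none of the four parms is present".

```python
PACKING_PARMS = (
    "min_pack_file_size",
    "max_pack_file_size",
    "min_pack_file_count",
    "max_pack_file_count",
)
UNLIMITED = -1  # per the comment above, -1 means "unlimited"


def resolve_packing_parms(repo_cfg):
    """Validate a repo's packing parms and fill in defaults (hypothetical helper)."""
    if "pack_size" in repo_cfg:
        # The old parm was removed, so old configs are rejected.
        raise ValueError("pack_size is no longer supported")
    if not any(name in repo_cfg for name in PACKING_PARMS):
        raise ValueError("at least one packing parm must be provided")
    # Any parm that was not given defaults to -1 (unlimited).
    return {name: repo_cfg.get(name, UNLIMITED) for name in PACKING_PARMS}


def packing_enabled(parms):
    """Packing is disabled when max_pack_file_count is 0."""
    return parms["max_pack_file_count"] != 0


parms = resolve_packing_parms({"max_pack_file_count": 0})
print(parms)
print(packing_enabled(parms))  # False
```

With only max_pack_file_count = 0 given, the other three parms resolve to -1 and packing is reported as disabled.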

atorrez commented 8 years ago

Functionality implemented and testing now.

atorrez commented 8 years ago

Implemented, and initial testing shows no problems. Going to close this, but testing will continue and new issues will be opened if necessary.