mar-file-system / marfs

MarFS provides a scalable, near-POSIX file system by using one or more POSIX file systems as a scalable metadata component and one or more data stores (object, file, etc.) as a scalable data component.

make_packed needs updating #135

Closed: jti-lanl closed this issue 8 years ago

jti-lanl commented 8 years ago

This tool was originally a quick hack, useful for hand-building packed files for basic testing, but now it's out of date. It needs some updating if we're going to rely on it.

[edited from an email]

(1) Use the 'marfs_config' tool, which is built in marfs/common/configuration/src. It reads the configuration and can print out information that the script could parse to generate host IP-addrs, etc. It was intended to help scripts like make_packed avoid hard-coding IP-addrs and the like.

For example:

[pftool]# marfs/common/configuration/src/marfs_config -r bparc
Repo
  name                  bparc
  name_len              5
  host                  192.168.0.%d:81
  host_len              16
  host_offset           1
  host_count            6
  access_method         SPROXYD
  chunk_size            1073741824
  security_method       HTTP_DIGEST
  max_pack_file_count   -1
  min_pack_file_count   10
  max_pack_file_size    104857600
  min_pack_file_size    1

[pftool]# marfs/common/configuration/src/marfs_config -n admins
Namespace
  name                  admins
  mnt_path              /admins
  bperms                0xf3
  iperms                0xf3
  mdpath                /foo/admins/mdfs_
  iwrite_repo           bparc
  repo_range_list       [0] (min: 0, max: -1) -> bparc
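A minimal bash sketch of how make_packed might load this output, assuming (not confirmed by the issue) that marfs_config prints a header line ("Repo" or "Namespace") followed by one field per line as a name, whitespace, then a value that may itself contain spaces. The function name parse_marfs_config and the array CFG are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load marfs_config output into an associative array.
# ASSUMPTION: line 1 is a header ("Repo"/"Namespace"); every other line is
# "<field> <whitespace> <value>", where the value may contain spaces
# (e.g. repo_range_list).
declare -A CFG

parse_marfs_config() {
    local line key value
    CFG=()
    while IFS= read -r line; do
        # Strip leading whitespace.
        line="${line#"${line%%[![:space:]]*}"}"
        [[ -z "$line" ]] && continue
        # The field name is everything up to the first whitespace.
        key="${line%%[[:space:]]*}"
        # A line with no whitespace is the header; skip it.
        [[ "$key" == "$line" ]] && continue
        # The value is the rest of the line, minus leading whitespace.
        value="${line#"$key"}"
        value="${value#"${value%%[![:space:]]*}"}"
        CFG["$key"]="$value"
    done
}
```

Because a pipe would run the function in a subshell and discard CFG, feed it with process substitution instead, e.g. `parse_marfs_config < <(marfs_config -r bparc)`, and then read `${CFG[host]}`, `${CFG[host_count]}`, and so on.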

[You could just require the user to have a PATH that includes the directory where marfs_config lives, or you could rely on $MARFS (etc.), which we all typically set to point to our personal copy of the marfs installation dir.]
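Both lookup options can be combined: try PATH first, then fall back to the build location under $MARFS. This is a sketch; the function name find_marfs_config is hypothetical, and it assumes the binary sits at $MARFS/common/configuration/src/marfs_config, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: locate marfs_config, preferring $PATH and falling
# back to $MARFS/common/configuration/src (assumed build location).
find_marfs_config() {
    local candidate
    # `command -v` prints the resolved path and fails if nothing is found.
    if candidate=$(command -v marfs_config); then
        printf '%s\n' "$candidate"
        return 0
    fi
    if [[ -n "${MARFS:-}" ]]; then
        candidate="$MARFS/common/configuration/src/marfs_config"
        if [[ -x "$candidate" ]]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    fi
    echo "marfs_config not found: add it to PATH or set MARFS" >&2
    return 1
}
```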

So, then the arguments to make_packed could be e.g.

make_packed [ _ ]

or just:

make_packed _

Then the script could just figure out the paths and host IP-addrs, and generate obj-IDs, etc.

(2) Parsing the namespace repo_range_list is not as simple as it could be. Feel free to improve 'marfs_config' to print out something easier to parse. On the other hand, maybe we don't have to pay attention to that, for starters.

Let's skip that.

(3) Recovery-info is now 2943 bytes, I think, so the object size needs to be bigger than that.
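A guard like the following could make the script fail early instead of producing undersized objects. It is a sketch: check_obj_size and RECOVERY_INFO_SIZE are hypothetical names, and 2943 is the tentative figure quoted above, so it should be re-checked against the MarFS sources.

```shell
#!/usr/bin/env bash
# Hypothetical guard: refuse object sizes that cannot exceed the
# recovery-info. 2943 is the tentative value from the issue ("I think").
RECOVERY_INFO_SIZE=2943

check_obj_size() {
    local size=$1
    if (( size <= RECOVERY_INFO_SIZE )); then
        echo "object size $size must exceed recovery-info size $RECOVERY_INFO_SIZE" >&2
        return 1
    fi
    return 0
}
```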

(4) To generate the IP address: if "host_count" is greater than 1, "host" will be a printf format string, as shown above. In that case, just use bash's "printf" with the parsed "host" as the format, providing host_offset as the argument.
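Step (4) as a small bash function. The name repo_host is hypothetical; the inputs are the "host", "host_offset", and "host_count" fields from the marfs_config output above. When host_count is 1 or less, this sketch assumes "host" is already a literal address and uses it verbatim.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step (4): expand the repo's "host" field.
# With host_count > 1, "host" is a printf format (e.g. 192.168.0.%d:81)
# and host_offset is its argument; otherwise "host" is used as-is.
repo_host() {
    local host=$1 host_offset=$2 host_count=$3
    if (( host_count > 1 )); then
        # The format comes from the MarFS config, so using it as the
        # printf format string is intentional.
        # shellcheck disable=SC2059
        printf "$host\n" "$host_offset"
    else
        printf '%s\n' "$host"
    fi
}
```

With the bparc repo above, `repo_host '192.168.0.%d:81' 1 6` yields `192.168.0.1:81`.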

(5) Most of our repos now use HTTP "digest" authentication (see "security_method" in the output above). In such cases, the script should add '-u root --digest' to the curl arguments and let the person running the script type in the password at run-time. But curl will then ask for the password for every object. Ugh. Maybe we should move this to C/C++.
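Step (5) could be driven by the parsed security_method. This sketch (curl_auth_args is a hypothetical name) emits '-u root --digest' only for HTTP_DIGEST and, as an assumption, no extra arguments for any other method. Because `-u root` carries no password, curl prompts for it interactively, which matches the behavior described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step (5): curl auth arguments, one per line.
# ASSUMPTION: methods other than HTTP_DIGEST need no extra arguments.
curl_auth_args() {
    local security_method=$1
    case "$security_method" in
        HTTP_DIGEST) printf '%s\n' -u root --digest ;;
        *)           ;;
    esac
}
```

A caller could collect these into an array with `mapfile -t auth < <(curl_auth_args "${CFG[security_method]}")` and pass `"${auth[@]}"` to curl ahead of the script's existing arguments.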

jti-lanl commented 8 years ago

Done. For example:

cd $MARFS/fuse/src
../scripts/build_marfs 3 1
make mnt
mkdir /marfs/ns/packed
../scripts/make_packed [options] /marfs/ns/packed pack

By default, this creates four regular MarFS files named /marfs/ns/packed/pack[1-4], then concatenates all the object contents into a local file, deletes the old objects, creates a new obj-ID with "packed" type, writes the concatenated contents into the new obj-ID, and updates the OBJID and POST xattrs on all the MDFS files to refer to their offsets in the new packed object. Ideally, we'd avoid deleting the original objects until everything else had worked, but that was awkward to do for now.
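The offset bookkeeping in that sequence is simple: each original object starts where the previous one ended in the concatenation. A sketch, with packed_offsets as a hypothetical name; it prints one "offset size" pair per input size and deliberately leaves out how those values are encoded into the OBJID/POST xattrs, which this issue does not describe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given object sizes in concatenation order, print
# "<offset> <size>" for each object within the new packed object.
packed_offsets() {
    local offset=0 size
    for size in "$@"; do
        printf '%d %d\n' "$offset" "$size"
        offset=$(( offset + size ))
    done
}
```

For example, `packed_offsets 4096 4096 8192 4096` prints offsets 0, 4096, 8192, and 16384.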

For now, you have to enter the obj-storage password on every single freaking obj-interaction (9 events in the default case). For convenience, copy it once into your kill-ring, and then just paste it every time you need it. I didn't want to risk making a security hole just to avoid this, though I think I now see how to supply the password from the script without exposing it on the command-line or in the script. Get to that later.
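One possible approach, offered as an assumption and not necessarily what was intended here: read the password once with `read -s`, then hand it to each curl call as a config read from stdin (`curl -K -`), so it never appears in argv or in the script. Both function names are hypothetical. Backslashes and double quotes are escaped because curl treats them specially inside quoted config values.

```shell
#!/usr/bin/env bash
# Hypothetical: prompt once, without echo; the value stays in OBJ_PW.
read_password_once() {
    IFS= read -r -s -p "obj-store password for root: " OBJ_PW
    echo >&2
}

# Hypothetical: emit a curl config line for user:password. Escape
# backslashes first, then double quotes, so added escapes aren't doubled.
curl_user_config() {
    local user=$1 pw=$2
    local cred="$user:$pw"
    cred=${cred//\\/\\\\}
    cred=${cred//\"/\\\"}
    printf 'user = "%s"\n' "$cred"
}
```

Usage might look like `curl_user_config root "$OBJ_PW" | curl -K - --digest -T "$file" "$url"`, where `$file` and `$url` stand for whatever make_packed already passes. Since stdin carries the config, the upload must come from a file path rather than `-T -`.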