ghost opened this issue 8 years ago
No, that should be fine as long as your metadata set is smaller than ~5.5GB (you need roughly twice as much RAM as the metadata set, because the process forks once an hour and dumps a copy of the metadata to disk).
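For illustration only, here is a minimal Python sketch of the fork-and-dump pattern described above (not LizardFS's actual code); the copy-on-write fork is why peak memory use can approach twice the metadata set:

```python
import os, pickle

# Hypothetical stand-in for the master's in-RAM metadata (in LizardFS this
# is a set of internal structures, not a Python dict).
metadata = {f"inode_{i}": {"size": 0, "chunks": []} for i in range(100_000)}

def dump_metadata(path="metadata.mfs.back.tmp"):
    pid = os.fork()              # copy-on-write clone of the address space
    if pid == 0:                 # child: write the snapshot, then exit
        with open(path, "wb") as f:
            pickle.dump(metadata, f)
        os._exit(0)
    # Parent keeps serving from its copy. Pages shared with the child are
    # duplicated only when the parent modifies them, so worst-case memory
    # use approaches 2x the metadata set while the child is writing.
    os.waitpid(pid, 0)           # sketch only; a real server reaps asynchronously

dump_metadata()                  # the master does this roughly once an hour
```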
On Sep 16, 2016 05:49, "Asher256" notifications@github.com wrote:
> Hi,
> I have four servers with LizardFS:
> - lfs001: 11G RAM
> - lfs002: 15G RAM
> - lfs003: 15G RAM
> - lfs004: 23G RAM
> If I understand correctly, the whole metadata is stored in RAM.
> If lfs004 (23G RAM) is the master and lfs001 (11G RAM only) is the slave, could lfs001 have an issue replicating the metadata because it has less RAM than lfs004?
If lfs004 is filled up (i.e. its metadata occupies 23GB), then naturally lfs001 will have a big problem loading that 23GB of metadata into its 11GB of RAM.
@Zorlin it's not that you strictly need twice as much RAM: if there isn't enough memory for the fork, the metadata dump still happens, just in the foreground (which can freeze the master for a relatively long time, but it will still work). Alternatively, you can force your OS to allow the "excessive" fork with /proc/sys/vm/overcommit_memory and the other /proc/sys/vm/overcommit_* settings.
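For reference, a minimal sketch of inspecting and relaxing that setting by writing /proc directly. This assumes root; `sysctl -w vm.overcommit_memory=1` does the same thing and can be persisted in /etc/sysctl.conf:

```python
# Inspect and relax the overcommit setting mentioned above.
OVERCOMMIT = "/proc/sys/vm/overcommit_memory"

with open(OVERCOMMIT) as f:
    print("current vm.overcommit_memory:", f.read().strip())  # 0, 1 or 2

# 1 = "always overcommit": the kernel allows the master's hourly fork even
# if there is not 2x the metadata set worth of free RAM plus swap.
with open(OVERCOMMIT, "w") as f:   # requires root
    f.write("1\n")
```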
@psarna: Can I prevent LizardFS from using 23G? Is there any option for that in the master's configuration file?
The master's RAM usage is almost exactly the size of the metadata set; it won't automatically fill as much space as it can. For example, with 1.5TB of data stored and 4.4 million files, we see only ~2GB of RAM used.
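A quick back-of-envelope from those numbers, treating the quoted "2GB" as ~2 GiB and attributing all of it to metadata (a rough assumption; the real per-file cost depends on chunk counts, name lengths, etc.):

```python
# Rough per-file metadata cost implied by the example above.
ram_bytes = 2 * 1024**3          # ~2 GiB of master memory use
files = 4_400_000

per_file = ram_bytes / files
print(f"~{per_file:.0f} bytes of metadata per file")        # roughly 490 bytes

# So an 11GB machine could, very roughly, hold metadata for:
print(f"~{11 * 1024**3 / per_file / 1e6:.0f} million files")  # ~24 million
```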
On Sep 16, 2016 20:02, "Asher256" notifications@github.com wrote:
> @psarna: Can I prevent LizardFS from using 23G? Is there any option for that in the master's configuration file?
Good to know, Zorlin ;).
What about the MooseFS formula for sizing the /var/lib/lizardfs/ partition: SPACE = RAM * (BACK_META_KEEP_PREVIOUS + 2) + 1 * (BACK_LOGS + 1) [GiB]. Is it still valid with LizardFS?
@Asher256 I think so, but note "RAM" in that equation is (as far as I know) the memory usage of the LizardFS metadata set, NOT your total amount of RAM!
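As a worked example under that reading, using the ~2GB metadata figure quoted earlier and assuming BACK_META_KEEP_PREVIOUS=1 and BACK_LOGS=50 (which I believe are the stock defaults; check your own mfsmaster.cfg for the real values):

```python
# Back-of-envelope sizing for /var/lib/lizardfs using the formula above.
ram_gib = 2                     # size of the in-RAM metadata set, in GiB
back_meta_keep_previous = 1     # assumed default
back_logs = 50                  # assumed default

space_gib = ram_gib * (back_meta_keep_previous + 2) + 1 * (back_logs + 1)
print(f"~{space_gib} GiB for /var/lib/lizardfs")   # ~57 GiB
```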
> The master's RAM usage is almost exactly the size of the metadata set; it won't automatically fill as much space as it can. For example, with 1.5TB of data stored and 4.4 million files, we see only ~2GB of RAM used.
The critical point of distinction here is that it is the 4.4 million files that result in ~2GB of metadata, not the 1.5TB of data those files represent in consumed disk capacity. A single 1.5TB sparse file will have zero chunks stored on any chunkserver and will produce only a tiny amount of metadata, perhaps 200MB at most. As data is written into such a single large file, chunks get created, yet the metadata size grows only ever so slightly, even when the whole 1.5TB is full of actual data.
So, for example, you won't want to store a massive mail server with billions upon billions of tiny files natively in LFS, or else you will need many hundreds of GB of RAM available, and you'll have to be willing to wait through slow master start/stop and shadow sync times.
However, storing billions of files is no problem for any modern traditional filesystem; they all read their metadata from their "superblock" structures only when needed and have no expectation of caching the entire thing in RAM like LFS does. LFS does handle large and sparse files very well, and using them this way almost eliminates the metadata RAM consumption for a given amount of raw data, replacing it with reads from the set of chunks that store the superblock of your favorite filesystem. Performance will be better too; you can use a loop-mounted ext filesystem in a sparse file backed by LFS, or ZFS with compression on a file-backed block device stored in LFS, or even iSCSI LUNs living in files backed by LFS. You get the idea...
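To make the sparse-file point concrete, here is a tiny sketch; the mount path is just an example, and the file would normally then be loop-mounted or exported as a block device:

```python
import os

# Create a 1.5TB sparse file that allocates no data blocks and therefore,
# on LizardFS, no chunks. Path is hypothetical; point it at your LFS mount.
path = "/mnt/lizardfs/sparse.img"

with open(path, "wb") as f:
    f.truncate(1_500_000_000_000)   # logical size: 1.5 TB

st = os.stat(path)
print("logical size:", st.st_size)             # 1.5 TB
print("allocated bytes:", st.st_blocks * 512)  # ~0 until data is written
```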
The only quirk I have run into was when hosting iSCSI with ZFS on chunkservers that were virtualized and backed by a weak hardware RAID-5 set of 6 SSDs. Something kept periodically tripping over some rare edge case, causing non-fatal false faults during chunk reads that replication would fix, even though, on closer examination, the "faulty" chunk files all seemed perfectly fine and valid. Aside from that admittedly wild setup, ZFS generally performs well enough on bare-metal chunkservers and offers solid compression for chunk data where appropriate.