question - lizardfs as FS for home backup server

eleaner commented 5 years ago

Hi,

I would like to ask for the best settings and for the minimum hardware to use lizard as FS for a home backup server.

The idea is to:

run on chunk server for each drive (possibly docker) say 15 elderly HDD of different sizes
run master on the same machine
run shadow/metaloger on other machines at home for any case
use EC 6.3 or something around these lines for enough protection against HDD failure and to give lizard enough space breathing space to balance the chunks

benefits:

easy to exchange HDD in case of failure or just replace old 320GB with less old 1TB - beats ZFS
more available space with technically better resiliency than BTRFS raid1
and I still have COW and erasure to protect data consistency

The backup is run once a day (borg) and the server is meant to go to sleep afterwards until the next backup The server does not do anything else. It's just elaborate drawer for older drives.

Technically I am not only taking the "distributed" out of the equation but also the idle server time in which lizard performs all the scheduled operations.

What would be the best settings on chunks and master server to ensure there is not much idling? Ideally, I want the lizard processes run all the time but only if needed (e.g balancing) so I can monitor the processes and put the machine to sleep if nothing is happening. how much cpu power and memory will be required to handle 15 drives?

Blackpaw commented 5 years ago

I do something similar for my home media server - 5 external drives divided between three chunkservers all on the same PC. It takes a bit of manual config to setup. Using a mix of Goal 2 and ec(2,1)

Lizardfs is pretty low on requirements. My setup copes fine with multiple recordings and playbacks going on:

Intel(R) Celeron(R) CPU N2820 @ 2.13GHz
4GB RAM

Ideally, I want the lizard processes run all the time but only if needed (e.g balancing) so I can monitor the processes and put the machine to sleep if nothing is happening.

I see no issue with that.

eleaner commented 5 years ago

Ideally, I want the lizard processes run all the time but only if needed (e.g balancing) so I can monitor the processes and put the machine to sleep if nothing is happening.

I see no issue with that.

Well. I do. Lizard performs periodical maintenance. Obviously, I can switch the machine off, but I'd like to adjust configuration that it does as much maintenance as only possible in the time it has. Or to figure out the minim time it would have to run every day. I am afraid of the scenario that e.g. one HDD died and I did not notice, lizard tried to rebalance but didn't have time, I kept putting new data in and then you can imagine.

matthiaz commented 5 years ago

Do I understand it correctly that you'll have 1 big machine wich servers as master and chunkserver? ec 6.3 is pretty high. If I remember correctly, you need 9 different places to store the data. And you want that to be a little higher so you can swap out drives when you need to. If you want to keep that data online when a machine goes offline, you need those different places to be different machines as well. ec has an impact on CPU usage, so if you are looking at low power, go for normal goals. 2 or 3. Yes, you need more harddrives, but ec is not exactly stable, and less cpu usage = less power consumption.

How much data are we talking about?

Turning machines (harddrives) on and off every day, has an impact on their life. Are you sure you want to do this? What is your reasoning behind this?

I am afraid of the scenario that e.g. one HDD died and I did not notice, lizard tried to rebalance but didn't have time, I kept putting new data in and then you can imagine.

-> you could use lizardfs-admin to check the status of the chunks. This can tell you what "work" the master has to do. And it can also tell you if a drive fails, or is about to fail.

eleaner commented 5 years ago

@matthiaz thank you for your comments

Do I understand it correctly that you'll have 1 big machine wich servers as master and chunkserver?

Yes. That's the first line of defence. if my main storage goes down or if I realise that I accidentally deleted an important folder and it was 6 months ago, I can go back to my local backup and retrieve it much faster than from an offsite backup. I agree I will need an offsite backup to protect myself from different types of disaster. The idea is to setup one master and separate chunkserver for each drive

ec 6.3 is pretty high. If I remember correctly, you need 9 different places to store the data. And you want that to be a little higher so you can swap out drives when you need to. If you want to keep that data online when a machine goes offline, you need those different places to be different machines as well.

yup, 9 different chunks but I would have 15 chunkservers so it should be fine and give some space for rebalancing. If by higher you mean x.4 or x.5 I don't think it is neccessary. x.3 will protect me from failure on three chunks. In case one of the chunkserver goes down, the lizard will simply try to use another one instead and rebalance. to keep data online when a machine goes offline I will need a different solution. it is like that (live storage/ local, always online) -> (backup, history to the space availability/ local almost always offline) -> of site backup

ec has an impact on CPU usage, so if you are looking at low power, go for normal goals. 2 or 3.

Yes, you need more harddrives, but ec is not exactly stable, and less cpu usage = less power consumption. Stability is a problem and the CPU requirements too as I don't plan to upgrade. the whole idea was to use EC due to overhead less than of factor 2. For factor 2 I would simply go with BTRFS

How much data are we talking about?

10TB Live - backup history as much as there is space, currently 20TB

Turning machines (harddrives) on and off every day, has an impact on their life. Are you sure you want to do this? What is your reasoning behind this?

The server is required online only for the duration of backup + in the event of recovery. So the idea is to save electricity as in other times it simply does nothing.

I am afraid of the scenario that e.g. one HDD died and I did not notice, lizard tried to rebalance but didn't have time, I kept putting new data in and then you can imagine.

-> you could use lizardfs-admin to check the status of the chunks. This can tell you what "work" the master has to do. And it can also tell you if a drive fails, or is about to fail.

You mean that I would define shutdown rule based on that? That's an excellent idea!

matthiaz commented 5 years ago

In my opinion if power consumption is a factor and HighAvailability is not:

don't use EC. Go for simple goal 2 replication. EC is CPU intensive and way risky than simple goal replication in the current lizardfs version. Remember:
- a CPU draws 40w-100w ~ 250EUR/year
- a harddrive 6w ~13EUR/year

I would seriously consider remote backup actually. Amazon glacier/Backblaze B2/ovh cloud archive. Glacier might be tricky to get data back out, I agree. But Backblaze B2 and OVH are both easy to retrieve data. Storing 20TB on OVH would cost about 50EUR/month.

Another idea that might help: You could use your clients as chunkservers as well (laptop, desktop, ...). So if you are planning on using lizardfs anyway... you could just use it as primary storage, in stead of backup. This way, the whole "loss of storage" with goal replication is less of an issue.

Eh... just brainstorming here ;-)

Good luck!

eleaner commented 5 years ago

@matthiaz

Thank you. I need a brainstorming partner :)

the instability of EC is definitely an issue and a big one. It all looks pretty stable to me but then mostly because it did not fail (yet) if I take EC out of the equation I'll stick to replication 2 and probably BTRFS raid1 instead. It ticks all the boxes except EC. or is there something I don't think about?

20TB in OHV for 50EUR/month sounds like a good option except that I get enormous (by comparison) retrieval times. Plus I cannot really find such a product. Any help here? Anyway, I need something like that for my second/third line of defence anyway. That's different brainstorming session though.

clients as the chunkservers also sound good but I'd need quite a HDD space on each considering non ECC do I understand correctly that lizardfs client mounted on the machine that is also a chunkserver will have a rather minimal network activity on retrievals? And most should be read locally?

matthiaz commented 5 years ago

20TB in OHV for 50EUR/month sounds like a good option except that I get enormous (by comparison) retrieval times.

Do you expect to do retrieval often? If it's a backup... that shouldn't happen very often right? I would worry more about how to get the data online :-)

do I understand correctly that lizardfs client mounted on the machine that is also a chunkserver will have a rather minimal network activity on retrievals? And most should be read locally?

Kind of correct. A client will first check if it has the chunk itself, if not, it will use other chunkservers. So in theory, you'd only have master communication. There was a guy that set that up in a classroom to share virtualmachines with 20+ machines. But... depending on how you set the replication, and how often your data changes, you might get a lot of network traffic writing to all chunkservers

Depending on where you are (not saying that this is the perfect solution, comparing and reviewing is on you): https://us.ovhcloud.com/products/public-cloud/archive-storage https://www.ovh.com/world/public-cloud/storage/cloud-archive/ https://www.hetzner.com/storage/storage-box https://www.backblaze.com/b2/cloud-storage-pricing.html

onlyjob commented 5 years ago

Few years ago I've evaluated Btrfs RAID and it was awful comparing to md (mdadm) RAID.

Here is a brief and incomplete list of problems I've experienced with Btrfs RAID:

No hot spare
HDD failed -- how do we know?
Replacing failed HDD: no automatic re-balance? Run "balance" manually?
RAID status (equivalent of /proc/mdstat)? != btrfs fi show but do not show HDD status (e.g. "dirty" etc.).
No partial re-balance (i.e. equivalent of mdadm bitmap?)
Device re-added (dirty): how to see dirty devices?!? After re-plugging HDD is not used but appears in btrfs fi show as nothing happened.

And there were more "fun" like failures to re-balance, segfaults, error when re-adding device, lack of equivalent of mdadm --zero-superblock and ultimately failure to mount degraded array (maybe after re-adding failed HDD but in the end no manipulations allowed me to mount Btrfs RAID again and the test data was lost).

As I vaguely recall, either I could not repair Btrfs RAID prom physical removal or HDD (which I removed to simulate failure) or maybe I could not re-add the disk...

mdraid have none of the above problems. The only thing it is lacking is data integrity checks but you can format your mdraid device with (non-raid) Btrfs.

LizardFS is superior to any RAID (except bootable ones):

you can combine disks of different size
you waste no space for unused hot spares
your data is distributed
your I/O is distributed
you can easily grow capacity
data integrity is guaranteed

And there are even more advantages...

eleaner commented 5 years ago

Thank you @onlyjob I think we briefly discussed it a while ago and you said you're using lizard in pretty impressive setups. there were a few comments recently about EC being unreliable, what's your experience?

onlyjob commented 5 years ago

Mostly positive experience, up until #746.

Of course one have to understand limitations and requirements of EC chunks. EC chunks are slow to write (#793) so they are mostly for archival purposes.

XOR chunks have reasonable performance but they are RAID-5 equivalent in terms of redundancy so they are for disposable data (e.g. trash, backups, etc.).

I've used xor3 goals for some disposable data and ec(3,2) for "write once, read many" files.

xor2 (3 chunkservers, 150%)
xor3 (4 chunkservers, 133%)
ec(2,2) (4 chunkservers, 200%) almost like goal:2
ec(3,2) (5 chunkservers 166%)

I wouldn't use higher EC goals as they require too many chunkservers (and that comes with performance penalty). I like to design for server's failure and my 150 TB cluster don't have enough nodes to effectively use goals greater than ec(3,2).

These days I'm using XOR/EC goals only for large files (to reduce number of chunks and I/O) and generally I'm moving data back to replicated goals for performance reasons.

njhurst commented 5 years ago

"XOR chunks have reasonable performance but they are RAID-5 equivalent in terms of redundancy so they are for disposable data (e.g. trash, backups, etc.)."

I'm pretty sure this is incorrect. xor chunks store three checksummed chunks per 2 chunks from the file, the third is made from the first two xorred together. If any is corrupted it is detected by the checksum and recovered from the other two. raid5 does not have checksumming, so it can't give this safety. xor should give you 1 drive or chunk corruption protection for a 50% overhead.

onlyjob commented 5 years ago

xor chunks are like RAID-5 in regards to resilience to HDD failures: you can tolerate loss of only one HDD (or chunk).

xor2 are made of three chunks (50% overhead) but xor3 use four chunks (33% overhead).

It is correct that usually RAID have no checksumming on block level hence no corruption detection and repair.

eleaner commented 5 years ago

Guys, Just an update here to close the topic since a few weeks, I am running 3.13.0-rc1 on my backup server with 10hdd of different sizes and performances formatted XFS 16GB ECC ram and Xeon E3-1265L v3

The whole LizardFS runs as a docker service one chunkserver for every hdd, fsync on single master and metadata protected with zfs mirror and snapshots additionally shadows just because I am paranoid since it is intended for archive backups I decided for EC(4,2) performance is good enough that I am trying to use S3QL for deduplication. overall getting 10-15MB/s writes through 1G network with a stable load around 10

Looks pretty stable to me. I yanked a power cable a few times to see if I am safe and I am surprised to see that not much happened, I had to fsck s3ql but Lizard did not bulge.

The only annoying thing is that the chunkservers get up very slowly during boot time. I assume it might be due to overloaded I/O from all drives being verified at the same time.

lizardfs / lizardfs

question - lizardfs as FS for home backup server #804