Ability to run on CoW filesystems?

divanikus commented 5 years ago

Currently Ambry strictly needs fallocate to create partition files, but CoW filesystems like btrfs or ZFS do not support it. Is there any way to run on top of a CoW filesystem? I'm aware of the design goal behind the fallocate usage.

siepkes commented 5 years ago

Have you tried it? As far as I know if there is no fallocate support the disk segments simply don't get preallocated but Ambry will still work.

divanikus commented 5 years ago

Well, the process starts, but with the following error.

[2018-11-02 19:18:41,034] ERROR (com.github.ambry.store.DiskManager) Exception while starting store for the partitionPartition[0]
com.github.ambry.store.StoreException: Error while starting store for dir /var/lib/ambry/0/0
        at com.github.ambry.store.BlobStore.start(BlobStore.java:230)
        at com.github.ambry.store.DiskManager.lambda$start$0(DiskManager.java:125)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: error while trying to preallocate file /var/lib/ambry/0/0/log_current exitvalue 1 error string fallocate: fallocate failed: keep size mode is unsupported

divanikus commented 5 years ago

Any POST requests fail with

[2018-11-06 19:03:52,744] ERROR (com.github.ambry.replication.ReplicationManager) Not replicating to partition Partition[0] because an initialized store could not be found
[2018-11-06 19:03:52,744] WARN (com.github.ambry.replication.ReplicationManager) Number of Datacenters to replicate from is 0, not starting any replica threads
[2018-11-06 19:12:38,774] ERROR (com.github.ambry.server.AmbryRequests) Validating put request failed with error Disk_Unavailable for request ReceivedPutRequest[BlobID=AAYQAf__AAAAAQAAAAAAAAAAFjUrGK3zReGHUl3z1Onksw, PartitionId=Partition[0], ClientId=am-00, CorrelationId=2, BlobProperties[BlobSize=34144, ContentType=image/gif, OwnerId=root, ServiceId=CUrlUpload, IsPrivate=false, CreationTimeInMs=1541520757813, TimeToLiveInSeconds=Infinite, AccountId=-1, ContainerId=0, IsEncrypted=false], UserMetaDataSize=56, blobType=DataBlob, blobSize=34144, crc=4239089245, BlobKeyAvailable=false]
[2018-11-06 19:12:49,700] ERROR (com.github.ambry.server.AmbryRequests) Validating put request failed with error Disk_Unavailable for request ReceivedPutRequest[BlobID=AAYQAf__AAAAAQAAAAAAAAAA_YKCHpIXSqSM5IV_hgC2Ng, PartitionId=Partition[0], ClientId=am-00, CorrelationId=4, BlobProperties[BlobSize=34144, ContentType=image/gif, OwnerId=root, ServiceId=CUrlUpload, IsPrivate=false, CreationTimeInMs=1541520768696, TimeToLiveInSeconds=Infinite, AccountId=-1, ContainerId=0, IsEncrypted=false], UserMetaDataSize=56, blobType=DataBlob, blobSize=34144, crc=1053495434, BlobKeyAvailable=false]

cgtz commented 5 years ago

Hi @divanikus, you are correct that the preallocateIfNeeded method assumes the keep size flag is supported on every Linux file system. On non-Linux systems we do not attempt to preallocate files. Perhaps we can make this method more flexible and introduce a config to swallow fallocate errors.

divanikus commented 5 years ago

Hi @cgtz, any chance I can overcome it now?

cgtz commented 5 years ago

If you have the project checked out and want to do some quick experimentation, you can comment out these lines: https://github.com/linkedin/ambry/blob/master/ambry-utils/src/main/java/com.github.ambry.utils/Utils.java#L667-L679

I will try to put up a PR soon to make this configurable though.
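
For anyone experimenting in the meantime, here is a minimal sketch of what a configurable version might look like. It is not Ambry's actual code: the class `PreallocateSketch`, the `Preallocator` interface, the `preallocateIfPossible` method and the `ignoreFailure` flag are all hypothetical names made up for illustration. The only real external piece is the util-linux `fallocate` command with its `--keep-size` and `-l` options.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class PreallocateSketch {
  /** Hypothetical abstraction over the preallocation step so it can be swapped out. */
  interface Preallocator {
    void preallocate(File file, long capacityBytes) throws IOException;
  }

  /** Shells out to fallocate(1) in keep-size mode and fails on a non-zero exit code. */
  static final Preallocator FALLOCATE = (file, capacityBytes) -> {
    Process process = new ProcessBuilder("fallocate", "--keep-size", "-l",
        Long.toString(capacityBytes), file.getAbsolutePath())
        .redirectErrorStream(true)
        .start();
    StringBuilder output = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        output.append(line).append(' ');
      }
    }
    int exitValue;
    try {
      exitValue = process.waitFor();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while preallocating " + file, e);
    }
    if (exitValue != 0) {
      throw new IOException("fallocate failed for " + file + " (exit " + exitValue + "): "
          + output.toString().trim());
    }
  };

  /**
   * Tries to preallocate. Returns true on success and false if the failure was swallowed.
   * ignoreFailure stands in for a hypothetical config switch.
   */
  static boolean preallocateIfPossible(Preallocator preallocator, File file, long capacityBytes,
      boolean ignoreFailure) throws IOException {
    try {
      preallocator.preallocate(file, capacityBytes);
      return true;
    } catch (IOException e) {
      if (!ignoreFailure) {
        throw e;
      }
      System.err.println("Skipping preallocation of " + file + ": " + e.getMessage());
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    // Simulate a filesystem that rejects keep-size mode, without touching the disk.
    Preallocator unsupported = (file, capacityBytes) -> {
      throw new IOException("fallocate failed: keep size mode is unsupported");
    };
    boolean done = preallocateIfPossible(unsupported, new File("log_current"), 1024L, true);
    System.out.println("preallocated: " + done);
  }
}
```

With `ignoreFailure` set to false this behaves like a hard failure on startup; with it set to true the store would simply run on a file that was not preallocated. Passing `PreallocateSketch.FALLOCATE` instead of the fake preallocator runs the real command.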

divanikus commented 5 years ago

@cgtz Yup, commenting it out helps; at least it now starts and is able to execute requests. I wonder how large the impact of disabling that code is. I mean, won't it break somewhere later (during compaction, etc.)?

Just a few words about my case. I have several big servers with 10x10TB drives. Currently they use ZFS on Linux with RAIDZ2 for resilience. I have pretty solid experience with ZFS in terms of speed and reliability. Of course it would cost some performance. As far as I understand from the project's docs, Ambry is designed to run directly on bare HDDs without any RAID-level resiliency. Also, ZFS is a CoW filesystem, like Linux-native BTRFS, which makes preallocation kind of useless: the FS allocates a new block on each write instead of reusing the existing one. BTRFS allows disabling CoW for subvolumes; ZFS doesn't. I could use a ZVol formatted with EXT4, but I don't think layering that many abstractions would give good performance.
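
Since preallocation buys little on a CoW filesystem, one option would be to skip it based on the filesystem type. A rough sketch of such a check follows, using the standard `java.nio.file.FileStore` API. The class `CowFilesystemCheck` and the list of type names are assumptions for illustration, and the string returned by `FileStore.type()` is implementation-specific, so the names would need verifying on the actual platform.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CowFilesystemCheck {
  // Assumption: illustrative type names only; FileStore.type() output varies by platform.
  private static final Set<String> COW_TYPES = new HashSet<>(Arrays.asList("btrfs", "zfs"));

  /** Returns true if the given filesystem type name is in the (hypothetical) CoW list. */
  static boolean isCopyOnWriteType(String fsType) {
    return fsType != null && COW_TYPES.contains(fsType.toLowerCase(Locale.ROOT));
  }

  /** Looks up the file store backing dir and checks its reported type. */
  static boolean isOnCopyOnWriteFilesystem(Path dir) throws IOException {
    return isCopyOnWriteType(Files.getFileStore(dir).type());
  }

  public static void main(String[] args) throws IOException {
    // Check the directory given as the first argument, or the current directory.
    Path dir = Paths.get(args.length > 0 ? args[0] : ".");
    System.out.println(dir.toAbsolutePath() + " on CoW filesystem: "
        + isOnCopyOnWriteFilesystem(dir));
  }
}
```

Pointing it at a store directory such as /var/lib/ambry/0/0 would show whether that mount reports a type from the list.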

kvtb commented 1 year ago

Actually, ZFS makes it possible to reduce the number of abstraction layers: the POSIX layer could be bypassed if Ambry interacted directly with the DMU.

linkedin / ambry

Ability to run on CoW filesystems? #1097