Closed by ickc 4 months ago
The current storage does not support COW. Each logical file path is a physical file on one of the storage servers and requires the full amount of disk space.
Afaik, CephFS (the mountable POSIX filesystem part of Ceph) doesn't support COW / reflinks either. There's a 12-year-old feature request to support reflinks, but it doesn't look like it's being worked on.
Thanks, how about hard links and soft links? Also, is there any way to set a per-user disk quota?
I will document this in the next release.
We (DC people) should think more about our disk usage policy and how to deal with duplication.
There are no hard / soft links on the current storage either. Per-directory quotas are possible, but they would be cumbersome to add to the existing storage and to enforce.
CephFS supports hard and soft links. Please be aware that there might be performance issues (see app best practices). CephFS also uses directory quotas.
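To illustrate the difference between hard and soft links mentioned above, here is a minimal Python sketch. It runs in a local temporary directory, and it only assumes that a CephFS mount follows the same POSIX semantics. The function name `link_demo` and the file names are made up for illustration.

```python
import os
import tempfile


def link_demo(workdir):
    """Create a file, a hard link, and a symlink; report what they share."""
    original = os.path.join(workdir, "data.bin")
    hard = os.path.join(workdir, "data.hard")
    soft = os.path.join(workdir, "data.soft")

    with open(original, "wb") as f:
        f.write(b"x" * 4096)

    os.link(original, hard)     # second name for the same inode, no extra data blocks
    os.symlink(original, soft)  # small entry that only stores the target path

    st_orig = os.stat(original)
    st_hard = os.stat(hard)
    return {
        "same_inode": st_orig.st_ino == st_hard.st_ino,
        "link_count": st_orig.st_nlink,
        "soft_is_symlink": os.path.islink(soft),
        "soft_target": os.readlink(soft),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(link_demo(tmp))
```

Note that a hard link is not a copy: both names refer to the same data, so a write through one name is visible through the other. This is different from the `cp --reflink` behavior, where the copies diverge on write.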
Great. Thanks.
This implies that in order to enforce a per-user disk quota, we need to enforce a per-user directory convention, say data/$USER or home/$USER.
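A rough sketch of how a per-user directory convention could be combined with CephFS directory quotas. CephFS exposes directory quotas through the `ceph.quota.max_bytes` extended attribute. The mount point `/mnt/cephfs/data`, the user names, and the 1 TiB limit below are hypothetical, and the function names are made up for illustration. Actually applying the plan needs a Linux client with a CephFS mount and sufficient permissions, so the demo only prints it.

```python
import os

QUOTA_ATTR = "ceph.quota.max_bytes"  # CephFS directory quota attribute


def plan_user_quotas(base_dir, users, limit_bytes):
    """Return (directory, attribute, value) triples, one per user."""
    value = str(limit_bytes).encode("ascii")
    return [(os.path.join(base_dir, user), QUOTA_ATTR, value) for user in users]


def apply_user_quotas(plan):
    """Set the quota attribute on each directory (Linux, CephFS mount only)."""
    for directory, attr, value in plan:
        os.makedirs(directory, exist_ok=True)
        os.setxattr(directory, attr, value)


if __name__ == "__main__":
    # Hypothetical mount point, user names, and limit (1 TiB).
    plan = plan_user_quotas("/mnt/cephfs/data", ["alice", "bob"], 1024**4)
    for directory, attr, value in plan:
        print(directory, attr, value.decode())
```

The quota applies to the directory tree, not to the file owner, so this only works as a per-user limit if users cannot write outside their own directory.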
We need to give more thought to how to share data. Maybe ignore hard/soft links there (because they require discipline from users anyway) and just enforce per-user quotas, then provide another namespace such as project for collaboration-wide sharing, with write permission for only some maintainers? (How?)
I will probably make a proposal in the form of documentation and discuss it in one of our internal weekly meetings.
As users are starting to share files with each other, we are now seeing people copy files within our VO (at root://bohr3226.tier2.hep.manchester.ac.uk:1094//dpm/tier2.hep.manchester.ac.uk/home/souk.ac.uk/).
What is the best practice here? We want to avoid paying for n copies of a file when n collaborators each make a copy.
For example, when using gfal-copy, does it cost 2 times as much storage behind the scenes? Or does it do COW (Copy On Write) behind the scenes? If not COW, is there any way we can make some sort of hard link so that the copy is cheap, like the cp --reflink behavior?
Would the answers to these be different once we migrate from DPM to Ceph?
Thanks.
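Since each copy costs its full size without COW, one way to gauge the problem is to measure how many bytes are taken up by redundant copies. Below is a minimal sketch, assuming the storage tree is visible as a local POSIX path (for example a CephFS mount; it does not talk to a root:// endpoint). The function names are made up for illustration.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_bytes(root):
    """Return the number of bytes used by redundant copies under root."""
    sizes_by_digest = defaultdict(list)
    seen_inodes = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks hold no file data of their own
            st = os.stat(path)
            key = (st.st_dev, st.st_ino)
            if key in seen_inodes:
                continue  # hard link to a file that was already counted
            seen_inodes.add(key)
            sizes_by_digest[file_digest(path)].append(st.st_size)
    # For each distinct content, every copy after the first is redundant.
    return sum(sum(sizes[1:]) for sizes in sizes_by_digest.values())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("original.dat", "copy.dat"):
            with open(os.path.join(tmp, name), "wb") as f:
                f.write(b"z" * 1000)
        print(duplicate_bytes(tmp))  # the second copy costs 1000 bytes
```

Hashing every file is expensive on a large tree, so in practice one would probably group files by size first and only hash the candidates.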