Open kousu opened 1 year ago
The key seems to be `annex.hardlink`. I deleted and reforked the repo, then
gitea@data:~/data/gitea-repositories$ git config --global annex.hardlink true
After that, copying the annex files was much faster, and the sizes show that the duplication is indeed now avoided:
gitea@data:~/data/gitea-repositories$ du -hs kousu/spine-generic-single.git/ neuropoly/spine-generic-single.git/
886M kousu/spine-generic-single.git/
2,9M neuropoly/spine-generic-single.git/
gitea@data:~/data/gitea-repositories$ # but counting them separately shows them as full sized
gitea@data:~/data/gitea-repositories$ du -hs kousu/spine-generic-single.git/; du -hs neuropoly/spine-generic-single.git/
886M kousu/spine-generic-single.git/
885M neuropoly/spine-generic-single.git/
gitea@data:~/data/gitea-repositories$
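The gap between the two `du` runs comes from how `du` treats hard links: within a single invocation it counts each inode once, so the second directory only contributes whatever it does not share with the first. A minimal sketch that reproduces the effect with throwaway files (the temp directory and the `blob` file name are made up for illustration; exact sizes depend on the filesystem):

```shell
#!/bin/sh
# Throwaway demo; these paths are illustrative and unrelated to the Gitea repos above.
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"

# A 1 MiB file in a/, then a hard link to it in b/ (same inode, no extra data blocks).
dd if=/dev/zero of="$demo/a/blob" bs=1024 count=1024 2>/dev/null
ln "$demo/a/blob" "$demo/b/blob"

# One invocation: du counts the shared inode once, so b/ looks nearly empty.
du -sk "$demo/a" "$demo/b"

# Separate invocations: each run sees the file at full size.
du -sk "$demo/a"; du -sk "$demo/b"

# Confirm that both names really refer to the same inode.
if [ "$demo/a/blob" -ef "$demo/b/blob" ]; then
    same_inode=yes
else
    same_inode=no
fi
echo "same inode: $same_inode"

rm -rf "$demo"
```

So the small number for the second repo in the combined run is what indicates sharing; the separate runs say nothing either way.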
The git-annex manpage says

> When a repository is set up using `git clone --shared`, `git-annex init` will automatically set annex.hardlink and mark the repository as untrusted.

which I guess means Gitea is not doing `git clone --shared`. Perhaps a pity? But probably not something we can risk changing.
It also warns

> Use with caution -- This can invalidate numcopies counting, since with hard links, fewer copies of a file can exist. So, it is a good idea to mark a repository using this setting as untrusted.

but I think that's just a standard assumption we always have to live with (git-annex makes a lot of design choices and assumptions that aren't actually enforceable in, like, physical reality, where entropy exists).
Note: this triggered #32, in a different way than before, because the `git annex get` was run after the repo size had been cached. But as in #32, a single `git push` was enough to trigger the size recomputation:
tl;dr:

- `git config annex.hardlink true`, either in all repos it creates, or in `--global` (I'm unsure which is better)
- add `git annex get` to the internal fork process
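As a rough sketch of what those two steps could look like when run against a freshly forked bare repo: the function name `annex_fill_fork` and the idea of calling it after the fork are assumptions for illustration, not Gitea's actual fork code. It detects git-annex repos by the presence of the `git-annex` branch, and it assumes the fork still has the source repo as a local remote so that hard linking is possible.

```shell
#!/bin/sh
# Hypothetical post-fork step; the name and calling convention are illustrative only.
annex_fill_fork() {
    fork_dir=$1

    # Only act on git-annex repos: git-annex keeps its metadata on a branch named "git-annex".
    if ! git --git-dir="$fork_dir" show-ref --verify --quiet refs/heads/git-annex 2>/dev/null; then
        return 0
    fi

    # Per-repo variant of the setting; the --global variant would be set once on the server instead.
    git --git-dir="$fork_dir" config annex.hardlink true || return 1

    # Fetch the annexed content; with annex.hardlink set, a local remote can be hard linked.
    (cd "$fork_dir" && git annex get)
}

# Example call (path is made up):
# annex_fill_fork /path/to/gitea-repositories/someuser/somerepo.git
```

Setting the option per repo keeps the change scoped to forks, while `--global` would also affect every other git-annex operation the `gitea` user runs; which is better is still the open question above.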
I just forked https://data.dev.neuropoly.org/neuropoly/spine-generic-single -> https://data.dev.neuropoly.org/kousu/spine-generic-single.
Server side, this caused a local clone:
and per git-clone(1)
Evidence
```
gitea@data:~/data/gitea-repositories$ find . -links 2 -type f
./neuropoly/spine-generic-single.git/objects/62/9d830cc33fae39c6a40940a6b2ced27d6630bb
./neuropoly/spine-generic-single.git/objects/f8/b1bcb88b6dfc3503c3493ae732d38f3a37135d
./neuropoly/spine-generic-single.git/objects/f8/747259e448b3a32a6917c5d63862ff731e1059
./neuropoly/spine-generic-single.git/objects/f8/6cd7b6e9b495c79f172e3c9996848de028e6bd
./neuropoly/spine-generic-single.git/objects/f8/ea925abf181f1d007a19c4d7655f007e78d746
./neuropoly/spine-generic-single.git/objects/f8/64332f6c2267b256ea4448bb36b6ebe8ec11a9
./neuropoly/spine-generic-single.git/objects/f8/a78f3f0c2ba75c2ccf0fea91a8501556e90acc
./neuropoly/spine-generic-single.git/objects/e5/ba1898582c0c46fdeffdc49fdc596093f1355e
./neuropoly/spine-generic-single.git/objects/99/c56b08b0508e0e6a9865696d152d31fa92ee17
[...]
./neuropoly/spine-generic-single.git/objects/54/f59b8b8b0a843937ac73251a9b34d432cf8ac0
./neuropoly/spine-generic-single.git/objects/54/1e6e09c8d6b6e1e63fba519ba2d2c68a9e00a1
./kousu/spine-generic-single.git/objects/62/9d830cc33fae39c6a40940a6b2ced27d6630bb
./kousu/spine-generic-single.git/objects/f8/b1bcb88b6dfc3503c3493ae732d38f3a37135d
./kousu/spine-generic-single.git/objects/f8/747259e448b3a32a6917c5d63862ff731e1059
./kousu/spine-generic-single.git/objects/f8/6cd7b6e9b495c79f172e3c9996848de028e6bd
./kousu/spine-generic-single.git/objects/f8/ea925abf181f1d007a19c4d7655f007e78d746
./kousu/spine-generic-single.git/objects/f8/64332f6c2267b256ea4448bb36b6ebe8ec11a9
./kousu/spine-generic-single.git/objects/f8/a78f3f0c2ba75c2ccf0fea91a8501556e90acc
./kousu/spine-generic-single.git/objects/e5/ba1898582c0c46fdeffdc49fdc596093f1355e
./kousu/spine-generic-single.git/objects/99/c56b08b0508e0e6a9865696d152d31fa92ee17
```

And to make doubly sure, here's looking one up by its actual inode number:

```
gitea@data:~/data/gitea-repositories$ stat neuropoly/spine-generic-single.git/objects/62/9d830cc33fae39c6a40940a6b2ced27d6630bb
Fichier : neuropoly/spine-generic-single.git/objects/62/9d830cc33fae39c6a40940a6b2ced27d6630bb
Taille : 198 Blocs : 8 Blocs d'E/S : 4096 fichier
Périphérique : fc01h/64513d Inœud : 1032761 Liens : 2
Accès : (0444/-r--r--r--) UID : ( 996/ gitea) GID : ( 996/ gitea)
Accès : 2022-12-12 00:00:00.137276119 -0500
Modif. : 2022-11-30 02:24:29.875003217 -0500
Changt : 2022-12-12 16:59:12.058919207 -0500
Créé : 2022-11-30 02:24:29.875003217 -0500
gitea@data:~/data/gitea-repositories$ find . -inum 1032761
./neuropoly/spine-generic-single.git/objects/62/9d830cc33fae39c6a40940a6b2ced27d6630bb
./kousu/spine-generic-single.git/objects/62/9d830cc33fae39c6a40940a6b2ced27d6630bb
```

But the repo sizes are wildly different: ~885MB vs ~1.5MB:
And this is of course because it didn't clone the annex files:
and of course this means the repo is broken
But if I run `git annex get` inside the remote repo, then it works.
So, we need to add calling `git annex get` to the Gitea "Fork" button -- but only in git-annex repos, of course.

However, if we can, we should try to use hardlinks the way `git clone` does, as the `git annex get` I ran above actually made copies.
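One way to check whether a given `git annex get` hard linked or copied is to count the annexed files by link count: a private copy has a link count of 1, while a hard-linked file has 2 or more. A rough sketch, assuming the bare-repo layout where annexed content sits under `annex/objects` (the function name `annex_link_report` is made up here):

```shell
#!/bin/sh
# Hypothetical helper: report how many annexed files in a repo are private copies vs. hard linked.
annex_link_report() {
    objects_dir=$1/annex/objects
    if [ ! -d "$objects_dir" ]; then
        echo "no annex objects under $1" >&2
        return 1
    fi
    # Link count 1: this repo holds its own copy of the data.
    copied=$(find "$objects_dir" -type f -links 1 | wc -l | tr -d ' ')
    # Link count above 1: the inode is shared with at least one other path.
    linked=$(find "$objects_dir" -type f -links +1 | wc -l | tr -d ' ')
    echo "copied=$copied linked=$linked"
}

# Example call (path is made up):
# annex_link_report /path/to/gitea-repositories/someuser/somerepo.git
```

A link count above 1 only says the inode is shared with something, not with which repo; `find . -inum` as in the evidence above narrows that down for a specific file.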