neuropoly / gitea

https://gitea.io fork with https://git-annex.branchable.com support

git-annex: support forking #36

Open kousu opened 1 year ago

kousu commented 1 year ago

I just forked https://data.dev.neuropoly.org/neuropoly/spine-generic-single -> https://data.dev.neuropoly.org/kousu/spine-generic-single.

Screenshot 2022-12-12 at 17-00-46 spine-generic-single

Server side, this caused a local clone:

gitea@data:~/data/gitea-repositories$ cd kousu/spine-generic-single.git/
gitea@data:~/data/gitea-repositories/kousu/spine-generic-single.git$ git remote -v
origin  /srv/gitea/data/gitea-repositories/neuropoly/spine-generic-single.git (fetch)
origin  /srv/gitea/data/gitea-repositories/neuropoly/spine-generic-single.git (push)

and per git-clone(1)

   -l, --local
      When the repository to clone from is on a local machine, this flag bypasses the normal "Git aware" transport mechanism and clones the
      repository by making a copy of HEAD and everything under objects and refs directories. The files under .git/objects/ directory are
      hardlinked to save space when possible.

      If the repository is specified as a local path (e.g., /path/to/repo), this is the default, and --local is essentially a no-op.

Evidence

```
gitea@data:~/data/gitea-repositories$ find . -links 2 -type f
./neuropoly/spine-generic-single.git/objects/62/9d830cc33fae39c6a40940a6b2ced27d6630bb
./neuropoly/spine-generic-single.git/objects/f8/b1bcb88b6dfc3503c3493ae732d38f3a37135d
./neuropoly/spine-generic-single.git/objects/f8/747259e448b3a32a6917c5d63862ff731e1059
./neuropoly/spine-generic-single.git/objects/f8/6cd7b6e9b495c79f172e3c9996848de028e6bd
./neuropoly/spine-generic-single.git/objects/f8/ea925abf181f1d007a19c4d7655f007e78d746
./neuropoly/spine-generic-single.git/objects/f8/64332f6c2267b256ea4448bb36b6ebe8ec11a9
./neuropoly/spine-generic-single.git/objects/f8/a78f3f0c2ba75c2ccf0fea91a8501556e90acc
./neuropoly/spine-generic-single.git/objects/e5/ba1898582c0c46fdeffdc49fdc596093f1355e
./neuropoly/spine-generic-single.git/objects/99/c56b08b0508e0e6a9865696d152d31fa92ee17
[...]
./neuropoly/spine-generic-single.git/objects/54/f59b8b8b0a843937ac73251a9b34d432cf8ac0
./neuropoly/spine-generic-single.git/objects/54/1e6e09c8d6b6e1e63fba519ba2d2c68a9e00a1
./kousu/spine-generic-single.git/objects/62/9d830cc33fae39c6a40940a6b2ced27d6630bb
./kousu/spine-generic-single.git/objects/f8/b1bcb88b6dfc3503c3493ae732d38f3a37135d
./kousu/spine-generic-single.git/objects/f8/747259e448b3a32a6917c5d63862ff731e1059
./kousu/spine-generic-single.git/objects/f8/6cd7b6e9b495c79f172e3c9996848de028e6bd
./kousu/spine-generic-single.git/objects/f8/ea925abf181f1d007a19c4d7655f007e78d746
./kousu/spine-generic-single.git/objects/f8/64332f6c2267b256ea4448bb36b6ebe8ec11a9
./kousu/spine-generic-single.git/objects/f8/a78f3f0c2ba75c2ccf0fea91a8501556e90acc
./kousu/spine-generic-single.git/objects/e5/ba1898582c0c46fdeffdc49fdc596093f1355e
./kousu/spine-generic-single.git/objects/99/c56b08b0508e0e6a9865696d152d31fa92ee17
```

And to make doubly sure, here's looking one up by its actual inode number:

```
gitea@data:~/data/gitea-repositories$ stat neuropoly/spine-generic-single.git/objects/62/9d830cc33fae39c6a40940a6b2ced27d6630bb
  File: neuropoly/spine-generic-single.git/objects/62/9d830cc33fae39c6a40940a6b2ced27d6630bb
  Size: 198        Blocks: 8          IO Block: 4096   regular file
Device: fc01h/64513d    Inode: 1032761     Links: 2
Access: (0444/-r--r--r--)  Uid: (  996/   gitea)   Gid: (  996/   gitea)
Access: 2022-12-12 00:00:00.137276119 -0500
Modify: 2022-11-30 02:24:29.875003217 -0500
Change: 2022-12-12 16:59:12.058919207 -0500
 Birth: 2022-11-30 02:24:29.875003217 -0500
gitea@data:~/data/gitea-repositories$ find . -inum 1032761
./neuropoly/spine-generic-single.git/objects/62/9d830cc33fae39c6a40940a6b2ced27d6630bb
./kousu/spine-generic-single.git/objects/62/9d830cc33fae39c6a40940a6b2ced27d6630bb
```
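
The same check can be reproduced on scratch files. This is a minimal sketch, assuming bash and GNU `stat`; `same_inode` and `link_count` are hypothetical helper names, not part of git or Gitea.

```shell
#!/bin/bash
# same_inode: hypothetical helper. True when both paths refer to the
# same device and inode, i.e. they are hard links to one file.
same_inode() {
    [ "$1" -ef "$2" ]
}

# link_count: hypothetical helper. Prints the number of hard links to
# a file (GNU stat syntax; BSD stat uses different flags).
link_count() {
    stat -c %h "$1"
}

demo=$(mktemp -d)
printf 'object data\n' > "$demo/original"
ln "$demo/original" "$demo/hardlink"   # second name for the same inode
cp "$demo/original" "$demo/copy"       # independent copy with a new inode

same_inode "$demo/original" "$demo/hardlink" && echo "hardlink shares the inode"
same_inode "$demo/original" "$demo/copy" || echo "copy has its own inode"
echo "link count of original: $(link_count "$demo/original")"
rm -rf "$demo"
```

A link count of 2 on every object, as in the `find . -links 2` listing, is what one would expect when each object is shared by exactly the parent and one fork.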

But the repo sizes are wildly different: ~885MB vs ~1.5MB:

Screenshot 2022-12-12 at 17-21-56 spine-generic-single

Screenshot 2022-12-12 at 17-21-41 spine-generic-single

And this is of course because it didn't clone the annex files:

gitea@data:~/data/gitea-repositories$ ls kousu/spine-generic-single.git/annex
ls: cannot access 'kousu/spine-generic-single.git/annex': No such file or directory

and of course this means the repo is broken

```
p115628@joplin:~/src/neurogitea/test$ git clone https://data.dev.neuropoly.org/kousu/spine-generic-single spine-generic-single-fork
Cloning into 'spine-generic-single-fork'...
remote: Enumerating objects: 3703, done.
remote: Counting objects: 100% (3703/3703), done.
remote: Compressing objects: 100% (1255/1255), done.
remote: Total 3703 (delta 2015), reused 2942 (delta 1550), pack-reused 0
Receiving objects: 100% (3703/3703), 338.08 KiB | 9.39 MiB/s, done.
Resolving deltas: 100% (2015/2015), done.
p115628@joplin:~/src/neurogitea/test$ cd spine-generic-single-fork/
p115628@joplin:~/src/neurogitea/test/spine-generic-single-fork$ git annex get
(merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
get derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz (not available)
  Maybe add some of these git remotes (git remote add ...):
    5c733c49-b0a9-4d18-989a-11829918dcc1 -- gitea@data.dev.neuropoly.org:/srv/gitea/data/gitea-repositories/neuropoly/spine-generic-single.git
failed
get derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (not available)
  Maybe add some of these git remotes (git remote add ...):
    5c733c49-b0a9-4d18-989a-11829918dcc1 -- gitea@data.dev.neuropoly.org:/srv/gitea/data/gitea-repositories/neuropoly/spine-generic-single.git
failed
get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (not available)
```

same for ssh:

```
p115628@joplin:~/src/neurogitea/test$ git clone gitea@data.dev.neuropoly.org:kousu/spine-generic-single.git spine-generic-single-fork
Cloning into 'spine-generic-single-fork'...
remote: Enumerating objects: 3703, done.
remote: Counting objects: 100% (3703/3703), done.
remote: Compressing objects: 100% (1255/1255), done.
remote: Total 3703 (delta 2015), reused 2942 (delta 1550), pack-reused 0
Receiving objects: 100% (3703/3703), 338.08 KiB | 9.39 MiB/s, done.
Resolving deltas: 100% (2015/2015), done.
p115628@joplin:~/src/neurogitea/test$ cd spine-generic-single-fork/
p115628@joplin:~/src/neurogitea/test/spine-generic-single-fork$ git annex get
(merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
get derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz (not available)
  Maybe add some of these git remotes (git remote add ...):
    5c733c49-b0a9-4d18-989a-11829918dcc1 -- gitea@data.dev.neuropoly.org:/srv/gitea/data/gitea-repositories/neuropoly/spine-generic-single.git
failed
get derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (not available)
  Maybe add some of these git remotes (git remote add ...):
    5c733c49-b0a9-4d18-989a-11829918dcc1 -- gitea@data.dev.neuropoly.org:/srv/gitea/data/gitea-repositories/neuropoly/spine-generic-single.git
failed
get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (not available)
  Maybe add some of these git remotes (git remote add ...):
```

But if I run git annex get inside the remote repo

```
gitea@data:~/data/gitea-repositories/kousu/spine-generic-single.git$ git annex get
(recording state in git...)
get SHA256E-s896332--71a1699d1944f4817f8aaf0d0d36660576649eeaafd56273f67437855135d3d1.nii.gz (from origin...) ok
get SHA256E-s2101125--c07a5070d63235cd576195a5a3580152dd079e4399e18d4b74e5efba4cceef83.nii.gz (from origin...) ok
get SHA256E-s1755316--3564eb18fc031d066a4c3f2956a40ffa60a8b4d12b8a5cdbc2f24eb5d7b92e3c.nii.gz (from origin...) ok
get SHA256E-s8190151--594c0a052fae3ee009212af444420398ba9874502dec4ec23d96157bff7eeed2.nii.gz (from origin...) ok
get SHA256E-s1756168--2fef600a9ddee9cacdf83d94068b786d213f0b598b0ada4417da0416e078b15c.nii.gz (from origin...) ok
get SHA256E-s1455109--edc02370aaef945de7e3a13fe0e975a7fbb01af76c5ccfb69cda44f0a24e2bf7.nii.gz (from origin...)
[..]
get SHA256E-s3350533--aae0efb7544e05e33bde3d8fd3b633a7a41eee629bc7c231d4016bc7cd09670b.nii.gz (from origin...) ok
get SHA256E-s1824049--5472fd5f7ca43b8d2c3b35ca210ccaf5373f709cdf4b08845df8b221ba0c025b.nii.gz (from origin...) ok
get SHA256E-s1180152--efdf45e83f7548c1632214c8a8332db44eed4f1581523e3142d7f180bb6762cd.nii.gz (from origin...) ok
get SHA256E-s1082687--290a43b80da6f608e3d47107f3b6c05e98eebe56ed4eea633748c08bd1a7837a.nii.gz (from origin...) ok
(recording state in git...)
```

Then it works

```
p115628@joplin:~/src/neurogitea/test$ git clone gitea@data.dev.neuropoly.org:kousu/spine-generic-single.git spine-generic-single-fork
Cloning into 'spine-generic-single-fork'...
remote: Enumerating objects: 4134, done.
remote: Counting objects: 100% (4134/4134), done.
remote: Compressing objects: 100% (1544/1544), done.
remote: Total 4134 (delta 2296), reused 2943 (delta 1550), pack-reused 0
Receiving objects: 100% (4134/4134), 360.49 KiB | 6.21 MiB/s, done.
Resolving deltas: 100% (2296/2296), done.
p115628@joplin:~/src/neurogitea/test$ cd spine-generic-single-fork/
p115628@joplin:~/src/neurogitea/test/spine-generic-single-fork$ git annex get
(merging origin/git-annex origin/synced/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
get derivatives/labels/sub-douglas/anat/sub-douglas_T1w_RPI_r_labels-manual.nii.gz (from origin...) ok
get derivatives/labels/sub-juntendoAchieva/dwi/sub-juntendoAchieva_dwi_moco_dwi_mean_seg-manual.nii.gz (from origin...) ok
get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_labels-manual.nii.gz (from origin...) ok
get derivatives/labels/sub-oxfordFmrib/anat/sub-oxfordFmrib_T1w_RPI_r_seg-manual.nii.gz (from origin...) ok
get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_labels-manual.nii.gz (from origin...) ok
get derivatives/labels/sub-perform/anat/sub-perform_T1w_RPI_r_seg-manual.nii.gz (from origin...) ok
get derivatives/labels/sub-perform/dwi/sub-perform_dwi_moco_dwi_mean_seg-manual.nii.gz (from origin...)
[...]
```

So, we need to add calling git annex get to the Gitea "Fork" button -- but only in git-annex repos, of course.
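
As a rough sketch of what such a post-fork step could look like in shell (Gitea itself is written in Go, so this only illustrates the logic): it assumes a git-annex repo can be recognized by the presence of a `git-annex` branch, and that the fork's `origin` remote points at the parent's path on disk, as in the `git remote -v` output above. `has_annex_branch` and `fetch_annex_after_fork` are hypothetical names, not existing Gitea hooks.

```shell
#!/bin/bash
# has_annex_branch: hypothetical helper. True when the repository at
# $1 has a local git-annex branch, which git-annex uses for its metadata.
has_annex_branch() {
    git -C "$1" show-ref --verify --quiet refs/heads/git-annex
}

# fetch_annex_after_fork: hypothetical post-fork step. For git-annex
# repos, pull the annexed content into the fork; plain git repos are
# left untouched.
fetch_annex_after_fork() {
    local fork=$1
    if has_annex_branch "$fork"; then
        # Same command that was run by hand above in the forked repo.
        git -C "$fork" annex get
    fi
}

# Example call (the path is the fork from this issue):
#   fetch_annex_after_fork /srv/gitea/data/gitea-repositories/kousu/spine-generic-single.git
```

Checking for the branch rather than for an `annex/` directory matters here, because the fresh fork has the branch but no `annex/` directory yet.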

However, if we can, we should try to use hardlinks the way git clone does, as the git annex get I ran above actually made copies

```
gitea@data:~/data/gitea-repositories$ du -hs kousu/spine-generic-single.git/ neuropoly/spine-generic-single.git/
886M    kousu/spine-generic-single.git/
882M    neuropoly/spine-generic-single.git/
```

kousu commented 1 year ago

The key seems to be annex.hardlink. I deleted and reforked the repo, then

gitea@data:~/data/gitea-repositories$ git config --global  annex.hardlink true

Then copying the annex files was much faster

```
gitea@data:~/data/gitea-repositories$ cd kousu/spine-generic-single.git/
gitea@data:~/data/gitea-repositories/kousu/spine-generic-single.git$ git annex get
get SHA256E-s896332--71a1699d1944f4817f8aaf0d0d36660576649eeaafd56273f67437855135d3d1.nii.gz (from origin...) ok
get SHA256E-s2101125--c07a5070d63235cd576195a5a3580152dd079e4399e18d4b74e5efba4cceef83.nii.gz (from origin...) ok
[...]
get SHA256E-s1082687--290a43b80da6f608e3d47107f3b6c05e98eebe56ed4eea633748c08bd1a7837a.nii.gz (from origin...) ok
(recording state in git...)
git-annex: get: 12 failed
```

And the counts come out showing they are indeed now avoiding the duplication:

gitea@data:~/data/gitea-repositories$ du -hs  kousu/spine-generic-single.git/  neuropoly/spine-generic-single.git/
886M    kousu/spine-generic-single.git/
2,9M    neuropoly/spine-generic-single.git/
gitea@data:~/data/gitea-repositories$ # but counting them separately shows them as full sized
gitea@data:~/data/gitea-repositories$ du -hs  kousu/spine-generic-single.git/; du -hs  neuropoly/spine-generic-single.git/
886M    kousu/spine-generic-single.git/
885M    neuropoly/spine-generic-single.git/
gitea@data:~/data/gitea-repositories$ 
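
The difference between the two `du` runs comes from `du` counting each hard-linked file only once per invocation. A small sketch on scratch data, assuming bash and a `du` that accepts `-k`; `kib_of_last` is a hypothetical helper:

```shell
#!/bin/bash
# kib_of_last: hypothetical helper. Runs du once over all the given
# paths and prints the size in KiB reported for the last one.
kib_of_last() {
    du -sk "$@" | tail -n 1 | cut -f 1
}

demo=$(mktemp -d)
mkdir "$demo/parent" "$demo/fork"
head -c 1048576 /dev/urandom > "$demo/parent/blob"   # 1 MiB of data
ln "$demo/parent/blob" "$demo/fork/blob"             # shared via hard link

# One invocation: the blob is charged to parent, so fork looks tiny.
echo "fork, measured together with parent: $(kib_of_last "$demo/parent" "$demo/fork") KiB"
# Separate invocation: fork is charged for the blob again.
echo "fork, measured alone: $(kib_of_last "$demo/fork") KiB"
rm -rf "$demo"
```

So the small figure in the combined run is the one that reflects actual disk usage; the per-repo figures each include the shared files.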

The git-annex manpage says

          When a repository is set up using git clone --shared, git-annex init will automatically set annex.hardlink and mark the repository as untrusted.

which I guess means gitea is not doing git clone --shared. Perhaps a pity? But probably not something we can risk changing.

It also warns

          Use with caution -- This can invalidate numcopies counting, since with hard links, fewer copies of a file can exist. So, it is a good idea to mark a repository using this setting as untrusted.

but I think that's just... a standard assumption we always have to live with (git-annex makes a lot of design choices and assumptions that aren't actually enforceable in, like, physical reality, where entropy exists).
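
If a server-wide `--global` setting turns out to be too broad, one possible alternative is to set `annex.hardlink` only in the fork's own config. A minimal sketch, assuming bash; `enable_annex_hardlinks` is a hypothetical helper name, and whether Gitea should do this at fork time is an open question in this issue:

```shell
#!/bin/bash
# enable_annex_hardlinks: hypothetical helper. Sets annex.hardlink in
# the config of the single repository at $1, leaving the global config
# of the gitea user alone.
enable_annex_hardlinks() {
    local repo=$1
    git -C "$repo" config annex.hardlink true
    # The man page suggests also marking such a repository as
    # untrusted. With git-annex installed and the repo initialized,
    # that could be done with:
    #   git -C "$repo" annex untrust here
}

# Example call (the path is the fork from this issue):
#   enable_annex_hardlinks /srv/gitea/data/gitea-repositories/kousu/spine-generic-single.git
```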

Note: this triggered #32, in a different way than before, because the git annex get was run after the repo size had been cached. But as in #32 a single git push was enough to trigger the size recomputation:

Screenshot 2022-12-12 at 17-41-55 spine-generic-single

kousu commented 1 year ago

tl;dr: