Closed NathanMolinier closed 5 months ago
Strangely, when I git clone the repo, the download is already about 1 GB, without running git annex get:
julien-macbook:~/data.neuro $ git clone git@data.neuro.polymtl.ca:datasets/lumbar-vanderbilt
Cloning into 'lumbar-vanderbilt'...
remote: Enumerating objects: 1866, done.
remote: Counting objects: 100% (1866/1866), done.
remote: Compressing objects: 100% (1016/1016), done.
remote: Total 1866 (delta 379), reused 1843 (delta 374), pack-reused 0
Receiving objects: 100% (1866/1866), 1.09 GiB | 58.13 MiB/s, done.
Resolving deltas: 100% (379/379), done.
Is that normal?
> Is that normal?
With Mathieu, we fixed the issue. I just forgot to call git annex before adding the data. The branch nm/first-commit was therefore deleted.
A new branch, nm/first-commit-2, should now be available. The data was also updated to follow our new conventions. This branch is now ready to merge.
> I just forgot to call git annex before adding the data.
It's not the first time this has happened, and it will likely happen again. I'm wondering if there is any check we could run to detect it, e.g. a cron job that goes over all the 'dataset' git repos and checks whether binaries are stored as regular git objects instead of in the annex (.git/annex), or something like that? @mguaypaq @kousu
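A periodic check along these lines could be sketched as follows. This is only a sketch under stated assumptions: annexed files show up in git either as symlinks (locked) or as small pointer files (unlocked), so any large regular blob is suspicious. The 1 MB threshold and the function names `find_large_blobs` and `check_repo` are hypothetical, not something the thread agreed on.

```python
import subprocess

# Hypothetical threshold: regular blobs above this size are assumed
# to be data files that should have gone through git-annex.
SIZE_LIMIT = 1_000_000  # bytes


def find_large_blobs(ls_tree_output, size_limit=SIZE_LIMIT):
    """Parse `git ls-tree -r -l` output and return (path, size) pairs
    for regular (non-symlink) blobs larger than size_limit."""
    offenders = []
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        # Each line is "<mode> <type> <object> <size>\t<path>".
        meta, _, path = line.partition("\t")
        mode, objtype, _sha, size = meta.split()
        # Skip submodules (type "commit", size "-") and symlinks,
        # which is what locked annexed files look like in git.
        if objtype != "blob" or mode == "120000":
            continue
        if int(size) > size_limit:
            offenders.append((path, int(size)))
    return offenders


def check_repo(repo_dir, rev="HEAD"):
    """Run the check on one repository (working tree or bare)."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "ls-tree", "-r", "-l", rev],
        check=True, capture_output=True, text=True,
    ).stdout
    return find_large_blobs(out)
```

A cron job could call `check_repo` on each dataset repository and send an alert when the returned list is non-empty. Paths with unusual characters are quoted by git in this output, which is acceptable for an alert but worth knowing.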
Datalad's default configuration annexes every single file; if you configure it to split files between git and the annex with .gitattributes like we do (which is the only sensible way to do it), they have this problem too. It's a basic result of stacking too many layers in one tool. We should reconsider #68
In the meantime we can write some fsck-style scripts to catch these issues, and yes, we should do that, but they would be a stopgap. Still, it's useful to get the alert at least! Maybe that's something @namgo has capacity for, come to think of it.
Possibly I can add a git hook to the new repository template, that would refuse pushes with non-annexed files?
You might struggle because hooks are per instance of a git repo: they don't get cloned, and if we ever use Gitea, hooks don't get copied when a repo is forked. Would you just patch it into that shell script we copy-paste for people?
I'm willing to re-fix the problem differently once Gitea is up and running. And I need to do manual intervention right now for every new gitolite repository, so if hooks work then that's fine.
Also, I think Gitea templates allow copying hooks? I haven't looked into it, but I remember that there were a bunch of checkboxes of "what to copy" when I used a new repo template on spineimage.ca.
Ah true. Okay sounds good!
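The server-side hook idea discussed above could look roughly like this. It is a minimal sketch of a `pre-receive` hook, not the hook that was actually installed: the size threshold, the function names, and the choice to inspect only the tip of each pushed branch are all assumptions made for illustration.

```python
#!/usr/bin/env python3
import subprocess
import sys

ZERO = "0" * 40          # all-zero id marks a ref deletion (assumes SHA-1 ids)
SIZE_LIMIT = 1_000_000   # hypothetical threshold, in bytes


def refs_to_check(stdin_text):
    """Parse pre-receive input ("<old> <new> <ref>" per line) and
    return (ref, new) pairs for pushed branches worth inspecting."""
    tips = []
    for line in stdin_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        _old, new, ref = parts
        if new == ZERO:
            continue  # deleting a ref adds no files
        # Only look at branches, and skip git-annex's own bookkeeping branch.
        if not ref.startswith("refs/heads/") or ref == "refs/heads/git-annex":
            continue
        tips.append((ref, new))
    return tips


def oversized_blobs(rev):
    """List paths at `rev` that are large regular blobs, i.e. neither
    symlinks (locked annexed files) nor small pointer files."""
    out = subprocess.run(
        ["git", "ls-tree", "-r", "-l", rev],
        check=True, capture_output=True, text=True,
    ).stdout
    bad = []
    for line in out.splitlines():
        meta, _, path = line.partition("\t")
        mode, objtype, _sha, size = meta.split()
        if objtype == "blob" and mode != "120000" and int(size) > SIZE_LIMIT:
            bad.append(path)
    return bad


def main():
    rejected = False
    for ref, new in refs_to_check(sys.stdin.read()):
        for path in oversized_blobs(new):
            print(f"{ref}: {path} looks non-annexed", file=sys.stderr)
            rejected = True
    return 1 if rejected else 0


if __name__ == "__main__":
    status = main()
    if status:
        sys.exit(status)  # a non-zero exit makes git refuse the push
```

Because it only inspects the tip tree, a large file that was committed and then removed in a later commit of the same push would still slip through; walking every new commit would close that gap at the cost of a slower hook.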
Back to the main topic for this issue: the branch nm/first-commit-2 looks good from my end:
bids-validator complains about INCONSISTENT_SUBJECTS and INCONSISTENT_PARAMETERS, so I added an extra commit with this .bids-validator-config.json:
{"ignore": ["INCONSISTENT_SUBJECTS", "INCONSISTENT_PARAMETERS"]}
Some subjects have only axial scans, some subjects have only sagittal scans, and some subjects have both, so INCONSISTENT_SUBJECTS is expected.
All the axial scans have dimensions (512, 512, 14). Most of the sagittal scans have dimensions (560, 1558, 1), but a few are different, causing INCONSISTENT_PARAMETERS. But at least, the corresponding label files have matching dimensions.
So, I merged into master and deleted the branch nm/first-commit-2.
Description
I just pushed a new dataset, lumbar-vanderbilt, on our git-annex server data. The new data is on a branch called nm/first-commit. This dataset was shared by one collaborator in the context of gray matter segmentation for the lumbar region. This dataset contains:
Before merging, I believe we should wait for possible changes related to our data curation convention.