Closed by hvgazula 9 months ago
Isn't the point of git to have those changes captured? More importantly, that is the point of quality control/checking/revisions before a merge is made into the main branch. When we choose to add a model, that should be a specific set of weights. Also, the only bulky element here is the weights; the rest should be under git control, not annex control. git/git-annex allows separating which files go into annex and which into git, and there is a configuration setting that sends known text-type files into git. For annex, or git for that matter, the only way unnecessary files get in is if people mistakenly add model weights that should not be there. Personally, at this stage this seems like a 20% problem, especially since repositories can be pruned later if needed.
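For reference, a minimal sketch of how that split is usually expressed, assuming the repository routes files through git-annex's `annex.largefiles` setting in `.gitattributes`. The first rule is the one DataLad's `text2git` configuration writes; the per-extension overrides are illustrative assumptions, not taken from this repository.

```
# Binary files (e.g. model weights) go to the annex; text files stay in git.
* annex.largefiles=((mimeencoding=binary)and(largerthan=0))

# Illustrative overrides (hypothetical patterns): always keep these in git.
*.md annex.largefiles=nothing
*.yaml annex.largefiles=nothing
```

With rules like these, a plain `git annex add` or `datalad save` decides per file where the content lands, so only the weights end up in annex storage.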
Satra, you are correct. We have to modify the workflow to push to storage after testing, unlike the current approach where we push to storage first and then test. 🤦‍♂️ Please don't ask me why we did that :).
And regarding 'pruning', I was playing around a bit and was curious how to remove references from the storage as well. Of course we can remove them from GitHub, but I noticed the files still remain in the storage, hence the note.
Ah, I know why we did that. We were trying to accomplish as much as possible on the GitHub runner to save time (and dollars 😋) spent on the AWS runner.
https://github.com/datalad/datalad-osf/issues/186
I reckon this could be important because the workflow will accumulate unnecessary files even when adding a model fails.