removing files from datalad/annex does not remove them from osf

neuronets / trained-models

Trained TensorFlow models for 3D image processing

https://neuronets.dev/trained-models


removing files from datalad/annex does not remove them from osf #100

Closed hvgazula closed 9 months ago

hvgazula commented 1 year ago

https://github.com/datalad/datalad-osf/issues/186

I reckon this could be important because the workflow will accumulate unnecessary files in storage even when adding a model fails.

satra commented 1 year ago

isn't the point of git to have those changes captured? and, more importantly, isn't that the point of quality control/checking/revisions before a merge is made into the main branch? when we choose to add a model, that should be a specific set of weights. also, the only bulk element here is the weights; the rest should be under git control, not annex control. git/git-annex allows separating which files go into annex and which into git: there is a configuration setting that sends known text-type files into git. and for annex, or git for that matter, the only way unnecessary files get in is if people mistakenly add model weights that should not be there. personally, at this stage this seems like a 20% problem, especially since repositories can be pruned later if needed.
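To make the git/annex split concrete, here is a minimal Python sketch of the idea only. It is not how git-annex implements it; in git-annex the rule is normally expressed through the `annex.largefiles` setting. The NUL-byte heuristic, the function names, and the file names `model_card.yaml` and `weights.h5` are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def looks_binary(path, sample_size=8192):
    """Heuristic (an assumption, not git-annex's rule): a NUL byte in the
    first few KiB means the file is treated as binary."""
    with open(path, "rb") as fh:
        return b"\0" in fh.read(sample_size)


def split_git_vs_annex(paths):
    """Sort files into a 'git' bucket (text) and an 'annex' bucket (binary)."""
    plan = {"git": [], "annex": []}
    for p in paths:
        bucket = "annex" if looks_binary(p) else "git"
        plan[bucket].append(Path(p).name)
    return plan


# Hypothetical model directory: one text file and one binary weights file.
with tempfile.TemporaryDirectory() as tmp:
    spec = Path(tmp) / "model_card.yaml"
    spec.write_text("name: demo\n")
    weights = Path(tmp) / "weights.h5"
    weights.write_bytes(b"\x89HDF\x00\x01\x02")
    plan = split_git_vs_annex([spec, weights])
```

With a rule of this kind, only the weights end up in annex-backed storage, so the only way extra bulk gets in is by adding weights that should not be there.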

hvgazula commented 1 year ago

Satra, you are correct. We have to modify the workflow to push to storage after testing unlike the current approach where we are pushing first to storage and then testing. 🤦‍♂️ Please don't ask me why we did that :).
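A minimal sketch of the test-then-push ordering, assuming the workflow can be reduced to two steps per model. The names `publish_if_tests_pass`, `run_tests`, and `push_to_storage` are hypothetical stand-ins, not functions from this repository or from datalad.

```python
from typing import Callable, Iterable, List


def publish_if_tests_pass(
    model_paths: Iterable[str],
    run_tests: Callable[[str], bool],
    push_to_storage: Callable[[str], None],
) -> List[str]:
    """Push only the models whose tests pass; return the pushed paths."""
    pushed = []
    for path in model_paths:
        if not run_tests(path):
            # A failing model never reaches storage, so nothing is left behind.
            continue
        push_to_storage(path)
        pushed.append(path)
    return pushed


# Stand-in callables (assumptions): a list plays the role of remote storage.
storage: List[str] = []
result = publish_if_tests_pass(
    ["good_model", "broken_model"],
    run_tests=lambda p: not p.startswith("broken"),
    push_to_storage=storage.append,
)
```

Because the push happens only after the tests pass, a failed addition leaves no files in storage that would need to be removed later.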

hvgazula commented 1 year ago

And regarding 'pruning', I was playing around a bit and was curious how to remove the files from the storage as well. I mean, of course we can remove them from GitHub, but I noticed the files still remain in the storage, hence the note.

hvgazula commented 1 year ago

> Satra, you are correct. We have to modify the workflow to push to storage after testing unlike the current approach where we are pushing first to storage and then testing. 🤦‍♂️ Please don't ask me why we did that :).

Ah, I know why we did that. We were trying to accomplish as much as possible on the GitHub runner to save time (and dollars 😋) spent on the AWS runner.

hvgazula commented 9 months ago