singularityhub / singularity-deploy

Build and deploy Singularity containers to GitHub releases, and pull with the singularity-hpc client
Mozilla Public License 2.0

Question: Tags vs Release version #4

Open fenz opened 3 years ago

fenz commented 3 years ago

I'm trying to understand how to use tagging properly. As far as I understand, an image tag for a container is part of the recipe name (for Singularity.pokemon it would be pokemon), and the "release version" can be considered a snapshot/SHA, since you can update a tag (pokemon) and still keep the option of using the previous "SHA". I'm still a bit confused about how to use this, and I got even more confused when I looked at an shpc recipe: https://github.com/singularityhub/singularity-hpc/blob/main/registry/singularityhub/singularity-deploy/container.yaml There "tags" and "release versions" seem to be used interchangeably ("salad" vs "0.0.12"):

latest:
  0.0.12: singularityhub/singularity-deploy
tags:
  0.0.12: singularityhub/singularity-deploy
  salad: 0.0.1

I know all of this is just a "template" that can be customized, but I was wondering if you could clarify the difference between "tags" and "release versions" and how you suggest using them.

vsoch commented 3 years ago

For Singularity Deploy, a tag is a Singularity file extension that is then present in the container name (e.g., Singularity.salad). And the digest is the GitHub release. Sorry for the confusion - the file you linked has a bug introduced by the auto update script! @alecbcs we will need to take a look! You can look at previous versions of the file to see the correct way to do it.
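To make the mapping concrete, here is a rough sketch (my assumptions: recipes live at the repository root, and, if I remember the template right, a bare Singularity file builds the latest tag):

from pathlib import Path

# a tag is whatever follows "Singularity." in the recipe file name:
# Singularity.salad -> "salad"; a bare "Singularity" file maps to "latest"
tags = [p.name.split(".", 1)[1] if "." in p.name else "latest"
        for p in Path(".").glob("Singularity*")]
print(tags)  # e.g. ['salad', 'pokemon', 'latest']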

fenz commented 3 years ago

So I got it more or less right. Calling the GitHub release a "digest" makes it clearer, in my opinion. I was also looking for a way to point to the "latest" digest of a certain tag, but it seems that's not possible - that's a different discussion anyway. Thanks for the clarification.

vsoch commented 3 years ago

I think you're right - would you like to submit a PR to update the description in the README?

For latest, that's a hard question - it would have to be the tag associated with Singularity.latest. I completely made this up and it's still more of a "proof of concept", so if you have ideas for how we could do this, I will gladly update shpc to support it.

alecbcs commented 3 years ago

Oh interesting! This definitely comes down to a misunderstanding I had about singularity-deploy when writing Binoc's SHPC parser. I'd figured that tags would come from the git tags/releases rather than the file names. Since you can also make a git tag a moving target (although I wouldn't recommend it under most circumstances), have we thought about someone creating releases called latest and develop on GitHub and periodically moving the release pointers? For example, they would re-release latest pointing at v0.0.1 and then later at v0.0.2. Or does this not follow the spec of Singularity Deploy, and thus we shouldn't worry about it?

@vsoch that was fast getting autamus/binoc#7 submitted!

vsoch commented 3 years ago

@alecbcs I do think we would want a clean strategy to deploy consistent (tags?) for latest and develop (or other) - are you thinking we should refactor so that a release is a tag (e.g., 0.0.1, latest, or develop), and then within that tag there would be a known set of associated files? How would we represent the digest? Or would there be no digest, just the tag?

I'd like to improve how this is currently designed, so let's chat about how we can do that.

alecbcs commented 3 years ago

@vsoch hypothetically, if we went the route of releases as tags, that might encourage people to make good use of tags as versions rather than the moving targets we've had a hard time dealing with.

I think this might also be more in line with what we're doing when updating docker containers to versioned releases in shpc. Right now in shpc we're versioning container tags like this:

docker: ghcr.io/autamus/abyss
url: https://github.com/orgs/autamus/packages/container/package/abyss
maintainer: '@vsoch'
description: ABySS is a de novo, parallel, paired-end sequence assembler that is designed
  for short reads.
latest:
  2.3.1: sha256:92b2975865998a0560f66d2120d224822e8e1ab350b606a0e308e1fbbbaecd9d
tags:
  2.3.0: sha256:92b2975865998a0560f66d2120d224822e8e1ab350b606a0e308e1fbbbaecd9d
  2.3.1: sha256:92b2975865998a0560f66d2120d224822e8e1ab350b606a0e308e1fbbbaecd9d
  latest: sha256:92b2975865998a0560f66d2120d224822e8e1ab350b606a0e308e1fbbbaecd9d

Which I believe is closer to using the versions as tags rather than digests? Basing the tags off of releases might also help make the container.yaml filters in shpc specs more applicable to Singularity Deploy, by ignoring any releases that contain something like -rc1, -dev, etc.
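For example, such a filter could be as simple as keeping only plain semver releases (a hypothetical sketch - I'm making up the filtering logic here, not quoting shpc's actual filter syntax):

import re

# hypothetical pre-release filter: keep only plain X.Y.Z releases,
# skipping anything like 0.1.0-rc1 or 0.1.0-dev
releases = ["0.0.12", "0.1.0-rc1", "0.1.0-dev", "0.1.0"]
stable = [r for r in releases if re.fullmatch(r"\d+\.\d+\.\d+", r)]
print(stable)  # ['0.0.12', '0.1.0']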

As for what to do with the digests - could they be the names of the files in the release pack?

This might complicate other parts of shpc or Singularity Deploy, or it might be less intuitive for users. You've been in the Singularity world much longer than I have, so these are just some thoughts from my package management experience with Autamus.

vsoch commented 3 years ago

Nope, I think this is spot on, and more intuitive than what we currently have. I don't think I have enough time today to make all the changes to singularity-deploy, shpc, and then binoc, but I can do this next weekend. Let's keep this issue open to work on it, and I'll ping here when I've made some progress!

vsoch commented 3 years ago

And @fenz thank you for opening this issue - I'll hopefully have it fixed up next weekend so that using singularity-deploy is easier and more intuitive. Stay tuned!

fenz commented 3 years ago

Thanks for updating this. Indeed, since GitHub releases are also GitHub tags, it gets confusing to look at them differently from tags on container images. Moreover, I would rather have a list of releases for an image than add a file to the repo for each new release. This would also allow managing the build differently, since I think at the moment all the recipes are built when you do a release, whether or not a given recipe file changed. Are you also thinking about keeping the different versions in different branches? Then you could consider "main" as just the bleeding edge, with all the "stable" releases in branches following a specific naming convention like "release/" (or anything customizable with a workflow parameter). I'm not a great git expert, so I'm not saying that's the best approach, but I feel it's something worth investigating.

vsoch commented 3 years ago

One issue I'm thinking about: if we use GitHub releases as tags, you can only associate one image with a release tag. It's probably akin to an issue we haven't addressed yet with architecture - there can be multiple digests for the same tag on Docker Hub or elsewhere for different architectures, and we currently just provide one. So if it's the case that one release == one container, and the release tag (e.g., 0.0.1) is associated with an exact (single) file that we could derive by parsing the release artifacts, I think this could work. But if we require multiple files per release tag, OR if the bot binoc cannot discover the name of the tag, then we have an issue. For shpc pull, the name of the binary in the release also needs to be predictable, so I am not sure something like a hash is an option.

I'm struggling a bit to figure out how to implement this - I made my current choices (a tag as a Singularity. prefix, and the digest as the GitHub release) exactly because of this need to support multiple tags (file extensions) per release cycle (GitHub tag).

fenz commented 3 years ago

Sorry for the late reply. I'm no expert in building images for different architectures, but wouldn't this require a change/option added to the build command executed in the workflow? Building for different architectures, or even different base OS images (e.g., Ubuntu vs. CentOS), could be done by creating different recipes and generating different files in a specific release, but I would still treat the release like a tag for the image. Maybe you can add a "suffix" based on the name of the recipe: github/myapp release 0.0.1

In this case, each release will generate different "digests" for different recipes, but the app tag/version will still be based on the release. This could also be managed at the branch level, so merging a PR to a "release/0.0.1" branch would generate the release "0.0.1" (though a VERSION file can work as well). I could be wrong, but I feel this is more aligned with a standard "release" concept, where you can generate .tar/.zip/x86/arm/etc. but all those files are part of release X of a specific app. In your example, the app "singularityhub-singularity-deploy", version "0.0.12", architecture/baseOS "pokemon" would be:

https://github.com/singularityhub/singularity-deploy/releases/download/0.0.12/singularityhub-singularity-deploy.pokemon.sif

I guess there's still the issue of defining a digest - I'm not sure the "suffix" is enough as a "digest". In this case you'd also need to define a "default/amd64" digest (for a Singularity recipe with no suffix).
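In that scheme the download URL is fully predictable from the repo, release, and recipe suffix - an untested sketch of what I mean:

def asset_url(org, repo, release, suffix):
    # e.g. singularityhub-singularity-deploy.pokemon.sif
    name = f"{org}-{repo}.{suffix}.sif"
    return f"https://github.com/{org}/{repo}/releases/download/{release}/{name}"

print(asset_url("singularityhub", "singularity-deploy", "0.0.12", "pokemon"))
# -> the URL above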

vsoch commented 3 years ago

@fenz if the tag in the above is 0.0.12 (the GitHub release), how would the image name be predicted / discovered?

vsoch commented 3 years ago

@alecbcs what about this: if we can get the commit for a release (e.g., https://stackoverflow.com/questions/56724610/how-to-get-a-commit-sha-from-a-release-or-tag-on-github-api-v3/56753582), then we could look at the files in that commit (https://docs.github.com/en/rest/reference/repos#get-a-commit) and look for Singularity.<*> recipes. Of course, if a release has more than one commit, we would need to go back until the previous release (and stop before we parse it). That's how Singularity Hub used to derive changed recipes for a repo!
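Roughly something like this (an untested sketch with requests; it ignores pagination, API auth, and the fact that an annotated tag ref points at a tag object that needs one more dereference):

import requests

api = "https://api.github.com/repos/singularityhub/singularity-deploy"

# resolve the release tag to a commit sha
ref = requests.get(f"{api}/git/ref/tags/0.0.12").json()
sha = ref["object"]["sha"]

# list the files touched by that commit and keep the Singularity recipes
commit = requests.get(f"{api}/commits/{sha}").json()
recipes = [f["filename"] for f in commit.get("files", [])
           if f["filename"].split("/")[-1].startswith("Singularity")]
print(recipes)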

Of course that assumes we only want to release changed recipes, which might not be the case. I guess the simple and silly approach (which is reasonable for a GitHub workflow) would be to clone the repository into a temporary directory and look for the Singularity.* files. Is that too much?

The issue is that the shpc "pull" client also needs a way to predict the URI of a container. I suppose if the name and tag (e.g., singularity-deploy-pokemon.sif) are in the filename, it would just need to ask for it directly. And then maybe binoc can do the magic to discover new recipes? Let me know what you think!

fenz commented 3 years ago

@vsoch I'm not getting the "predictability" question. I think I'm missing some bit here regarding the whole process/needs of both the workflow and shpc, so forgive me if I say something stupid, but I thought that if you look for a version X of an application Y, you find it at REPO_URL/releases/download/X/Y.default.sif, and this corresponds to the image Y:X-default (in tag syntax). By the way, as I said, I'm probably not getting the issue, so I'll let you discuss and find a smart solution ;)

vsoch commented 3 years ago

@fenz I can do a better job of describing the issue! So binoc is the bot that looks for updates, and when we update from Docker Hub we can easily discover both tags and digests from the API calls. For GitHub release artifacts, we can of course easily discover new releases (e.g., 0.0.12), but it's not clear to me how we then discover the recipes. If we can get the artifact names from the release, it could be a simple matter of parsing those into strings that we call digests. But then we run into another issue - if a GitHub release 0.0.12 has three different associated sif binaries, which one is correct to call the digest? That's why I originally reversed it - find the tags first, and then use the GitHub release as the digest. E.g.:

salad: 0.0.12
pokemon: 0.0.12

Because we might have the same sif filenames (e.g., salad) over different GitHub releases (0.0.12), and we can represent multiple files, but we couldn't do that if 0.0.12 were the tag - we could only choose one. E.g., a sif for salad, pokemon, and latest can't ALL be associated with the 0.0.12 tag.

fenz commented 3 years ago

Ok, let me see if I got it right. Binoc (https://github.com/autamus/binoc) looks for updates in a given git repo in order to update the shpc collection (https://singularityhub.github.io/singularity-hpc). Now, in the case of the current repo, you have this entry: https://github.com/singularityhub/singularity-hpc/blob/main/registry/singularityhub/singularity-deploy/container.yaml

gh: singularityhub/singularity-deploy
url: https://github.com/singularityhub/singularity-deploy
maintainer: '@vsoch'
description: Example shpc container using Singularity Deploy, build and serve from
  GitHub releases.
latest:
  salad: 0.0.12
tags:
  salad: 0.0.12
aliases:
  salad: /code/salad

so you consider the release as the digest. I looked at other entries in the registry but could not find any example of the same tag with different digests (e.g., builds for different architectures) - how would this look in an shpc registry entry? Looking at an example on Docker Hub (https://hub.docker.com/r/stdevel/joke-api/tags?page=1&ordering=last_updated), it seems those are managed as different tags, but I don't get how they manage different versions of an app compiled for different architectures - would this always be part of the tag? In this case, can't you, for each release, list the files:

singularityhub-singularity-deploy.latest.sif 768 KB
singularityhub-singularity-deploy.pokemon.sif 768 KB
singularityhub-singularity-deploy.salad.sif 768 KB
Source code (zip)
Source code (tar.gz)

and create something like:

0.0.12: digest of singularityhub-singularity-deploy.latest.sif (or just singularityhub-singularity-deploy.sif depending on the name assigned by the workflow)
0.0.12-salad: digest of singularityhub-singularity-deploy.salad.sif
0.0.12-pokemon: digest of singularityhub-singularity-deploy.pokemon.sif

Same for other tags (GitHub releases). This doesn't look nice, but I'm not sure how this case is managed even on Docker Hub. I mean, a TensorFlow 2.4 compiled for ARM I would name tensorflow:2.4-armv7, so the architecture usually becomes part of the tag itself, right?
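As an untested sketch of what I mean, the GitHub API already lists the assets for a release, so the "suffixed" tags could be derived from the file names:

import requests

api = "https://api.github.com/repos/singularityhub/singularity-deploy"

# list the assets attached to a release and derive 0.0.12-<suffix> style tags
release = requests.get(f"{api}/releases/tags/0.0.12").json()
for asset in release["assets"]:
    name = asset["name"]  # e.g. singularityhub-singularity-deploy.salad.sif
    if not name.endswith(".sif"):
        continue  # skip source tarballs/zips
    suffix = name.rsplit(".", 2)[-2]  # "salad"
    tag = "0.0.12" if suffix == "latest" else f"0.0.12-{suffix}"
    print(tag, asset["browser_download_url"])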

vsoch commented 3 years ago

That's very close! So binoc is tasked with updating the container.yaml files. This means that he parses them, and if a gh:// unique resource identifier is found, he knows that the container in question needs to be updated from GitHub. He then uses the release API to find new releases, and (before) since there would be a known set of tags (e.g., salad) that correspond to a file, he would update the container.yaml file with the new GitHub release for the tag, like:

salad: 0.0.12
pokemon: 0.0.12

I looked at other entries in the registry but could not find any example of the same tag with different digests (e.g., builds for different architectures) - how would this look in an shpc registry entry?

We don't yet have a way to represent the "same" tag for a different architecture (meaning a different digest). I suspect we could have some way to represent the architecture in the tag (e.g., -archX), but that would be hard to distinguish from a tag. The only way to do it currently would be to select an entire tag that represents a different architecture. The example you linked, with the same tag having different architectures varying by digest, could not easily work. The way binoc works, it pings the Docker Hub API, so it likely gets back the architecture for the system it is being run on.

0.0.12: digest of singularityhub-singularity-deploy.latest.sif (or just singularityhub-singularity-deploy.sif depending on the name assigned by the workflow)
0.0.12-salad: digest of singularityhub-singularity-deploy.salad.sif
0.0.12-pokemon: digest of singularityhub-singularity-deploy.pokemon.sif

Oh that's an interesting idea - so perhaps the build could save the images based on the digest and tag, e.g.,:

# Singularity.salad
salad-sha256:<digest>.sif

and then binoc could discover them via the API; binoc would need to map the salad-<digest>.sif binary to be:

<github-release>-<tag>:<digest>
0.0.12-salad:sha256:<digest>
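The build step could produce that naming with something like this (an untested sketch; it assumes the recipe was already built into salad.sif):

import hashlib
from pathlib import Path

# after building Singularity.salad into salad.sif, embed the digest in the
# artifact name so binoc can parse the tag and digest from the release assets
sif = Path("salad.sif")
digest = hashlib.sha256(sif.read_bytes()).hexdigest()
sif.rename(f"salad-sha256:{digest}.sif")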

@alecbcs would this be possible (does the API reveal the artifacts in a release)? @fenz if we can programmatically get the file names, this definitely could work! @alecbcs as soon as you confirm this is possible, I can work on refactoring the repo here to build images with this structure, and then we can test running binoc with an update.

alecbcs commented 3 years ago

Hey all! Apologies for the stupidly long delay getting back to this thread - it's finals season, so my normal work schedule has gotten completely thrown off. @vsoch at the moment Binoc doesn't have a simple way to query and return the release assets in a particular release (although I'm sure that's something we could work on adding if necessary). Right now it would be pretty easy to test whether a particular release asset exists if we have a programmatic name for it.
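E.g., with a programmatic name, the check is just a request against the predictable download URL (rough sketch):

import requests

# GitHub redirects release downloads to storage, so follow redirects and
# check the final status code to see whether the asset exists
url = ("https://github.com/singularityhub/singularity-deploy/releases/"
       "download/0.0.12/singularityhub-singularity-deploy.salad.sif")
exists = requests.head(url, allow_redirects=True).status_code == 200
print(exists)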

Actually, the more I look at the Singularity Deploy spec, the more I like the current system. I see why @vsoch went the route she did, making the digests the GitHub release versions and the tags a file name extension on each release asset. It's different than what someone familiar with Docker Hub might expect, but I'm starting to think it makes more sense for the GitHub release format. I guess it depends on the use case, but if people writing scientific software are likely to just add this deploy method to their existing system, and they want to release two tagged versions simultaneously, it makes more sense for them to add two files to a single release rather than having to make two separate releases for the two tags.

Moreover, for people expecting their tags to stay constant over multiple releases, the existing system makes more sense as well: you just update the digest with the new version rather than having to update both the tag and the digest on each new release.

fenz commented 3 years ago

I agree it makes sense, but I find it confusing not only from a Docker Hub perspective but also from a GitHub user perspective, since on GitHub a release usually represents a new version of the software.

Moreover, for people expecting their tags to stay constant over multiple releases, the existing system makes more sense as well: you just update the digest with the new version rather than having to update both the tag and the digest on each new release.

I understand that at the moment you need to update the "digest" for a "tag" with each new release, but I don't get the "updating both at each release" concern. When you need to update a "container tag" you have to change both the tag and the digest (actually not even just update, but add a new tag/digest "entry"), so I feel that's not really changing. Anyway, I just wanted to clarify that sentence, and I understand the motivations behind the choice.

The more concerning aspect is that it seems every "tag" gets rebuilt at each release. If you have multiple container tags, meaning different Singularity recipe files, after a while it gets heavy to rebuild all the versions each time you have a new release. Is this the expected behavior (since you want a new "digest" for all of the container tags)? Or will it be possible to add to a release only the files that changed?

vsoch commented 3 years ago

I understand that at the moment you need to update the "digest" for a "tag" with each new release, but I don't get the "updating both at each release" concern.

@fenz we discussed above that the tag would need to be something like pokemon-0.0.12 to ensure that, for example, both releases 0.0.12 and 0.0.13 could be available. Since 0.0.12 is the GitHub release and pokemon is the tag, you would need to update both at each release.

The more concerning aspect is that it seems every "tag" gets rebuilt at each release. If you have multiple container tags, meaning different Singularity recipe files, after a while it gets heavy to rebuild all the versions each time you have a new release. Is this the expected behavior (since you want a new "digest" for all of the container tags)? Or will it be possible to add to a release only the files that changed?

I would actually expect that someone with a repository of Singularity recipes would be likely to keep the same tags over time. If that isn't the case, then the container.yaml would be updated here so a particular tag stops at a certain GitHub release and the new tag replaces it. Binoc can't actually read the release files, so this would be up to the maintainer/user to update.

As for whether it's possible to draft a new release based on the files that changed: you could easily update the listing step to go through only the changed Singularity files per git history. That's what shpc does here https://github.com/singularityhub/singularity-hpc/blob/e8366358a818b4dc1e9f154d77542beb8eb7f022/.github/workflows/test.yml#L46 to find new containers to test.

fenz commented 3 years ago

Thanks for the info. As I said at the beginning, I just have to understand how to manage the container versions. So far I've thought a good way would be to organize the different files into folders so it's easier to navigate among versions. I'll try to modify the workflow to:

  1. have recursive scanning like singularity-hub had (it seems this: https://github.com/singularityhub/singularity-deploy/blob/main/.github/workflows/builder.yml#L52 only scans the first level)
  2. use the code you shared to filter only the Singularity recipes changed/added since the last release.

Thanks for your support.

vsoch commented 3 years ago

@fenz that's a good idea - I would update the ls Singularity command to use git to look for changed Singularity files in any directory, and then build those. The template here is only a simple example intended to be edited.
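For example, something like this (a rough sketch) would find recipes changed in any directory since the most recent tag:

import subprocess

def run(*cmd):
    return subprocess.run(cmd, capture_output=True, text=True).stdout

# find Singularity recipes changed since the most recent git tag, at any depth
last_tag = run("git", "describe", "--tags", "--abbrev=0").strip()
changed = [path for path in
           run("git", "diff", "--name-only", f"{last_tag}..HEAD").splitlines()
           if path.rsplit("/", 1)[-1].startswith("Singularity")]
print(changed)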

Sure thing! And I'm hoping that GitHub Packages support for arbitrary binaries will arrive soon - then we will have a really nice (supported) way to store sif binaries.