singularityhub / sregistry-cli

Singularity Global Client for container management
https://singularityhub.github.io/sregistry-cli/
Mozilla Public License 2.0
14 stars 18 forks source link

sregistry add overwrites image but keeps database entry #214

Closed tschoonj closed 5 years ago

tschoonj commented 5 years ago

Describe the bug

Hi @vsoch

Just noticed that adding a locally built image to the registry overwrites the image file but keeps the database entry, leading to something like:

> sregistry images
Containers:   [date]   [client] [uri]
1  May 31, 2019    [s3] centos6/conda-build:latest@a2ae8367e5bc2a9af0574a3b62a886dd
2  June 04, 2019       [s3] savu/deps:latest@1515f70906a2422ac7912986e0ce8f00
3  June 04, 2019       [s3] centos7/conda-build:latest@da7906cc821eaa1b043a0a219ef7dffd
4  June 04, 2019       [s3] savu/deps:latest@c4b83ec7f91bf730c9eefc4b6a3f90d1
5  June 04, 2019       [s3] savu/deps:latest@004fd16e55856bd9497247517d5c161e
6  June 04, 2019       [s3] savu/deps:latest@80fe53c443166af2bc677d49a3aad566
7  June 04, 2019       [s3] savu/deps:latest@1dd4ff75935975d136a3978f1bbebc3f

Note that only the most recently added version works with get:

> sregistry get savu/deps:latest@1dd4ff75935975d136a3978f1bbebc3f
/dls/tmp/awf63395/singularity/shub/savu-deps-latest.sif

> sregistry get latest@80fe53c443166af2bc677d49a3aad566savu/deps:latest@80fe53c443166af2bc677d49a3aad566
ERROR image latest@80fe53c443166af2bc677d49a3aad566savu/deps:latest@80fe53c443166af2bc677d49a3aad566 not found in database

To Reproduce Steps to reproduce the behavior:

Run the following a couple of times:

> sregistry add --name my-library/my-image my-image.simg

Expected behavior

I would expect that the entry gets overwritten when adding a previously added image with the same name.

Version of Singularity and Singularity Registry Client

> singularity --version
2.6.1-dist

> python -c "import sregistry ; print(sregistry.__version__)"
0.2.15

By the way: perhaps sregistry should get itself a --version option?

vsoch commented 5 years ago

Great idea on the --version option! I think this looks like a bug - what is happening is that when we add an image, we add the version completely. Can you give me a complete sequence of additions / commands to do to reproduce what you've shown above? We can go from there.

tschoonj commented 5 years ago

I think that all you need to do is execute the following two commands a number of times:

sudo singularity build my-image.simg Singularity
sregistry add --name my-library/my-image my-image.simg

Every time after these commands have been executed, there should be a new result when running sregistry images.

vsoch commented 5 years ago

okay, starting with no images, and the sregistry version:

$ sregistry images
sregistry version
0.2.15

I'm just going to pull a busybox image, since I don't have a recipe handy.

$ singularity pull docker://busybox
INFO:    Starting build...
Getting image source signatures
Skipping fetch of repeat blob sha256:53071b97a88426d4db86d0e8436ac5c869124d2c414caf4c9e4a4e48769c7f37
Copying config sha256:bbb686565dce6705d231f98234530d1684e0321730b736d3cc06d79d9cfed228
 574 B / 574 B [============================================================] 0s
Writing manifest to image destination
Storing signatures
INFO:    Creating SIF file...
INFO:    Build complete: busybox_latest.sif

Now I'll add it once...

$ sregistry add --name my-library/my-image busybox_latest.sif 
...
$ sregistry images
Containers:   [date]   [client] [uri]
1  June 04, 2019       [hub]    my-library/my-image:latest@407e58c8295f495498df03cbf479f0de

And pull and add again:

$ singularity pull docker://busybox 
$ sregistry add --name my-library/my-image busybox_latest.sif 
...
$ sregistry images
Containers:   [date]   [client] [uri]
1  June 04, 2019       [hub]    my-library/my-image:latest@407e58c8295f495498df03cbf479f0de
2  June 04, 2019       [hub]    my-library/my-image:latest@7a7522e617d3e010a00c10d4016d4a0e

I can't reproduce what you are getting. My only thought is that the hash is somehow the same on the image (e.g., you are trying to add the same image) and thus it doesn't add it twice.

vsoch commented 5 years ago

It also doesn't make sense that you are asking to get based on a tag?

sregistry get latest@80fe53c443166af2bc677d49a3aad566
tschoonj commented 5 years ago

Yep this is pretty much what I also saw: one image on the filesystem, two entries in the database, with identical library, image and tag names.

I would have expected a single entry in the database, with the oldest one being removed when the new one gets added since they have identical names.

tschoonj commented 5 years ago

It also doesn't make sense that you are asking to get based on a tag?

sregistry get latest@80fe53c443166af2bc677d49a3aad566

That was me being more silly than usual. Sorry about that.

vsoch commented 5 years ago

I highly support being silly, people are too serious these days :)

Good call - I only have one image too! It also looks like folders are being created for collections (but then the image files just saved under .singularity/shub so I'll look into that too.

I have some really cool updates for singularity registry server / sregistry client to go along with it coming up! I hope I can finish up in the next few weeks and will have something to share soon :)

vsoch commented 5 years ago

I think what might be happening is that we copy the file, then take the hash, so we get the same file (but different hashes due to timestamp differences). If it's a move, we shouldn't technically do that. Looking into it now.

tschoonj commented 5 years ago

In my case the images are actually different, as they are produced with singularity build, based on (slightly) different recipes, but always with the same name.

vsoch commented 5 years ago

Ah that's it then, see the function to get the hash can take account for having different files:

def get_image_hash(image_path):
    '''return an md5 hash of the file based on a criteria level. This
       is intended to give the file a reasonable version.

       Parameters
       ==========
       image_path: full path to the singularity image

    '''
    hasher = hashlib.md5()
    with open(image_path, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

And that is what we would want - they are different! The bug is that the image is stored based on the name (and not the hash). How do you think is best to proceed? We can either store images in folders (or similar) based on the collection/name and then hash, or something else?

vsoch commented 5 years ago

We probably should use a better hash function too :)

vsoch commented 5 years ago

Actually - there was a bug. The version from the file hash wasn't being honored. I think the fix will work, give me a few minutes.

tschoonj commented 5 years ago

IMHO, if you're adding an image that already exists in the database with same library, image and tag name, one should overwrite the database entry and the image file. In this case you may want to ask for a confirmation though (overridable through a --force option).

vsoch commented 5 years ago

So don't respect different versions? Why bother keeping versions in the first place then? I have a PR that will allow the user to add different versions, and when the user asks for a tag it will return the first result returned (latest).

tschoonj commented 5 years ago

If I wanted to keep multiple versions of an image in the registry, I would use tags to differentiate between them. Does that make sense?

vsoch commented 5 years ago

Yes definitely, but others might want to be able to add images and keep previous versions. Given that the current implementation still allows you to sregistry get without a version, I'm thinking it would be best to leave as is because it somewhat meets both use cases. As soon as I remove versions (or just have them replaced with a current) someone is likely to get upset.