singularityhub / sregistry

server for storage and management of singularity images
https://singularityhub.github.io/sregistry
Mozilla Public License 2.0

Support for _multipart upload (change in scs-library-client) #282

Closed lmcdasm closed 4 years ago

lmcdasm commented 4 years ago

The original bug below describes the result when a version of the scs-library-client is installed that is too new - meaning it requests an endpoint that does not exist.

POST RequestPushImageFileView
Internal Server Error: /v2/imagefile/5/_multipart

It looks like this was added less than a month ago. Singularity Registry Server has no understanding of this request, so it goes to the wrong view, and thus the argument is parsed incorrectly. It's hitting the RequestPushImageFileView, which (before they updated their library client) only expected the id of an image.

    url(
        r"^v2/imagefile/(?P<container_id>.+?)/?$",
        views.RequestPushImageFileView.as_view(),
    ),  # return push url

But instead the client is now providing a string with multipart, hence the error you see above. The fix to this issue would be to add the endpoint. It looks like if we provide a 404 (not found) it will resort to the old functionality. https://github.com/sylabs/scs-library-client/blob/30f9b6086f9764e0132935bcdb363cc872ac639d/client/push.go#L274
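The mis-parse described above can be reproduced outside Django. This is a minimal sketch that applies the quoted route pattern with Python's `re` module; the helper name `captured_container_id` is hypothetical and is not part of sregistry.

```python
import re

# The route pattern quoted above, copied from the thread.
PUSH_ROUTE = re.compile(r"^v2/imagefile/(?P<container_id>.+?)/?$")


def captured_container_id(path):
    """Return what the route would pass to the view as container_id, or None."""
    match = PUSH_ROUTE.match(path)
    return match.group("container_id") if match else None


old_style = captured_container_id("v2/imagefile/5")
new_style = captured_container_id("v2/imagefile/5/_multipart")
print(old_style)  # 5
print(new_style)  # 5/_multipart

# The view then looks the container up by integer id, which is where it fails.
try:
    int(new_style)
    conversion_error = None
except ValueError as err:
    conversion_error = str(err)
print(conversion_error)
```

Because `.+?` is allowed to match slashes, the newer request path still matches the old route, and the whole tail ends up in `container_id`.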


Describe the bug

When pushing a built SIF, the command returns with an error code 500 (from nginx); however, the sregistry container (sregistry_uwsgi_1) throws a Python error.

Tracing the stack, it seems to be a schema/table issue in Django, actually. See below.

To Reproduce

    singularity push -U --library http:// trax_centos.sif library://daniel_smith/test/trax_centos:latest

I also tried the "remote use" method, with the same result:

    [daniel.smith@ip-0AB36F04 ~]$ singularity push -U trax_centos.sif

OUTPUT RECEIVED:

    0 B / 1.20 GiB [----------------------------------------------------------------------] 0.00% 0s
    FATAL: Unable to push image to library: request did not succeed: http status code: 500
    [daniel.smith@ip-0AB36F04 ~]$

Running the LOGS on the Docker Container - sregistry_uwsgi_1 - i see this trace - the last line of the trace indicating invalid literal for int() with base 10

Check the DB docker container and no errors thrown there. I am able to create Collections and Teams via the UI.

    pid: 43|app: 0|req: 44/172] 10.179.111.4 () {32 vars in 670 bytes} [Sun Mar 8 15:13:18 2020] GET /v1/images/daniel_smith/test/trax_centos:sha256.40246353c0a1585594b5db13eaf33beb038dee7cf83cb4d1ed9c22c6ba9eaced?arch=amd64 => generated 556 bytes in 35 msecs (HTTP/1.1 200) 5 headers in 142 bytes (1 switches on core 2)
    [pid: 42|app: 0|req: 58/173] 10.179.111.4 () {32 vars in 441 bytes} [Sun Mar 8 15:13:18 2020] GET /version => generated 58 bytes in 1 msecs (HTTP/1.1 200) 5 headers in 141 bytes (1 switches on core 2)
    POST RequestPushImageFileView
    Internal Server Error: /v2/imagefile/5/_multipart
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/site-packages/django/core/handlers/exception.py", line 34, in inner
        response = get_response(request)
      File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 115, in _get_response
        response = self.process_exception_by_middleware(e, request)
      File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 113, in _get_response
        response = wrapped_callback(request, *callback_args, **callback_kwargs)
      File "/usr/local/lib/python3.5/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
        return view_func(*args, **kwargs)
      File "/usr/local/lib/python3.5/site-packages/django/views/generic/base.py", line 71, in view
        return self.dispatch(request, *args, **kwargs)
      File "/usr/local/lib/python3.5/site-packages/ratelimit/mixins.py", line 58, in dispatch
        )(super(RatelimitMixin, self).dispatch)(*args, **kwargs)
      File "/usr/local/lib/python3.5/site-packages/ratelimit/decorators.py", line 30, in _wrapped
        return fn(*args, **kw)
      File "/usr/local/lib/python3.5/site-packages/rest_framework/views.py", line 505, in dispatch
        response = self.handle_exception(exc)
      File "/usr/local/lib/python3.5/site-packages/rest_framework/views.py", line 465, in handle_exception
        self.raise_uncaught_exception(exc)
      File "/usr/local/lib/python3.5/site-packages/rest_framework/views.py", line 476, in raise_uncaught_exception
        raise exc
      File "/usr/local/lib/python3.5/site-packages/rest_framework/views.py", line 502, in dispatch
        response = handler(request, *args, **kwargs)
      File "./shub/apps/library/views/images.py", line 96, in post
        container = Container.objects.get(id=container_id)
      File "/usr/local/lib/python3.5/site-packages/django/db/models/manager.py", line 82, in manager_method
        return getattr(self.get_queryset(), name)(*args, **kwargs)
      File "/usr/local/lib/python3.5/site-packages/django/db/models/query.py", line 399, in get
        clone = self.filter(*args, **kwargs)
      File "/usr/local/lib/python3.5/site-packages/django/db/models/query.py", line 892, in filter
        return self._filter_or_exclude(False, *args, **kwargs)
      File "/usr/local/lib/python3.5/site-packages/django/db/models/query.py", line 910, in _filter_or_exclude
        clone.query.add_q(Q(*args, **kwargs))
      File "/usr/local/lib/python3.5/site-packages/django/db/models/sql/query.py", line 1290, in add_q
        clause, _ = self._add_q(q_object, self.used_aliases)
      File "/usr/local/lib/python3.5/site-packages/django/db/models/sql/query.py", line 1318, in _add_q
        split_subq=split_subq, simple_col=simple_col,
      File "/usr/local/lib/python3.5/site-packages/django/db/models/sql/query.py", line 1251, in build_filter
        condition = self.build_lookup(lookups, col, value)
      File "/usr/local/lib/python3.5/site-packages/django/db/models/sql/query.py", line 1116, in build_lookup
        lookup = lookup_class(lhs, rhs)
      File "/usr/local/lib/python3.5/site-packages/django/db/models/lookups.py", line 20, in __init__
        self.rhs = self.get_prep_lookup()
      File "/usr/local/lib/python3.5/site-packages/django/db/models/lookups.py", line 70, in get_prep_lookup
        return self.lhs.output_field.get_prep_value(self.rhs)
      File "/usr/local/lib/python3.5/site-packages/django/db/models/fields/__init__.py", line 972, in get_prep_value
        return int(value)
    ValueError: invalid literal for int() with base 10: '5/_multipart'
    [pid: 42|app: 0|req: 59/174] 10.179.111.4 () {34 vars in 505 byte

Expected behavior

Container is pushed up to the Sregistry correctly.


vsoch commented 4 years ago

What version of Singularity are you using?

vsoch commented 4 years ago

What sticks out to me is that your client is requesting an endpoint that ends in _multipart.

POST RequestPushImageFileView
Internal Server Error: /v2/imagefile/5/_multipart

It looks like this was added less than a month ago (https://github.com/sylabs/scs-library-client/pull/70). Singularity Registry Server has no understanding of this request, so it goes to the wrong view, and thus the argument is parsed incorrectly. It's hitting the RequestPushImageFileView, which (before they updated their library client) only expected the id of an image.

    url(
        r"^v2/imagefile/(?P<container_id>.+?)/?$",
        views.RequestPushImageFileView.as_view(),
    ),  # return push url

But instead the client is now providing a string with multipart, hence the error you see above. So you need to use a less bleeding edge version of the scs-library-client, and this should resolve.

vsoch commented 4 years ago

And we should probably add to the documentation the upper limit of Singularity version that works. I do remember it being okay with 3.5.x, although if the library client isn't capped based on the Singularity version, it could be that new installs will just grab the latest and result in this error. Yuck.

lmcdasm commented 4 years ago

awesome (I was about to add to the ticket):

Singularity 3.5.3
golang 1.14
Sregistry docker pkg = 1.1.18

will lower my singularity client and let you know.. would you say that 3.3.0 is what we should be targeting?

thanks a lot! Daniel


vsoch commented 4 years ago

I have tested a few times with 3.2.1, and (I think) I've also used either 3.5.0 or 3.5.1 and it worked okay. The current (latest) release of sregistry (1.1.21) is what I would recommend, we just had a bug fix with pushing images. It looks like you are looking on Docker Hub - the containers referenced in the docker-compose should be the ones on quay.io https://quay.io/repository/vanessa/sregistry.

vsoch commented 4 years ago

Once you've tried those and confirmed, I will update the docs here to set the upper limit (for now).

vsoch commented 4 years ago

@lmcdasm I think I know a quick fix that would address this - if you would be willing to test a PR, I think you can continue using the version of Singularity that you already have. Give me a few minutes to work on a PR.

lmcdasm commented 4 years ago

docker-compose snippets - im pretty sure im using the quay ones.

    uwsgi:
      restart: always
      image: quay.io/vanessa/sregistry
      volumes:
        ..
    scheduler:
      image: quay.io/vanessa/sregistry
      command: python /code/manage.py rqscheduler
      volumes:
    worker:
      image: quay.io/vanessa/sregistry

lmcdasm commented 4 years ago

Hey there..

ok - which would you like.

I was about to bump our registry up to 1.1.21 as outlined and do a rebuild . have a client here with 3.2.1 can try with

happy to take a PR sure. let me know

vsoch commented 4 years ago

I think we would help many more others in the future by testing adding support (a 404 returned by the endpoint) so let's try the PR first.
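As a rough illustration of the idea (not the code in the PR), the sketch below models first-match URL routing in plain Python. The `_multipart` pattern, the two stand-in view functions, and `dispatch` are all assumptions made for this example; only the second pattern comes from the thread.

```python
import re


def multipart_not_supported(container_id):
    # Hypothetical stand-in for a new view that always answers 404.
    return 404


def request_push(container_id):
    # Hypothetical stand-in for RequestPushImageFileView: needs an integer id.
    int(container_id)
    return 200


# Order matters: the more specific multipart pattern is tried first, so the
# catch-all pattern never sees a path that ends in "_multipart".
ROUTES = [
    (re.compile(r"^v2/imagefile/(?P<container_id>\d+)/_multipart/?$"),
     multipart_not_supported),
    (re.compile(r"^v2/imagefile/(?P<container_id>.+?)/?$"),
     request_push),
]


def dispatch(path):
    """Return the status code of the first route whose pattern matches."""
    for pattern, view in ROUTES:
        match = pattern.match(path)
        if match:
            return view(**match.groupdict())
    return 404


print(dispatch("v2/imagefile/5/_multipart"))  # 404
print(dispatch("v2/imagefile/5"))             # 200
```

With the specific route listed first, the newer client gets a clean 404 instead of the 500 from the integer conversion, and older clients are unaffected.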

lmcdasm commented 4 years ago

perfect.. will sit tight.

vsoch commented 4 years ago

Here you go! https://github.com/singularityhub/sregistry/pull/283 thank you kindly for testing it out, fingers crossed the scs-library-client works as I expect it to based on looking at that function.

lmcdasm commented 4 years ago

thanks - just so I'm doing it right (to test), here are the steps I did (from the sregistry local gitlab directory):

    git pull origin add/multipart-upload-404

put back my docker-compose and shub config.py (I'm set up for SSL/HTTPS)

performing a build now (with my flags; I had LDAP support in place, and wanted to ask about an Azure AD plugin at some point :) )

will start up the stack, create a simple collection and team and try uploading. (docker-compose up should actually just see the uwsgi, scheduler and worker to be recreated)

Correct or missing something?

vsoch commented 4 years ago

I don't know what "sregistry gitlab directory is" - we're on GitHub, did you mean GitHub? It sounds like you are testing with your production server - and if you haven't done anything with it yet, that would be okay (the assumption is that you are okay with everything exploding and starting over). If that's not okay, then you'll need to set up a different server.

So - you didn't need to rebuild, but if it's already done, no worries. Generally a rebuild only corresponds with changes in dependencies (pip install, etc.) and not just changes to code. The code is bound to /code in the container, so a restart is sufficient for an update.

lmcdasm commented 4 years ago

hey there.

not testing on a prod server, so can hack away. by local gitlab repo, I just meant that it is where I pulled your stuff from GitHub (git pull), so I just pulled the new branch, no sweat..

A very interesting thing occurred:

the upload started to work, but it then threw a 504.. this is after the progress bar and such tried to load it (delay), but that was because the collection in my path was not there.

Client View:

    [daniel.smith@ip-0AB36F04 ~]$ singularity push -U trax_centos.sif library://daniel_smith/test/trax_centos:0.0.1
    WARNING: Skipping container verifying
    1.20 GiB / 1.20 GiB [============================================================================================] 100.00% 10.98 MiB/s 1m52s
    FATAL: Unable to push image to library: error uploading image: HTTP status 504
    [daniel.smith@ip-0AB36F04 ~]$ singularity push -U trax_centos.sif library://daniel_smith/test/trax_centos:0.0.1
    WARNING: Skipping container verifying
    ^C

Setup the container path and then got a 405

    [daniel.smith@ip-0AB36F04 ~]$ singularity push -U trax_centos.sif library://daniel_smith/test/trax_centos:0.0.1
    WARNING: Skipping container verifying
    FATAL: Unable to push image to library: request did not succeed: http status code: 405
    [daniel.smith@ip-0AB36F04 ~]$

lmcdasm commented 4 years ago

so i think there is a login issue in getting back to scratch. - have to fetch new API key.. wont be long.

vsoch commented 4 years ago

If the containers were re-created then you would need to login again and update the token. The easiest thing to do is just update the file, should be at $HOME/.singularity/remote.yaml. But if it were a login issue I think we'd see that endpoint fail, no? I'm worried that they have other changes in their API flow that are different. It's all totally undocumented, so fairly challenging to figure out or just keep up generally.

lmcdasm commented 4 years ago

fixed up the API key and logged back into the remote - I'm getting the same as the first time.

Client View:

    daniel.smith@ip-0AB36F04 ~]$ singularity push -U trax_centos.sif library://daniel_smith/test/trax_centos:0.0.1
    WARNING: Skipping container verifying
    1.20 GiB / 1.20 GiB [==================================================================================================] 100.00% 23.52 MiB/
    FATAL: Unable to push image to library: error uploading image: HTTP status 504

SERVER LOGS

    HTTP/1.1 200) 5 headers in 142 bytes (1 switches on core 3)
    GET PushNamedContainerView
    [pid: 43|app: 0|req: 18/44] 10.179.111.4 () {32 vars in 670 bytes} [Sun Mar 8 16:33:40 2020] GET /v1/images/daniel_smith/test/trax_centos:sha256.40246353c0a1585594b5db13eaf33beb038dee7cf83cb4d1ed9c22c6ba9eaced?arch=amd64 => generated 556 bytes in 42 msecs (HTTP/1.1 200) 5 headers in 142 bytes (1 switches on core 0)
    [pid: 43|app: 0|req: 19/45] 10.179.111.4 () {32 vars in 441 bytes} [Sun Mar 8 16:33:40 2020] GET /version => generated 58 bytes in 1 msecs (HTTP/1.1 200) 5 headers in 141 bytes (1 switches on core 2)
    POST RequestMultiPartPushImageFileView
    Not Found: /v2/imagefile/1/_multipart
    [pid: 37|app: 0|req: 9/46] 10.179.111.4 () {34 vars in 505 bytes} [Sun Mar 8 16:33:40 2020] POST /v2/imagefile/1/_multipart => generated 0 bytes in 2 msecs (HTTP/1.1 404) 4 headers in 110 bytes (1 switches on core 1)
    POST RequestPushImageFileView
    [pid: 43|app: 0|req: 20/47] 10.179.111.4 () {34 vars in 485 bytes} [Sun Mar 8 16:33:40 2020] POST /v2/imagefile/1 => generated 121 bytes in 18 msecs (HTTP/1.1 200) 5 headers in 137 bytes (1 switches on core 1)
    PUT PushImageFileView
    Filename for parser 792a00ca-449f-43c4-a70d-2a4c1b572602.sif

<wait until the 1.2GB file is uploaded> and nothing else is shown (docker -f on the uwsgi container).

vsoch commented 4 years ago

okay, so we've made progress! Take a look at this section:

POST RequestMultiPartPushImageFileView
Not Found: /v2/imagefile/1/_multipart
[pid: 37|app: 0|req: 9/46] 10.179.111.4 () {34 vars in 505 bytes} [Sun Mar 8 16:33:40 2020] POST /v2/imagefile/1/_multipart => generated 0 bytes in 2 msecs (HTTP/1.1 404) 4 headers in 110 bytes (1 switches on core 1)
POST RequestPushImageFileView

Instead of posting to the wrong view, the endpoint returns 404, and it retries with the v2PostView (what originally worked). It looks like it's hanging on:

PUT PushImageFileView
Filename for parser 792a00ca-449f-43c4-a70d-2a4c1b572602.sif
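
The request sequence in the log (multipart attempt, 404, then the legacy request) can be modelled in a few lines. This is only a Python simulation of the behavior observed in this thread; the real client is written in Go, and `fake_server` and `request_upload` are hypothetical names invented for the sketch.

```python
def fake_server(method, path):
    """Hypothetical server that mirrors the status codes seen in the log."""
    if path.endswith("/_multipart"):
        return 404  # multipart upload is not supported
    if method == "POST":
        return 200  # legacy push request is accepted
    return 405


def request_upload(container_id, send=fake_server):
    """Try the multipart endpoint first; on 404, fall back to the legacy one."""
    attempts = []
    multipart_path = "/v2/imagefile/%s/_multipart" % container_id
    status = send("POST", multipart_path)
    attempts.append((multipart_path, status))
    if status == 404:
        legacy_path = "/v2/imagefile/%s" % container_id
        status = send("POST", legacy_path)
        attempts.append((legacy_path, status))
    return attempts


for path, status in request_upload(1):
    print(path, status)
```

The two attempts line up with the two POST lines in the server log; anything after that (the PUT of the image itself) is outside what this sketch models.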

Can you please try a smaller image to test, maybe busybox?

    singularity pull docker://busybox

lmcdasm commented 4 years ago

sure.. one sec

lmcdasm commented 4 years ago

no issues with the smaller busybox image.

    [daniel.smith@ip-0AB36F04 ~]$ singularity push -U busybox_latest.sif library://daniel_smith/test/busybox:0.0.1
    WARNING: Skipping container verifying
    764.00 KiB / 764.00 KiB [============================================================================================] 100.00% 6.91 MiB/s 0s
    [daniel.smith@ip-0AB36F04 ~]$

vsoch commented 4 years ago

Okay, so I think it's likely just running through this section to write the file (via chunks) from the object. If you look in the root "images" folder (bound to /var/www/images) you should find a folder for your collection, and use ls -l to see if there is an image file there that is changing in size (being written).
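For readers unfamiliar with what "writing the file via chunks" looks like, here is a generic sketch, assuming a file-like upload object. It is not the sregistry implementation; `write_in_chunks` and the example file name are made up for illustration.

```python
import io
import os
import tempfile


def write_in_chunks(source, destination_path, chunk_size=1024 * 1024):
    """Copy a file-like object to disk one chunk at a time; return bytes written."""
    written = 0
    with open(destination_path, "wb") as destination:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            destination.write(chunk)
            written += len(chunk)
    return written


# Small in-memory payload standing in for an uploaded image.
payload = io.BytesIO(b"x" * 2500)
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "example.sif")
    total = write_in_chunks(payload, target, chunk_size=1000)
    size_on_disk = os.path.getsize(target)

print(total, size_on_disk)  # 2500 2500
```

While a loop like this runs on a large image, the destination file grows on disk, which is why watching its size with ls -l is a reasonable way to tell whether the write is still in progress.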

lmcdasm commented 4 years ago

So the file is in fact there

    [dasm@singulatiry-reg-00 test]$ ls -al
    total 1260376
    782336 Mar 8 21:40 busybox-sha256.e1e42dd09862d094487cf8fd3f4c93b6ab2d6245268aa75c758438107d2fe4a8.sif
    1289842688 Mar 8 21:36 trax_centos-sha256.40246353c0a1585594b5db13eaf33beb038dee7cf83cb4d1ed9c22c6ba9eaced.sif
    [dasm@singulatiry-reg-00 test]$

lmcdasm commented 4 years ago

image

vsoch commented 4 years ago

Right, I mean to observe if the file is writing to the system for the image that is reporting an error.

vsoch commented 4 years ago

So - you might be having some issue with the size of images, but what we are currently testing (that your previously working client did not work because of the multipart upload) is now working correctly, since you pushed the busybox image? I'd like to keep these two things separate: if the multipart endpoint error is now resolved, then please report this in #283 so we can merge the fix and close the issue here, and if there is a further issue with a large image, please open a new issue for it. Thanks!

lmcdasm commented 4 years ago

here is the thing: the Collection shows the container as there, but the container is not actually listed when you look in the collection.

While doing a transfer of a new "version" (0.0.2), I can see that the collection isn't being written to while it's transferring..

however, the _upload directory is written to and then the file seems to be copied over. so it seems like we have the file in the images directory, but it doesn't seem like it's "defined".

Agreed.. if you want to split up the issue, since now I think it's a question of the response and the file showing up in the registry, I'm good :)

vsoch commented 4 years ago

For next steps please:

Can you confirm that the busybox container is uploaded and is part of the collection, or are you saying that it is not? I'm not clear if you are referring to the busybox container as "the file" or the large one that had a clear error with its upload.

lmcdasm commented 4 years ago

Apologies.. let me be more clear and will write up a new issue.

I think I "reviewed" the PR - I added a comment; if I need to click something else, let me know.

GOOD CASE (busybox)

singularity push works fine.

The image is seen in the images file system (ex. 82336 Mar 8 21:40 busybox-sha256.e1e42dd09862d094487cf8fd3f4c93b6ab2d6245268aa75c758438107d2fe4a8.sif).

When you look in the Web UI, you can see in the collections that the number of containers is updated.

You can see the container listed when you look in the Collections, as seen here.

image

BAD CASE (1.2GB SIF)

singularity push returns 504.

The filesystem shows that the file is there (md5sum from source and this filesystem match):

    1289842688 Mar 8 21:47 trax_centos_2-sha256.40246353c0a1585594b5db13eaf33beb038dee7cf83cb4d1ed9c22c6ba9eaced.sif

However, the container is not listed in the same picture above (trax_centos is in the same collection, test, as in the picture, but the containers don't seem to show up).
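
The md5sum comparison mentioned in the bad case can also be done from Python when md5sum is not at hand. A minimal sketch using `hashlib` from the standard library; `md5_of_file` is a hypothetical helper name, and the temporary file only stands in for a real image.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Tiny stand-in file; for a real check, pass the source and uploaded .sif paths.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "sample.sif")
    with open(sample_path, "wb") as handle:
        handle.write(b"hello")
    sample_digest = md5_of_file(sample_path)

print(sample_digest)
```

Matching digests on both sides only show that the bytes arrived intact; they say nothing about whether the registry recorded the container in its database.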

Cheers

vsoch commented 4 years ago

okay cool - so let's move forward to merge the multipart endpoint, because it looks like it will work for smaller containers. My concern is that there are other changes to their client that are leading to the timeout - previously I tested with much larger containers (8GB) and didn't have an issue. For your next steps, sometime this week could you give me the exact steps that you used to:

Usually this kind of detailed testing takes me a few hours and it's incredibly detailed work looking through several repos of sylabs, so please set your expectation accordingly.

lmcdasm commented 4 years ago

No problems i will fill in the bullets that you outline.. do you want it here or another ticket?

Here are the server logs now (I cut out the pid parts to make it a bit easier to read)

    GET NamedEntityView
    <QueryDict: {}>
    daniel_smith
    GET GetNamedCollectionView
    GET GetNamedContainerView
    GET PushNamedContainerView
    GET /v1/images/daniel_smith/test/trax_centos_2:sha256.40246353c0a1585594b5db13eaf33beb038dee7cf83cb4d1ed9c22c6ba9eaced?arch=amd64 => generated 565 bytes in 36 msecs (HTTP/1.1 200) 5 headers in 142 bytes (1 switches on core 3)
    GET /version => generated 58 bytes in 1 msecs (HTTP/1.1 200) 5 headers in 141 bytes (1 switches on core 1)
    POST RequestMultiPartPushImageFileView
    Not Found: /v2/imagefile/3/_multipart
    POST /v2/imagefile/3/_multipart => generated 0 bytes in 1 msecs (HTTP/1.1 404) 4 headers in 110 bytes (1 switches on core 2)
    POST RequestPushImageFileView
    POST /v2/imagefile/3 => generated 121 bytes in 22 msecs (HTTP/1.1 200) 5 headers in 137 bytes (1 switches on core 0)
    PUT PushImageFileView
    Filename for parser 97e0a681-514e-413e-9ffa-076a667fcfbf.sif
    PUT /v2/push/imagefile/3/294608d4-12a6-45e6-a202-14aa2acca1f1 => generated 0 bytes in 238613 msecs (HTTP/1.1 200) 4 headers in 102 bytes (20 switches on core 0)

Will give you the details for your bullets shortly
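
When comparing what the client reports with what the server logged, it can help to pull the duration and status out of each uwsgi request line. A small sketch, assuming the line format seen in this thread's logs; the regular expression and `parse_uwsgi_line` are illustrative, not part of uwsgi or sregistry.

```python
import re

# Matches the "METHOD path => generated N bytes in M msecs (HTTP/x.y CODE)" part.
LINE = re.compile(
    r"(?P<method>[A-Z]+) (?P<path>\S+) => generated (?P<size>\d+) bytes "
    r"in (?P<msecs>\d+) msecs \(HTTP/[\d.]+ (?P<status>\d{3})\)"
)


def parse_uwsgi_line(line):
    """Return method, path, status and duration in seconds, or None if no match."""
    match = LINE.search(line)
    if match is None:
        return None
    return {
        "method": match.group("method"),
        "path": match.group("path"),
        "status": int(match.group("status")),
        "seconds": int(match.group("msecs")) / 1000.0,
    }


sample = (
    "PUT /v2/push/imagefile/3/294608d4-12a6-45e6-a202-14aa2acca1f1 "
    "=> generated 0 bytes in 238613 msecs (HTTP/1.1 200) "
    "4 headers in 102 bytes (20 switches on core 0)"
)
parsed = parse_uwsgi_line(sample)
print(parsed["method"], parsed["status"], parsed["seconds"])
```

For the PUT line in the logs, this gives a status of 200 after roughly 239 seconds on the server side, while the client reported a 504. The thread does not establish why the two differ.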

vsoch commented 4 years ago

Please put all of the above that are relevant for this second issue in a new issue, this issue is closed as addressed by #283.