nomad-coe / nomad

NOMAD lets you manage and share your materials science data in a way that makes it truly useful to you, your group, and the community.
https://nomad-lab.eu
Apache License 2.0

Gateway Time-out (504) when editing metadata of many entries simultaneously #18

Closed ondracka closed 1 year ago

ondracka commented 3 years ago

Unexpected error: "Gateway Time-out (504)". Please try again and let us know, if this error keeps happening.

This happens when I try to edit the metadata (comment, dataset, etc.) of many (1k+) entries at once. Surprisingly, if I later refresh the uploads page and select some random entries, they seem to have the correct metadata, so this could be harmless. However, I did not check all of the entries, so it is possible that some were not updated.

Nothing interesting shows up in the Docker logs of the worker, app, or gui containers.

This is with an Oasis running the latest v0.10.4 branch.

markus1978 commented 3 years ago

There is an underlying flaw in the editing system that we are aware of and will resolve at some point. Editing a large number of entries takes a while, yet we attempt to do this during the API call. The "editing" itself is quick (just one action on mongodb). The majority of the time is spent updating elasticsearch. These updates are sent to elasticsearch in bulk, and it is possible (but not guaranteed) that all necessary updates have already been issued before the timeout.
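To make the timing argument concrete, here is a minimal Python sketch. It is not NOMAD's actual code: the function name, the batch size, and the per-batch cost are all illustrative assumptions. It only counts how many equally sized bulk batches fit into a proxy timeout window, and it says nothing about whether the app keeps working after the proxy gives up.

```python
import math


def batches_issued_before_timeout(n_entries, batch_size, seconds_per_batch, timeout_seconds):
    """Return (issued, total) bulk batches for a synchronous edit.

    Purely illustrative: assumes every batch takes the same amount of time.
    """
    total = math.ceil(n_entries / batch_size)
    # Number of whole batches that fit into the timeout window.
    fit = int(timeout_seconds // seconds_per_batch)
    return min(total, fit), total


# Hypothetical numbers: 10k entries, 500 per batch, 5 s per batch, 60 s timeout.
issued, total = batches_issued_before_timeout(10_000, 500, 5.0, 60)
print(f"{issued} of {total} batches issued before the timeout")
```

With these made-up numbers only part of the index would be up to date when the 504 is returned, which would match the observation that randomly picked entries can look correct while others may not be.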

I am not sure what the default nginx timeout is, maybe 60s? You could increase this. Of course, this is more a band-aid than a fix. You can add something like this to your Oasis nginx.conf:

location ~ \/api\/repo\/edit {
      proxy_buffering off;
      proxy_read_timeout 600;
      proxy_pass http://app:8000;
}

These 10 minutes should be enough to edit up to 10k entries, and probably more.

If you get this timeout, the actual editing of the data was probably successful, but the re-indexing in elasticsearch might be partial. If you think this is the case, you can manually re-index all data of an upload from the CLI with the nomad admin uploads re-index command after a "failed" editing attempt.

In the future, we want to handle this as an asynchronous task (similar to processing). Here, the edit request from the UI would merely trigger the update, which is then performed in the background. Another solution would be to update elasticsearch entries by query (just a single operation on elasticsearch) instead of reindexing (basically sending all entries to elasticsearch batch by batch). One of those fixes will be implemented, but only in due time. Probably not before this fall.

markus1978 commented 1 year ago

With the improved new version of the archive API this is fixed.