riesgos / buildall

Repository for building and running all of riesgos' services
Apache License 2.0

Riesgos-WPS returns 404 sometimes #12

Closed. MichaelLangbein closed this issue 1 year ago

MichaelLangbein commented 1 year ago

Still trying to reproduce when exactly this happens.

What I've observed so far:

This might have a lot to do with my development-machine, though ... still investigating.

UPDATE: I've deactivated the monitor for now. I found that errors stop being thrown about 10 minutes after the containers are up, but this only seems to work after restarting the WPS once.

MichaelLangbein commented 1 year ago

Last entries of wps-init log:

* Connected to riesgos-wps (172.28.0.3) port 8080 (#0)
* Server auth using Basic with user 'admin'
> PUT /geoserver/rest/styles/style-damagestate-sara-plasma HTTP/1.1
> Host: riesgos-wps:8080
> Authorization: Basic YWRtaW46Z2Vvc2VydmVy
> User-Agent: curl/8.1.1
> Accept: */*
> Content-type: application/vnd.ogc.sld+xml
> Content-Length: 6752
> 
} [6752 bytes data]
< HTTP/1.1 200 
< Date: Thu, 25 May 2023 14:07:47 GMT
< Server: Noelios-Restlet-Engine/1.0..8
< Transfer-Encoding: chunked
< 
{ [5 bytes data]
100  6752    0     0  100  6752      0   757k --:--:-- --:--:-- --:--:--  824k
* Connection #0 to host riesgos-wps left intact
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.28.0.3:8080...
* Connected to riesgos-wps (172.28.0.3) port 8080 (#0)
* Server auth using Basic with user 'admin'
> DELETE /geoserver/rest/styles/style-damagestate-suppasri-plasma HTTP/1.1
> Host: riesgos-wps:8080
> Authorization: Basic YWRtaW46Z2Vvc2VydmVy
> User-Agent: curl/8.1.1
> Accept: */*
> Content-type: text/xml
> 
< HTTP/1.1 404 
< Content-Type: text/plain
< Transfer-Encoding: chunked
< Date: Thu, 25 May 2023 14:07:47 GMT
< 
{ [54 bytes data]
100    48    0    48    0     0  16460      0 --:--:-- --:--:-- --:--:-- 24000
* Connection #0 to host riesgos-wps left intact
Note: Unnecessary use of -X or --request, POST is already inferred.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.28.0.3:8080...
* Connected to riesgos-wps (172.28.0.3) port 8080 (#0)
* Server auth using Basic with user 'admin'
> POST /geoserver/rest/styles HTTP/1.1
> Host: riesgos-wps:8080
> Authorization: Basic YWRtaW46Z2Vvc2VydmVy
> User-Agent: curl/8.1.1
> Accept: */*
> Content-type: text/xml
> Content-Length: 119
> 
} [119 bytes data]
< HTTP/1.1 201 
< Date: Thu, 25 May 2023 14:07:47 GMT
< Location: http://riesgos-wps:8080/geoserver/rest/styles/style-damagestate-suppasri-plasma
< Server: Noelios-Restlet-Engine/1.0..8
< Transfer-Encoding: chunked
< 
{ [5 bytes data]
100   119    0     0  100   119      0  19343 --:--:-- --:--:-- --:--:-- 19833
* Connection #0 to host riesgos-wps left intact
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.28.0.3:8080...
* Connected to riesgos-wps (172.28.0.3) port 8080 (#0)
* Server auth using Basic with user 'admin'
> PUT /geoserver/rest/styles/style-damagestate-suppasri-plasma HTTP/1.1
> Host: riesgos-wps:8080
> Authorization: Basic YWRtaW46Z2Vvc2VydmVy
> User-Agent: curl/8.1.1
> Accept: */*
> Content-type: application/vnd.ogc.sld+xml
> Content-Length: 6760
> 
} [6760 bytes data]
< HTTP/1.1 200 
< Date: Thu, 25 May 2023 14:07:47 GMT
< Server: Noelios-Restlet-Engine/1.0..8
< Transfer-Encoding: chunked
< 
{ [5 bytes data]
100  6760    0     0  100  6760      0   768k --:--:-- --:--:-- --:--:--  825k
* Connection #0 to host riesgos-wps left intact
No such style: shakemap-pga
No such style: style-damagestate-sara
No such style: style-damagestate-suppasri
No such style: style-damagestate-medina
No such style: style-transitions
No such style: style-loss
No such style: style-cum-loss
No such style: style-cum-loss-chile-plasma
No such style: style-cum-loss-peru-plasma
No such style: style-damagestate-medina-plasma
No such style: style-damagestate-sara-plasma
No such style: style-damagestate-suppasri-plasma

MichaelLangbein commented 1 year ago

The weird thing is that shakemap-pga.sld is actually present in /styles/. style-damagestate-suppasri also exists (under the name style_damagestate_suppasri.sld, which is written correctly in add-style-to-geoserver.sh)

Also weird is that this step happens after step_tomcat_reload_settings ... so all important tasks should have been completed by wps-init already.

Maybe we need to re-start the tomcat-servlets after boot?

nbrinckm commented 1 year ago

The 404 for the delete can be explained: the script tries to delete existing styles before sending the new value to the server. If the style is not found, the server returns a 404.

The overall logic is fine with the 404, as it means that there is nothing to delete. The script can then go on with adding the styles. You can also see the 201 response for the POST (created) and the 200 for the PUT (success).

But all those status codes come from the init script only, and they concern URLs of the GeoServer.

If we get 404s from the WPS itself, we have to check those.

(It is also intended that the style transfer is done after the reload of the tomcat settings. I could set the styles within the file system (as I mounted it in the wps container), but then I would no longer be able to reuse the existing script to upload the files. So if I make changes there, any extra handling in the buildall repo here may no longer be up to date.)
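
The delete-then-upload sequence described above can be sketched as follows. This is a minimal illustration in Python, not the actual add-style-to-geoserver.sh script: the `send` callable and the function name `replace_style` are hypothetical, and the XML body of the POST request is an assumption (the log only shows its length). The URL paths and content types are the ones visible in the wps-init log.

```python
from typing import Callable, Optional

# Hypothetical transport: (method, path, content_type, body) -> HTTP status code.
Send = Callable[[str, str, str, Optional[bytes]], int]


def replace_style(send: Send, name: str, sld: bytes) -> None:
    """Delete a style if present, then recreate it and upload its SLD."""
    base = "/geoserver/rest/styles"

    # 1. Delete any existing style. A 404 only means there was nothing to delete.
    status = send("DELETE", f"{base}/{name}", "text/xml", None)
    if status not in (200, 404):
        raise RuntimeError(f"DELETE {name}: unexpected status {status}")

    # 2. Register the style (201 Created expected).
    # The request body is an assumption; the log shows only its size.
    body = f"<style><name>{name}</name><filename>{name}.sld</filename></style>"
    status = send("POST", base, "text/xml", body.encode())
    if status != 201:
        raise RuntimeError(f"POST {name}: unexpected status {status}")

    # 3. Upload the SLD content (200 OK expected).
    status = send("PUT", f"{base}/{name}", "application/vnd.ogc.sld+xml", sld)
    if status != 200:
        raise RuntimeError(f"PUT {name}: unexpected status {status}")
```

With this shape, a 404 on the DELETE is treated as a normal outcome, while any other unexpected status stops the upload early instead of being buried in the log.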

nbrinckm commented 1 year ago

What exactly does your monitor test?

I'm currently building a little script that calls the GetCapabilities request every 10 seconds, and I will see whether it throws any kind of error.
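
A polling script of this kind could look like the following sketch. It is not the script mentioned in this comment: the function names and the injectable `fetch` and `sleep` parameters are hypothetical, added so the loop can be tried without a running server. The URL is the GetCapabilities link from the riesgos-wps landing page, and the hostname only resolves from inside the compose network.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List

# GetCapabilities link as served on the riesgos-wps landing page.
CAPABILITIES_URL = (
    "http://riesgos-wps:8080/wps/WebProcessingService"
    "?Request=GetCapabilities&Service=WPS"
)


def fetch_status(url: str) -> str:
    """Return the HTTP status as text, or a short error description."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return str(response.status)
    except urllib.error.HTTPError as err:  # e.g. a 404 from Tomcat
        return str(err.code)
    except OSError as err:  # connection refused, timeouts, DNS failures
        return f"error: {err}"


def poll(
    url: str = CAPABILITIES_URL,
    attempts: int = 6,
    interval_s: float = 10.0,
    fetch: Callable[[str], str] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Probe the URL `attempts` times, pausing `interval_s` between probes."""
    results = []
    for i in range(attempts):
        result = fetch(url)
        print(f"probe {i + 1}: {result}")
        results.append(result)
        if i < attempts - 1:
            sleep(interval_s)
    return results
```

Calling `poll()` from a container on the same network prints one line per probe, which makes it easy to see whether the service moves from connection errors to 404 to 200 over time.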

MichaelLangbein commented 1 year ago

What exactly does your monitor test?

It tries to execute a full run of all available scenarios once every x minutes. I've deactivated the monitor for now, though. Here's where that test happens: https://github.com/riesgos/dlr-riesgos-frontend/blob/70fcc6762416f8a0c52cadb87c60661e7051e9c4/monitor/src/runall.ts#L46

If we get 404s from the WPS itself, we have to check those.

Yeah, I'm getting those from the frontend when executing a wps, too.

Here's a reproduction:

 Running 11/11
 ✔ Network buildall_default               Created   0.0s
 ✔ Volume "buildall_tomcat-folder"        Created   0.0s
 ✔ Volume "buildall_logs"                 Created   0.0s
 ✔ Volume "buildall_store"                Created   0.0s
 ✔ Container buildall-backend-1           Started   0.5s
 ✔ Container buildall-riesgos-wps-1       Started   0.4s
 ✔ Container buildall-monitor-1           Started   1.1s
 ✔ Container buildall-frontend-1          Started   1.2s
 ✔ Container buildall-compare-frontend-1  Started   0.8s
 ✔ Container buildall-wps-init-1          Started   0.9s
 ✔ Container buildall-reverse_proxy-1     Started   0.9s

lang_m13-a@eoc:/localhome/lang_m13/Desktop/code/buildall$ sudo docker container exec -it buildall-backend-1 /bin/bash

root@e0a42776a762:/backend# curl http://riesgos-wps:8080

<html><body><a href="/wps-js-client">WPS JS Client</a><br><a href="/wps/WebProcessingService?Request=GetCapabilities&Service=WPS">WebProcessingService</a></body></html>

root@e0a42776a762:/backend# curl http://riesgos-wps:8080/wps

root@e0a42776a762:/backend# curl http://riesgos-wps:8080/wps/WebProcessingService

<!doctype html><html lang="en"><head><title>HTTP Status 404 – Not Found</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 404 – Not Found</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Description</b> The origin server did not find a current representation for the target resource or is not willing to disclose that one exists.</p><hr class="line" /><h3>Apache Tomcat/9.0.65</h3></body></html>

root@e0a42776a762:/backend# 
MichaelLangbein commented 1 year ago

To reproduce:

The process will fail on a first attempt with " connect ECONNREFUSED 127.0.0.1:8080" and on all subsequent attempts (even after wps is fully booted up) with "404".

Note:

MichaelLangbein commented 1 year ago

Riesgos-wps outputs the following error message:

30-May-2023 13:28:01.257 INFO [GuavaAuthCache-0-1] org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading Illegal access: this web application instance has been stopped already. Could not load [com.google.common.cache.RemovalCause]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
        java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [com.google.common.cache.RemovalCause]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
                at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1432)
                at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForClassLoading(WebappClassLoaderBase.java:1420)
                at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1259)
                at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1220)
                at com.google.common.cache.LocalCache$Segment.expireEntries(LocalCache.java:2622)
                at com.google.common.cache.LocalCache$Segment.runLockedCleanup(LocalCache.java:3446)
                at com.google.common.cache.LocalCache$Segment.cleanUp(LocalCache.java:3438)
                at com.google.common.cache.LocalCache.cleanUp(LocalCache.java:3858)
                at com.google.common.cache.LocalCache$LocalManualCache.cleanUp(LocalCache.java:4797)
                at org.geoserver.security.auth.GuavaAuthenticationCacheImpl$1.run(GuavaAuthenticationCacheImpl.java:65)
                at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                at java.lang.Thread.run(Thread.java:750)
MichaelLangbein commented 1 year ago

Seems like we need to make sure riesgos-wps is completely booted before sending requests to it. A few ways of doing this:

MichaelLangbein commented 1 year ago

Ok, the latest version of the backend now has a mechanism where a user can define tests that must pass before the API is available. That should solve this issue. Also, since the build script currently fetches the dev branch of the backend automatically, this should apply immediately after a rebuild.
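
The idea of gating an API behind user-defined tests can be illustrated with a small sketch. This is not the backend's actual implementation: the names `StartupGate`, `add_check`, and `handle` are hypothetical, and answering with status 503 while the checks have not passed is an assumption.

```python
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]


class StartupGate:
    """Keeps an API closed until every registered check has passed once."""

    def __init__(self) -> None:
        self._checks: Dict[str, Check] = {}
        self._ready = False

    def add_check(self, name: str, check: Check) -> None:
        self._checks[name] = check

    def is_ready(self) -> bool:
        # Once all checks have passed, they are not re-run.
        if not self._ready:
            self._ready = all(self._safe_run(c) for c in self._checks.values())
        return self._ready

    @staticmethod
    def _safe_run(check: Check) -> bool:
        # A check that raises (e.g. connection refused) counts as failed.
        try:
            return bool(check())
        except Exception:
            return False

    def handle(self, handler: Callable[[], str]) -> Tuple[int, str]:
        """Run the handler only when ready; otherwise answer 503 (assumed)."""
        if not self.is_ready():
            return 503, "dependencies not ready yet"
        return 200, handler()
```

A check for this issue could, for example, request GetCapabilities from riesgos-wps and return True only on a 200 response, so that both the early ECONNREFUSED and the later 404 keep the gate closed.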