Different versions of power loss and specs:
power loss service@0.16.6
  octue: 0.43.5
  wake-service: 0.9.4
    octue: 0.46.2
  calculation status: working
power loss service@0.16.10
  octue: 0.47.0
  wake-service: 0.9.7
    octue: 0.46.2
  calculation status: working until the release of 0.16.11 (the failure could have another cause, but the release is the most obvious one)
power loss service@0.16.11
  change: Updated to use poetry, automatically select revision
  octue: 0.47.1
  wake-service: auto (0.9.8 was the latest at the time, using octue 0.46.2)
  calculation status: doesn't work. Fails with a monitor message schema error 'CannotDetermineSpecification'. See the exception block in https://api.windquest.app/admin/db/projects/powerlossquestion/493519e9-e0dd-404e-8702-d38c775312ac/change/
Note: All version numbers starting with 0.16.x refer to power loss and those starting with 0.9.x refer to wake service, in case I forget to say so explicitly.
The last known working state was 0.16.10 before the release of 0.16.11 last night.
Today, I got reports of calculations that never completed (they stay in-progress forever), which usually happens when the wake service crashes. I checked the Cloud Run (CR) logs and found the CannotDetermineSpecification error.
So I reverted to 0.16.10 (setting it as the default service revision in WQ), but retriggered questions still never completed.
I checked the Cloud Run logs again and saw a bunch of details = "Resource not found (resource=octue.services.windpioneers.wake-service.0-9-8.answers.uuid)" errors (note the 0.9.8 in the topic name). This was odd because 0.16.10 has wake service 0.9.7 hardcoded. This is when I raised the issue in the work chat group.
In the WQ logs, I see that the right version of wake service is being called.
INFO service.py:335 <Service('windpioneers/power-loss-service:0.16.10')> asked a question 'fef885e2-194e-420b-97dc-3792e3d1d28c' to service 'windpioneers/wake-service:0-9-7'.
But the Cloud Run logs show that it's looking for 0.9.8's answer topics.
[ERROR | google.cloud.pubsub_v1.publisher._batch.thread] [analysis-fef885e2-194e-420b-97dc-3792e3d1d28c] Failed to publish 1 messages.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 72, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/grpc/_c
status = StatusCode.NOT_FOUND
details = "Resource not found (resource=octue.services.windpioneers.wake-service.0-9-8.answers.fef885e2-194e-420b-97dc-3792e3d1d28c)."
Either 0.9.7 is looking for the wrong topic, or power loss 0.16.10 is calling wake service 0.9.8 instead of 0.9.7 as it claims. The latter felt like the more likely reason (at the time) because of the latest service revision feature.
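To make the mismatch between the two logs easier to spot across many questions, here is a minimal sketch that pulls the service revision out of a WQ "asked a question" log line and out of a Cloud Run "Resource not found" error, then compares them. The regular expressions are inferred only from the two log lines quoted above, not from octue's documented naming scheme, and the function names are hypothetical.

```python
import re

# ASSUMPTION: both patterns are inferred from the log lines quoted in this
# issue. They are not taken from octue's documentation.
ASKED_PATTERN = re.compile(
    r"to service '(?P<namespace>[^/']+)/(?P<name>[^:']+):(?P<revision>[^']+)'"
)
TOPIC_PATTERN = re.compile(
    r"resource=octue\.services\.(?P<namespace>[^.]+)\.(?P<name>[^.]+)"
    r"\.(?P<revision>[^.]+)\.answers\.(?P<question>[0-9a-f-]+)"
)


def asked_revision(wq_log_line):
    """Return (namespace, name, revision) of the service the parent says it asked."""
    match = ASKED_PATTERN.search(wq_log_line)
    if match is None:
        return None
    return match["namespace"], match["name"], match["revision"]


def topic_revision(cloud_run_log_line):
    """Return (namespace, name, revision) embedded in the missing answer topic."""
    match = TOPIC_PATTERN.search(cloud_run_log_line)
    if match is None:
        return None
    return match["namespace"], match["name"], match["revision"]


def revisions_disagree(wq_log_line, cloud_run_log_line):
    """True if both lines parse and name different revisions of the same service."""
    asked = asked_revision(wq_log_line)
    topic = topic_revision(cloud_run_log_line)
    if asked is None or topic is None:
        return False
    return asked[:2] == topic[:2] and asked[2] != topic[2]


wq_line = (
    "INFO service.py:335 <Service('windpioneers/power-loss-service:0.16.10')> "
    "asked a question 'fef885e2-194e-420b-97dc-3792e3d1d28c' to service "
    "'windpioneers/wake-service:0-9-7'."
)
cr_line = (
    'details = "Resource not found (resource=octue.services.windpioneers.'
    'wake-service.0-9-8.answers.fef885e2-194e-420b-97dc-3792e3d1d28c)."'
)

print(asked_revision(wq_line), topic_revision(cr_line))
print("mismatch:", revisions_disagree(wq_line, cr_line))
```

This only confirms that the two logs disagree for the same question; it does not say which side is wrong.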
I figured there was some issue with picking the right revision from the service registry, so I deleted the 0.9.8 service revision registered in WQ (leaving no wake service versions registered). I retriggered a question and it showed the same error message (looking for the 0.9.8 topic).
So I created a service revision for 0.9.7 in WQ and set it as the default, in case the latest revision was being picked every time. I retriggered a question, but wake service gave the same wrong-topic error.
I tried removing the 0.9.8 tag from the released revision on Cloud Run. I retriggered and got the same error.
At this point, I came to the sad realization that I was in over my head and reverted power loss to 0.16.6 (0.16.8 and 0.16.9 are broken, and 0.16.7 was never registered). I knew this version worked previously, and it uses an older version of octue without the service registry feature. It works, but the service is missing some important updates made in newer versions.
I recreated the service revision for wake service 0.9.8 in WQ (without making it the default) and added the 0.9.8 tag back to the Cloud Run revision.