Closed by dsavchenko 3 months ago
It works, yes. Although it takes a bit of time.
That's very weird: in my deployment the startupProbe fails constantly with "connection refused", but if I kubectl edit the corresponding deployment in place and change even something unrelated to the startup probe, e.g. the failureThreshold of the livenessProbe, a new pod is created and starts without any problem.
Any ideas what it could be, or where to look?
Hm, I finally managed to extract at least some logs from the unhealthy pod:
Traceback (most recent call last):
  File "/opt/conda/bin/nb2service", line 5, in <module>
    from nb2workflow.service import main
  File "/opt/conda/lib/python3.9/site-packages/nb2workflow/service.py", line 14, in <module>
    from nb2workflow import ontology, publish, schedule
  File "/opt/conda/lib/python3.9/site-packages/nb2workflow/ontology.py", line 6, in <module>
    import nb2workflow.nbadapter as nbadapter
  File "/opt/conda/lib/python3.9/site-packages/nb2workflow/nbadapter.py", line 46, in <module>
    from nb2workflow import workflows
  File "/opt/conda/lib/python3.9/site-packages/nb2workflow/workflows.py", line 15, in <module>
    cache = Cache('.nb2workflow/cache')
  File "/opt/conda/lib/python3.9/site-packages/diskcache/core.py", line 478, in __init__
    self.reset(key, value, update=False)
  File "/opt/conda/lib/python3.9/site-packages/diskcache/core.py", line 2431, in reset
    ((old_value,),) = sql(
sqlite3.OperationalError: database is locked
As I use NFS volumes with ReadWriteMany as the workdir, it refuses to start because the cache is locked by the previously running container. This doesn't explain why it starts after editing the deployment, though.
Not this...
Well, I think the problem is that in my installation the persistent volumes are on NFS, and SQLite doesn't work well with it.
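For reference, the "database is locked" error from the traceback can be reproduced locally with nothing but the standard library. This is only a sketch of ordinary lock contention between two connections, not a reproduction of the NFS case: on NFS, file locking is often unreliable, so SQLite may report the same error even when the other process is already gone. The table name and file name below are made up for illustration.

```python
import os
import sqlite3
import tempfile


def try_write_while_locked() -> str:
    """Return the error message a second writer sees while the first holds the lock."""
    path = os.path.join(tempfile.mkdtemp(), "cache.db")  # illustrative file name

    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are started and ended explicitly below.
    holder = sqlite3.connect(path, isolation_level=None)
    holder.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
    holder.execute("BEGIN EXCLUSIVE")  # take the lock and keep holding it

    # timeout=0 makes the second connection fail immediately instead of waiting.
    other = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        other.execute("INSERT INTO settings VALUES ('k', 'v')")
        return "no error"
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        other.close()
        holder.execute("ROLLBACK")
        holder.close()


print(try_write_while_locked())
```

If the cache directory has to stay on a shared volume, a larger timeout only helps when the lock is actually released at some point; it does not help with a stale lock left behind on NFS.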
@volodymyrss what is this cache for, and what is the workflows module in general?
It seems we only use serialize_workflow_exception from there, so it could probably be moved somewhere else.
It provides a homogeneous way to run workflows/tools as local nb files or as requests to different services. It was used e.g. in tests and when notebooks call other notebooks. This sort of functionality is needed, but it doesn't have to be implemented this way, or live here. If this cache is the source of the issue, you can disable it.
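One possible way to make the cache optional, sketched under assumptions: instead of creating it at import time (as the traceback shows workflows.py doing), create it on first use and fall back to "no cache" if it cannot be opened. The class name `LazyCache` and the environment variable `NB2WORKFLOW_DISABLE_CACHE` are hypothetical and are not part of nb2workflow.

```python
import os
import sqlite3
from typing import Callable, Optional


class LazyCache:
    """Create the underlying cache on first use instead of at import time."""

    def __init__(self, factory: Callable[[str], object], directory: str):
        self._factory = factory
        self._directory = directory
        self._cache: Optional[object] = None
        # Hypothetical switch; nb2workflow does not define this variable.
        self.disabled = os.environ.get("NB2WORKFLOW_DISABLE_CACHE") == "1"

    def get(self) -> Optional[object]:
        """Return the cache, or None when it is disabled or cannot be opened."""
        if self.disabled:
            return None
        if self._cache is None:
            try:
                self._cache = self._factory(self._directory)
            except sqlite3.OperationalError:
                # e.g. "database is locked" on a shared volume:
                # keep the service running without a cache.
                self.disabled = True
                return None
        return self._cache
```

With diskcache installed, this could be used as `cache = LazyCache(diskcache.Cache, '.nb2workflow/cache')`, and callers would check `cache.get()` for `None` before using it. The point of the design is that a locked database no longer breaks `import nb2workflow.workflows`, so the service can still start and answer the startup probe.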
On staging, even after https://github.com/oda-hub/nb2workflow/pull/180
@volodymyrss does it deploy well in prod?