kvaps / kube-linstor

Containerized LINSTOR SDS for Kubernetes, ready for production use.
Apache License 2.0

linstor 1.12.2 K8s 1.21 : cannot create pv #39

Closed: fondemen closed this issue 3 years ago

fondemen commented 3 years ago

Hello. After upgrading to linstor 1.12, I can't create new PVs. Here are some relevant logs from linstor-controller:

INFO: [HttpServer-1] Started.
11:35:37.611 [Main] INFO  LINSTOR/Controller - SYSTEM - Controller initialized
11:35:42.445 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Could not set object '[]' of type String as SQL type: 2005 (CLOB) for column RESOURCE_GROUPS.NODE_NAME_LIST [Report number 609282FD-00000-000000]
11:35:42.640 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Could not set object '[]' of type String as SQL type: 2005 (CLOB) for column RESOURCE_GROUPS.NODE_NAME_LIST [Report number 609282FD-00000-000001]
11:35:42.717 [grizzly-http-server-1] WARN  LINSTOR/Controller - SYSTEM - Path '/v1/resource-definitions//resources' not found on server.
11:35:42.756 [grizzly-http-server-0] WARN  LINSTOR/Controller - SYSTEM - Path '/v1/resource-definitions//resources' not found on server.
11:35:53.460 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Could not set object '[]' of type String as SQL type: 2005 (CLOB) for column RESOURCE_GROUPS.NODE_NAME_LIST [Report number 609282FD-00000-000002]
11:35:53.545 [grizzly-http-server-0] WARN  LINSTOR/Controller - SYSTEM - Path '/v1/resource-definitions//resources' not found on server.
11:35:59.185 [grizzly-http-server-1] ERROR LINSTOR/Controller - SYSTEM - Could not set object '[]' of type String as SQL type: 2005 (CLOB) for column RESOURCE_GROUPS.NODE_NAME_LIST [Report number 609282FD-00000-000003]
11:35:59.252 [grizzly-http-server-0] ERROR LINSTOR/Controller - SYSTEM - Could not set object '[]' of type String as SQL type: 2005 (CLOB) for column RESOURCE_GROUPS.NODE_NAME_LIST [Report number 609282FD-00000-000004]
...

What is weird is that I have no problem exploring existing resources using the linstor CLI, or creating new ones. However, I can't create a new resource group:

$ linstor rg c test
ERROR:
Description:
    Creation of resource group 'test' failed due to an unknown exception.
Details:
    Resource group: test
Show reports:
    linstor error-reports show 609282FD-00000-000042
command terminated with exit code 10

Any hint?

Cheers

kvaps commented 3 years ago

I'm not sure, but this looks like an upstream issue. I tested this version before the release and it worked fine for me, though I'm currently using Kubernetes v1.20.

Anyway, could you provide the detailed error reports from linstor-controller? You can find them in one of the linstor-controller containers under /logs/ErrorReport-609282FD-00000-*.log, or directly on the node under /var/log/linstor-controller/ErrorReport-609282FD-00000-*.log
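A minimal sketch of collecting the most recent reports from either location, assuming a shell with `find`, `sort` and `head` available, and assuming report file names sort chronologically (as the report numbers in the log above suggest). `latest_error_reports` is a hypothetical helper name, not part of LINSTOR:

```shell
# Hypothetical helper: print the last N ErrorReport files in a directory,
# highest report number first. Assumes file names sort chronologically.
# Usage: latest_error_reports DIR [COUNT]
latest_error_reports() {
    dir="$1"
    count="${2:-5}"
    # find prints nothing (instead of failing on a glob) when no report exists
    find "$dir" -maxdepth 1 -name 'ErrorReport-*.log' | sort -r | head -n "$count"
}

# Example, using the node path mentioned above:
# latest_error_reports /var/log/linstor-controller 3
```

The printed paths can then be attached to the issue or pasted into a gist.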

fondemen commented 3 years ago

Thanks for your answer. Here is the gist: https://gist.github.com/fondemen/a69c719a42274c9acb340ab4a76cc990 I fear there is not much more there: a class cast exception from String to CLOB.

I've checked against backups: resource_groups::node_name_list was changed from character varying(4096) DEFAULT '[]'::character varying to text DEFAULT '[]'::character varying. Not sure whether that's the problem...
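For anyone wanting to repeat that comparison, a rough sketch follows. It assumes plain-text schema dumps (for example produced with `pg_dump --schema-only`) saved to files; `column_def` and the file names are hypothetical, and the match is purely textual:

```shell
# Hypothetical helper: print the definition line of a column from a
# plain-text schema dump. Case-insensitive, textual match only.
# Usage: column_def DUMP_FILE COLUMN_NAME
column_def() {
    grep -i -E "^[[:space:]]*\"?$2\"?[[:space:]]" "$1"
}

# Example with hypothetical file names:
# column_def backup-schema.sql node_name_list
# column_def current-schema.sql node_name_list
```

Running it against a backup dump and a current dump shows the two definitions side by side, which makes a varchar-to-text change easy to spot.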

kvaps commented 3 years ago

You're using NODE_LIST. I suppose you have a problem similar to the one I faced on v1.12.1 with REPLICAS_ON_SAME and REPLICAS_ON_DIFFERENT (see https://github.com/LINBIT/linstor-server/issues/230).

Please report this bug to the upstream project: https://github.com/LINBIT/linstor-server

fondemen commented 3 years ago

Thanks for your response. The issue is reported here: LINBIT/linstor-server#231. It also happens on a brand new K8s 1.20.5 / linstor 1.12.2 setup. I'm not using your new auto-join feature (I'm creating LVM volumes and joining nodes by hand). What version of Postgres are you using?

kvaps commented 3 years ago

I use stolon v0.16.0 from this chart

# postgres --version
postgres (PostgreSQL) 10.12 (Debian 10.12-1.pgdg90+1)

fondemen commented 3 years ago

Problem solved with 1.12.3! Thanks a lot!