nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

Openstack neutron DB errors #364

Open aabaris opened 10 months ago

aabaris commented 10 months ago

Log errors encountered:

[root@nerc-ctl-0 containers]# grep -r ERROR | grep 9cf7d432f0ed54e3c55a43984e0fe27273547537043cfdcee0857c1017d21120
neutron/server.log:2024-01-05 09:43:52.328 18 ERROR neutron.plugins.ml2.ovo_rpc [req-3b255338-00e6-452d-a310-19202807b50d 9cf7d432f0ed54e3c55a43984e0fe27273547537043cfdcee0857c1017d21120 90dce24f8e6748bfba319a9223d0a7a6 - - -] Exception while dispatching port events: 'Chassis_Private' object has no attribute 'hostname': AttributeError: 'Chassis_Private' object has no attribute 'hostname'

Potential fix: https://access.redhat.com/solutions/7032696

Need to investigate and, if applicable, resolve.

jtriley commented 10 months ago

Just noting I'm able to reproduce the "openstack network agent list" 500 error mentioned in the Red Hat solution. That said, it succeeds about every 4th or 5th try. I noticed that these network agents are currently down:

$ openstack network agent list -f value | grep -i false
d3511a66-8daa-443a-8c07-0c6c21224603 OVN Controller agent nerc-hyp-23.rc.fas.harvard.edu  False True ovn-controller
66822378-638d-446f-9877-e5986a230629 OVN Controller agent nerc-hyp-14.rc.fas.harvard.edu  False True ovn-controller
869fdbb1-efa2-4e3f-9a16-91235abd5968 OVN Controller agent nerc-hyp-15.rc.fas.harvard.edu  False True ovn-controller
38ec7f76-ca3a-4608-9687-ea29027f1dc4 OVN Controller agent nerc-hyp-29.rc.fas.harvard.edu  False True ovn-controller
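
Since the list call fails intermittently, it may be easier to capture one successful run as JSON and filter it offline rather than re-running grep. The following is a minimal sketch, not a tested procedure for this environment. It assumes the JSON formatter (-f json) emits one object per agent with "Host" and "Alive" keys and a boolean "Alive" value. The second sample entry is hypothetical.

```python
import json


def down_agents(agent_list_json: str) -> list[dict]:
    """Return the agents whose "Alive" field is explicitly false.

    Assumes the input is the JSON array printed by
    `openstack network agent list -f json` (key names are an assumption).
    """
    agents = json.loads(agent_list_json)
    return [agent for agent in agents if agent.get("Alive") is False]


# Sample input: the first entry reuses an ID and host from the output above,
# the second entry is made up for illustration.
SAMPLE_AGENTS = json.dumps([
    {
        "ID": "d3511a66-8daa-443a-8c07-0c6c21224603",
        "Host": "nerc-hyp-23.rc.fas.harvard.edu",
        "Alive": False,
    },
    {
        "ID": "hypothetical-agent-id",
        "Host": "hypothetical-host.example.org",
        "Alive": True,
    },
])

for agent in down_agents(SAMPLE_AGENTS):
    print(agent["ID"], agent["Host"])
```

In practice the sample string would be replaced with the saved output of a successful "openstack network agent list -f json" run.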

StHeck commented 10 months ago

Yes, they are.

nerc-hyp-14 and nerc-hyp-15 are Lenovos that seem to be powered down.

nerc-hyp-29 is an SMC that is up but I can't ping it.

nerc-hyp-23 does not show up in the server list:

(undercloud) @.*** ~]$ openstack server list | grep nerc-hyp-23

(undercloud) @.*** ~]$



StHeck commented 10 months ago

Maybe nerc-hyp-23 was the server that was deleted using the ovn-sbctl command?



aabaris commented 9 months ago


nerc-hyp-23 is not deployed in the overcloud. According to ovn-sbdb, the Chassis_Private object that references it was created on 2022/03/29.
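
To look for other leftover rows like this one, the text output of "ovn-sbctl list Chassis_Private" could be scanned for records that no longer point at a Chassis row. This is only a sketch under assumptions: that the listing uses the usual "column : value" layout with blank lines between records, and that an orphaned row shows an empty chassis column ("[]"). Whether that matches the condition behind the AttributeError above has not been confirmed here. The UUIDs and names in the sample are hypothetical.

```python
def parse_ovsdb_list(text: str) -> list[dict[str, str]]:
    """Parse "column : value" records separated by blank lines."""
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record, if any.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip().strip('"')
    if current:
        records.append(current)
    return records


def without_chassis(records: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return records whose chassis column is empty (assumed orphan marker)."""
    return [r for r in records if r.get("chassis") == "[]"]


# Hypothetical listing, shaped like `ovn-sbctl list Chassis_Private` output.
SAMPLE_LISTING = """\
_uuid               : 11111111-1111-1111-1111-111111111111
chassis             : []
name                : "hypothetical-stale-row"

_uuid               : 22222222-2222-2222-2222-222222222222
chassis             : 33333333-3333-3333-3333-333333333333
name                : "hypothetical-live-row"
"""

for row in without_chassis(parse_ovsdb_list(SAMPLE_LISTING)):
    print(row["_uuid"], row["name"])
```

Any row this flags would still need to be checked by hand against the deployed hosts before anything is removed.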