Closed by mikliapko 3 weeks ago
This is a blocker for manager releases.
The test is hardcoding:
ami_monitor_user: 'centos'
ami_id_monitor: 'ami-02eac2c0129f6376b' # Official CentOS Linux 7 x86_64 HVM EBS ENA 1901_01
1) CentOS 7 is deprecated.
2) On AWS/GCP, if the monitor image is replaced with an image that isn't a monitor image, you now need to specify monitor_branch; that can be a workaround for the manager jobs.
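As a hedged sketch of that workaround, assuming the usual SCT yaml config keys (the user, AMI id, and branch values below are placeholders, not taken from this thread):

```yaml
# SCT test config sketch: when the monitor node uses a non-monitor
# image, the monitoring branch must be named explicitly.
ami_monitor_user: 'ubuntu'           # placeholder login user
ami_id_monitor: 'ami-xxxxxxxxxxxx'   # placeholder non-monitor image id
monitor_branch: 'branch-4.7'         # illustrative monitoring branch
```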
Would it help if I remove the hardcoded ami_monitor_user and ami_id_monitor values in the manager yaml configs? Will it go through the formal_monitor_image flow in that case?
@fruch
Yes, but it won't be CentOS anymore, it would be Ubuntu. (So you might need to rename the job and change the triggers to pass a .list and not a .repo.)
Probably, for the long run, we should split the manager server from the monitoring node.
Thanks, got it, I'll prepare the fixes then.
As for splitting the manager server from the monitoring node - yep, it would be good to do it.
@fruch Could you please take a look?
I adjusted the configuration files and am executing the job with these changes.
It fails on the monitor node setup stage with the error NodeSetupFailed: Wait for: manager-regression-manager--monitor-node-3f072230-1: Waiting for manager server to be up: timeout - 300 seconds - expired.
At the same time, I can access the node via ssh from my local PC.
Argus - https://argus.scylladb.com/workspace?state=WyI2YWI1YjgzNy1mMDUxLTQyNjgtODI3Ni01NWU0NGEyYTUxN2QiXQ (see the latest run)
It says it's waiting for the manager to be up; check the logs to see why it's not up.
May 07 15:41:58 manager-regression-manager--monitor-node-154fed3a-1 scylla-manager[23054]: STARTUP ERROR: configuration ["/etc/scylla-manager/scylla-manager.yaml"]: yaml: unmarshal errors:
May 07 15:41:58 manager-regression-manager--monitor-node-154fed3a-1 scylla-manager[23054]: line 1: field config_cache not found in type server.Config
@mikliapko The error from the manager log says that it read a config_cache entry from the scylla-manager config, but it doesn't match the real object definition. config_cache in scylla-manager.yaml is something I introduced to the SCT repo on my extend_sleep_TLS_en branch (it's not merged to upstream).
It should work if you test it against this manager build https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/675/artifact/00-Build.txt
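For illustration only, a minimal sketch of the kind of fragment that trips strict decoding; the sub-key below is hypothetical (only the config_cache key itself appears in the log):

```yaml
# /etc/scylla-manager/scylla-manager.yaml (sketch)
# A stock scylla-manager binary has no config_cache field in
# server.Config, so strict YAML decoding aborts at startup with
# "field config_cache not found in type server.Config".
config_cache:
  update_frequency: 5m   # hypothetical sub-key, for illustration only
```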
@karol-kokoszka I think this job used the build you mentioned (https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/mikita/job/sct/job/manager-ubuntu-22-sanity/4/parameters/) - 2024-05-02T21:58:02Z.
This is what I see in parameters:
https://downloads.scylladb.com/manager/deb/unstable/unified-deb/master/latest/scylla-manager.list
It points to master latest.
As I see, the latest version in master is downloads.scylladb.com/manager/deb/unstable/unified-deb/master/2024-05-02T21:58:02Z - the same version specified in the 00-Build.txt file you mentioned, isn't it?
.....maybe :) Let me trigger the build from the expected branch once again. So it will be from today.
Good, please let me know then; I'll restart the job with the fresh build.
I just triggered a manager-master build pointing to the feature-brach_config_cache_service branch. This is the branch I must validate with SCT before merging.
https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/676/
Issue description
Manager tests fail on the monitor node setup stage:
I suspect the issue was introduced here https://github.com/scylladb/scylla-cluster-tests/commit/aaca6347260cc8543431ce9761e36ffbe82e465d by setting monitor_branch: null.
Impact
All manager tests running from the master branch are broken. We need to look into it ASAP.
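Per the suspicion above, a hedged sketch of the suspected change (the pinned value shown in the comment is illustrative, not from the commit itself):

```yaml
# Before (illustrative pinned value): monitor_branch: 'branch-4.7'
# After the suspect commit - no branch to fall back to when the
# hardcoded CentOS 7 monitor AMI is no longer usable:
monitor_branch: null
```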
How frequently does it reproduce?
Always.
Installation details
SCT Version: 3ab8f43225bb69ee9f0a3eddadf6f5e7bc29b469
Logs