Closed fernflower closed 1 year ago
Please note that every PR needs to comply with the Leapp Guidelines and must pass all tests in order to be mergeable. If you want to request a review or rebuild a package in copr, you can use following commands as a comment:
To launch regression testing public members of oamg organization can leave the following comment:
Please open ticket in case you experience technical problem with the CI. (RH internal only)
Note: In case there are problems with tests not being triggered automatically on new PR/commit or pending for a long time, please consider rerunning the CI by commenting leapp-ci build (might require several comments). If the problem persists, contact leapp-infra.
/rerun
Copr build succeeded: https://copr.fedorainfracloud.org/coprs/build/6020934
Testing Farm request for RHEL-8.6-rhui/5990856;6020934 regression testing has been created. Once finished, results should be available here. Full pipeline log.
Testing Farm request for RHEL-7.9-rhui/5990856;6020934 regression testing has been created. Once finished, results should be available here. Full pipeline log.
/rerun
Copr build succeeded: https://copr.fedorainfracloud.org/coprs/build/6091405
Testing Farm request for RHEL-8.6-rhui/6080984;6091405 regression testing has been created. Once finished, results should be available here. Full pipeline log.
Testing Farm request for RHEL-7.9-rhui/6080984;6091405 regression testing has been created. Once finished, results should be available here. Full pipeline log.
LGTM, although I agree with the suggestion from @pirat89 in previous comment - changing the immediate CLI output to something similar to the StopActorExecutionError
would be nice.
Tried on rhel7 vm by raising an exception in scancpu
actor.
Leapp report:
/var/log/leapp/leapp-report.txt
contains expected output:
Risk Factor: high
Title: Actor scancpu unexpectedly terminated with exit code: 1
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 74, in _do_run
actor_instance.run(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/leapp/actors/__init__.py", line 289, in run
self.process(*args)
File "/usr/share/leapp-repository/repositories/system_upgrade/common/actors/scancpu/actor.py", line 16, in process
scancpu.process()
File "/usr/share/leapp-repository/repositories/system_upgrade/common/actors/scancpu/libraries/scancpu.py", line 142, in process
raise Exception("Testing exception")
Exception: Testing exception
Summary:
Key: d6c2efd3b09890ac78b14f0fb428e5236fe1c36c
----------------------------------------
Database:
leapp-inspector messages --actor=scancpu
:
######################################################################
PRODUCED MESSAGES
######################################################################
Stamp: 2023-06-19T11:37:56.658310Z
Actor: scancpu
Phase: FactsCollection
Type: ErrorModel
Message_data:
{
"actor": "scancpu",
"details": null,
"message": "Actor scancpu unexpectedly terminated with exit code: 1\nTraceback (most recent call last):\n File \"/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py\", line 74, in _do_run\n actor_instance.run(*args, **kwargs)\n File \"/usr/lib/python2.7/site-packages/leapp/actors/__init__.py\", line 289, in run\n self.process(*args)\n File \"/usr/share/leapp-repository/repositories/system_upgrade/common/actors/scancpu/actor.py\", line 16, in process\n scancpu.process()\n File \"/usr/share/leapp-repository/repositories/system_upgrade/common/actors/scancpu/libraries/scancpu.py\", line 142, in process\n raise Exception(\"Testing exception\")\nException: Testing exception\n",
"severity": "fatal",
"time": "2023-06-19T11:37:56.657966Z"
}
######################################################################
leapp-inspector actors --actor=scancpu
Actor: scancpu
Executed: True
Phase: FactsCollection
Started: 2023-06-19T11:37:56.593592Z
Produced messages:
- ErrorModel
Executed shell commands:
- ['lscpu']
Logs:
--- {"message": "2023-06-19 11:37:56.627 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: External command has started: ['lscpu']", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.633 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Architecture: x86_64", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.634 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: CPU op-mode(s): 32-bit, 64-bit", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.635 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Byte Order: Little Endian", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.636 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: CPU(s): 2", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.637 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: On-line CPU(s) list: 0,1", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.638 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Thread(s) per core: 1", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.639 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Core(s) per socket: 1", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.639 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Socket(s): 2", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.640 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: NUMA node(s): 1", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.641 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Vendor ID: GenuineIntel", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.642 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: CPU family: 6", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.642 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Model: 142", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.643 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Model name: Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.644 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Stepping: 12", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.645 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: CPU MHz: 2303.996", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.645 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: BogoMIPS: 4607.99", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.646 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Virtualization: VT-x", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.647 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Hypervisor vendor: KVM", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.648 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Virtualization type: full", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.648 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: L1d cache: 32K", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.649 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: L1i cache: 32K", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.650 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: L2 cache: 4096K", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.650 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: L3 cache: 16384K", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.651 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: NUMA node0 CPU(s): 0,1", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.652 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 arat umip md_clear spec_ctrl intel_stibp arch_capabilities", "level": "DEBUG"}
--- {"message": "2023-06-19 11:37:56.653 DEBUG PID: 8240 leapp.workflow.FactsCollection.scancpu: External command has finished: ['lscpu']", "level": "DEBUG"}
I'd suggest to add an ERROR log as well, if that makes sense and would be doable.
@PeterMocary @pirat89 Thanks for the review! If I understand you guys correctly, you would like to wrap the hard crash into a leapp exception and put the traceback in the details, right? (I honestly prefer the hard crash way, as any actor crashing is a severe thing that has to be as obvious as possible, but I will implement it the way you suggest and we'll see :)
@fernflower In case of the report I am more wondering WDYT about putting it into the report like this:
Risk Factor: high
Title: Actor scancpu unexpectedly terminated with exit code: 1
Summary:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 74, in _do_run
actor_instance.run(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/leapp/actors/__init__.py", line 289, in run
self.process(*args)
File "/usr/share/leapp-repository/repositories/system_upgrade/common/actors/scancpu/actor.py", line 16, in process
scancpu.process()
File "/usr/share/leapp-repository/repositories/system_upgrade/common/actors/scancpu/libraries/scancpu.py", line 142, in process
raise Exception("Testing exception")
Exception: Testing exception
Key: d6c2efd3b09890ac78b14f0fb428e5236fe1c36c
----------------------------------------
######################################################################
PRODUCED MESSAGES
######################################################################
Stamp: 2023-06-19T11:37:56.658310Z
Actor: scancpu
Phase: FactsCollection
Type: ErrorModel
Message_data:
{
"actor": "scancpu",
"details": "{'traceback': 'Traceback (most recent call last):\n File \"/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py\", line 74, in _do_run\n actor_instance.run(*args, **kwargs)\n File \"/usr/lib/python2.7/site-packages/leapp/actors/__init__.py\", line 289, in run\n self.process(*args)\n File \"/usr/share/leapp-repository/repositories/system_upgrade/common/actors/scancpu/actor.py\", line 16, in process\n scancpu.process()\n File \"/usr/share/leapp-repository/repositories/system_upgrade/common/actors/scancpu/libraries/scancpu.py\", line 142, in process\n raise Exception(\"Testing exception\")\nException: Testing exception\n'}"
"message": "Actor scancpu unexpectedly terminated with exit code: 1",
"severity": "fatal",
"time": "2023-06-19T11:37:56.657966Z"
}
######################################################################
I mean to keep the title as shorter msg and use details/summary/... for the rest. But as I told I am ok with this solution also. Btw, for the console output it's same as before:
Failed to get unit file state for ip6tables.service: No such file or directory
2023-06-20 13:16:00.228 DEBUG PID: 3771 leapp.workflow.FactsCollection.system_facts: Command ['systemctl', 'is-enabled', 'ip6tables'] failed with exit code 1.
2023-06-20 13:16:00.230 DEBUG PID: 3771 leapp.workflow.FactsCollection.system_facts: External command has finished: ['systemctl', 'is-enabled', 'ip6tables']
2023-06-20 13:16:00.231 DEBUG PID: 3771 get_firewalld_status: The ip6tables service is likely not enabled nor running
Process Process-220:
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 74, in _do_run
actor_instance.run(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/leapp/actors/__init__.py", line 289, in run
self.process(*args)
File "/usr/share/leapp-repository/repositories/system_upgrade/common/actors/systemfacts/actor.py", line 62, in process
if not architecture.matches_architecture(architecture.ARCH_S390X).mooo():
AttributeError: 'bool' object has no attribute 'mooo'
=============================================================================================
Actor system_facts unexpectedly terminated with exit code: 1 - Please check the above details
=============================================================================================
I actually realized the "banner" statement just now :-D It's cool.
@fernflower @PeterMocary btw, I've realized that for error models in the report.txt file, we do not print that it's error, inhibitor, ... it looks just a high severity msg. I am thinking we should improve it in future somehow to make it clear that it's error, but I do not want to change the structure/schema because of that.
@PeterMocary @pirat89 Thanks for the review! If I understand you guys correctly, you would like to wrap the hard crash into a leapp exception and put the traceback in the details, right? (I honestly prefer the hard crash way, as any actor crashing is a severe thing that has to be as obvious as possible, but I will implement it the way you suggest and we'll see :)
Just realized you summarized that right! Sorry, my brain is slow today :/
/rerun
Copr build succeeded: https://copr.fedorainfracloud.org/coprs/build/6107459
Testing Farm request for RHEL-8.6-rhui/6096363;6107459 regression testing has been created. Once finished, results should be available here. Full pipeline log.
Testing Farm request for RHEL-7.9-rhui/6096363;6107459 regression testing has been created. Once finished, results should be available here. Full pipeline log.
@PeterMocary @pirat89 All your comments should be resolved - moved trace to the message details (so that a nice Summary with shorter message name will be shown in the report), error is logged at the time of the crash as well. Could you please re-review? (leapp-report.txt)
----------------------------------------
Risk Factor: high
Title: Actor bacula_check unexpectedly terminated with exit code: 1
Summary: Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/leapp/repository/actor_definition.py", line 74, in _do_run
actor_instance.run(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/leapp/actors/__init__.py", line 289, in run
self.process(*args)
File "/etc/leapp/repos.d/system_upgrade/common/actors/baculacheck/actor.py", line 20, in process
42 / 0
ZeroDivisionError: division by zero
Key: 4117a75983f26b1c6987c1bcc84ac31f9a2d8480
(except from leapp-upgrade.log)
2023-06-23 09:09:23.81 INFO PID: 235468 leapp.workflow.Checks: Executing actor bacula_check
2023-06-23 09:09:23.131 ERROR PID: 235468 leapp.workflow.Checks: Actor bacula_check has crashed: Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/leapp/repository/actor_definition.py", line 74, in _do_run
actor_instance.run(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/leapp/actors/__init__.py", line 289, in run
self.process(*args)
File "/etc/leapp/repos.d/system_upgrade/common/actors/baculacheck/actor.py", line 20, in process
42 / 0
ZeroDivisionError: division by zero
@fernflower thanks! lgtm and tests passed. Merging.
Now when an exception is thrown during actor execution it won't be shown only on stdout, but will appear in leapp.db and leapp report as high risk report message.
OAMG-4186