teemtee / tmt

Test Management Tool
MIT License
84 stars 125 forks source link

oom-kill tests error when executed through tmt. Need to mimic beaker harness protection. #1904

Open sbertramrh opened 1 year ago

sbertramrh commented 1 year ago

Current logged results when run through tmt:

18:02:03                 out: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
18:02:03                 out: ::   Test
18:02:03                 out: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
18:02:03                 out: 
18:02:04                 out: :: [ 13:02:03 ] :: [  BEGIN   ] :: Running 'gcc -o oomtest oomtest.c'
18:02:05                 out: oomtest.c:59:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
18:02:05                 out:    59 | oom_adj(int adj){
18:02:05                 out:       | ^~~~~~~
18:02:05                 out: :: [ 13:02:05 ] :: [   PASS   ] :: Command 'gcc -o oomtest oomtest.c' (Expected 0, got 0)
18:02:05                 out: :: [ 13:02:05 ] :: [  BEGIN   ] :: Running oom triggerring program :: actually running './oomtest'
18:02:05                 out: oomtest[117130]:   0 oomtest lead         started, requesting               0, total               0
18:02:05                 out: oomtest[117131]:   1 fork_child           started, requesting      1314324480, total      1314324480
18:02:10                 out: oomtest[117134]:   2 mempress             started, requesting       657162240, total      1971486720
18:02:12                 out: oomtest[117138]:   3 mempress             started, requesting       657162240, total      2628648960

...

18:02:48                 out: oomtest[117170]:  21 mempress             started, requesting       657162240, total     14457569280
18:02:51                 out: oomtest[117171]:  22 mempress             started, requesting       657162240, total     15114731520
18:02:52                 out: oomtest[117174]:  23 mempress             started, requesting       657162240, total     15771893760
18:02:56                 out: Shared connection to mellanox-bluefield-01.khw4.lab.eng.bos.redhat.com closed.
18:02:56         Command event: 56.98 waiting for stream readers
18:02:56         Command event: 56.98 stdout reader done
18:02:56         Command returned '255'.

When executed through beaker, same box as above:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::   Test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 12:48:00 ] :: [  BEGIN   ] :: Running 'gcc -o oomtest oomtest.c'
oomtest.c:59:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
   59 | oom_adj(int adj){
      | ^~~~~~~
:: [ 12:48:00 ] :: [   PASS   ] :: Command 'gcc -o oomtest oomtest.c' (Expected 0, got 0)
:: [ 12:48:00 ] :: [  BEGIN   ] :: Running oom triggerring program :: actually running './oomtest'
oomtest[8272]:   0 oomtest lead         started, requesting               0, total               0
oomtest[8276]:   1 fork_child           started, requesting      1314324480, total      1314324480
oomtest[9790]:   2 mempress             started, requesting       657162240, total      1971486720
oomtest[10480]:   3 mempress             started, requesting       657162240, total      2628648960

...

oomtest[17184]:  21 mempress             started, requesting       657162240, total     14457569280
oomtest[17186]:  22 mempress             started, requesting       657162240, total     15114731520
oomtest[17187]:  23 mempress             started, requesting       657162240, total     15771893760
oomtest[8272]: 8276 ended with sig 9
oomtest[8272]: killpg 8276
oomtest[8272]: oomtest PASSED
:: [ 12:48:54 ] :: [   PASS   ] :: Running oom triggerring program (Expected 0, got 0)
:: [ 12:48:55 ] :: [  BEGIN   ] :: we shold see UID in dmesg :: actually running 'dmesg | grep 'Killed process' | grep UID'
[  157.295906] Out of memory: Killed process 8276 (oomtest) total-vm:1285724kB, anon-rss:1283632kB, file-rss:1144kB, shmem-rss:0kB, UID:0 pgtables:2552kB oom_score_adj:705
:: [ 12:48:55 ] :: [   PASS   ] :: we shold see UID in dmesg (Expected 0, got 0)

Is it possible to wrap, trap, or protect the shell as beaker does to allow the test to run to the end?

sopos commented 1 year ago

Have you tried to to let tmt to run the test through choom? I mean to use test: choom -n -1000 runtest.sh in the main.fmf? I'm not sure wether tmt uses some other intermediate script so that might be needed to be set for that as well or by the tmt itself.

sbertramrh commented 1 year ago

Have you tried to to let tmt to run the test through choom? I mean to use test: choom -n -1000 runtest.sh in the main.fmf? I'm not sure wether tmt uses some other intermediate script so that might be needed to be set for that as well or by the tmt itself.

Tried it and same result.

happz commented 1 year ago

I wonder what magic Restraint does to prevent this. It should be possible to include it in tmt, sure.

lukaszachy commented 1 year ago

I could find just https://github.com/restraint-harness/restraint/blob/master/plugins/task_run.d/35_oom_adj

sopos commented 1 year ago

this [1] is probably more important but I'm not sure how to achieve that with simple ssh command execution

  1. https://github.com/restraint-harness/restraint/blob/master/init.d/restraintd.service#L12
lukaszachy commented 1 year ago

@sbertramrh Is that oomtest.c something which can be shared (to play with...)?

thrix commented 1 year ago

Let's try to add this to our guest setup playbooks, i.e. adjust OOM score for sshd running on the test machine.\

Seems to me like a reasonable request.

lukaszachy commented 1 year ago

@sbertramrh Is that oomtest.c something which can be shared (to play with...)?

Shared privately via email.

sbertramrh commented 1 year ago

@lukaszachy during recent CTC for automotive this runs as expected on the device under test and does not suddenly end the test as shown above. Will try and narrow done circumstances when this happens and add more info. Still using the same version of tmt when i found this issue, tmt version: 1.20.0 (fb222fc).

14:32:41                 out: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::^M
14:32:41                 out: ::   Setup^M
14:32:41                 out: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::^M
14:32:41                 out: ^M
14:32:41                 out: :: [ 10:32:41 ] :: [  BEGIN   ] :: Running 'dmesg -c > /dev/null'^M
14:32:41                 out: :: [ 10:32:41 ] :: [   PASS   ] :: Command 'dmesg -c > /dev/null' (Expected 0, got 0)^M
14:32:41                 out: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::^M
14:32:41                 out: ::   Duration: 0s^M
14:32:41                 out: ::   Assertions: 1 good, 0 bad^M
14:32:41                 out: ::   RESULT: PASS (Setup)^M
14:32:41                 out: ^M
14:32:41                 out: ^M
14:32:41                 out: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::^M
14:32:41                 out: ::   Test^M
14:32:41                 out: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::^M
14:32:41                 out: ^M
14:32:41                 out: :: [ 10:32:41 ] :: [  BEGIN   ] :: Running 'gcc -o oomtest oomtest.c'^M
14:32:41                 out: oomtest.c:59:1: warning: return type defaults to ‘int’ [-Wimplicit-int]^M
14:32:41                 out:    59 | oom_adj(int adj){^M
14:32:41                 out:       | ^~~~~~~^M
14:32:41                 out: :: [ 10:32:41 ] :: [   PASS   ] :: Command 'gcc -o oomtest oomtest.c' (Expected 0, got 0)^M
14:32:41                 out: :: [ 10:32:41 ] :: [  BEGIN   ] :: Running oom triggerring program :: actually running './oomtest'^M
14:32:41                 out: oomtest[11198]:   0 oomtest lead         started, requesting               0, total               0^M
14:32:41                 out: oomtest[11199]:   1 fork_child           started, requesting      1314324480, total      1314324480^M
14:32:42                 out: oomtest[11200]:   2 mempress             started, requesting       657162240, total      1971486720^M
14:32:42                 out: oomtest[11201]:   3 mempress             started, requesting       657162240, total      2628648960^M

...

14:32:50                 out: oomtest[11218]:  20 mempress             started, requesting       657162240, total     13800407040^M
14:32:51                 out: oomtest[11219]:  21 mempress             started, requesting       657162240, total     14457569280^M
14:32:51                 out: oomtest[11198]: 11219 ended with sig 9^M
14:32:51                 out: oomtest[11198]: killpg 11199^M
14:32:51                 out: oomtest[11198]: oomtest PASSED^M
14:32:52                 out: :: [ 10:32:51 ] :: [   PASS   ] :: Running oom triggerring program (Expected 0, got 0)^M
14:32:52                 out: :: [ 10:32:52 ] :: [  BEGIN   ] :: we shold see UID in dmesg :: actually running 'dmesg | grep 'Killed process' | grep UID'^M
14:32:52                 out: [  250.323878] Out of memory (oom_kill_allocating_task): Killed process 11219 (oomtest) total-vm:643968kB, anon-rss:614812kB, file-rss:1112kB, shmem-rss:0kB, UID:0 pgtables:1232kB oom_score_adj:588^M
14:32:52                 out: :: [ 10:32:52 ] :: [   PASS   ] :: we shold see UID in dmesg (Expected 0, got 0)^M