
linux-test-project / ltp

Linux Test Project (mailing list: https://lists.linux.it/listinfo/ltp)

https://linux-test-project.readthedocs.io/

GNU General Public License v2.0


about TBROK causing test failure #974

Open: qqxlt opened this issue 1 year ago

qqxlt commented 1 year ago

Some test cases fail, and the test output shows that the failures are caused by TBROK. When a failing test case is rerun repeatedly, it sometimes passes. Should this count as a failure of the test case?

For example, memcg_failcnt_test1.

metan-ucw commented 1 year ago

The I/O errors from the memcg files suggest that something is wrong with your kernel.

Also, TBROK means that the test setup has failed and something is broken; it is not the cause but rather the end result of a broken system.
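
The difference between a broken test and a failed test is also visible in the exit status. The Python sketch below decodes it, assuming the LTP convention in which a test's exit status is a bitmask of TFAIL (1), TBROK (2), TWARN (4) and TCONF (32), as defined in include/tst_res_flags.h; verify the values against the LTP version you actually run. The helper name decode_ltp_status is hypothetical.

```python
# Result flag values assumed from LTP's include/tst_res_flags.h;
# check them against your own LTP checkout.
TFAIL = 1
TBROK = 2
TWARN = 4
TCONF = 32


def decode_ltp_status(status: int) -> list[str]:
    """Return the result flags encoded in an (assumed bitmask) LTP exit status."""
    if status == 0:
        return ["TPASS"]
    names = []
    for flag, name in ((TFAIL, "TFAIL"), (TBROK, "TBROK"),
                       (TWARN, "TWARN"), (TCONF, "TCONF")):
        if status & flag:
            names.append(name)
    return names


if __name__ == "__main__":
    for status in (0, 1, 2, 3):
        print(status, decode_ltp_status(status))
```

For example, an exit status of 2 decodes to TBROK alone, meaning the run broke before the test reached a pass/fail verdict on the kernel behavior it checks.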

qqxlt commented 1 year ago

When the test case is rerun repeatedly, it sometimes passes. Do such failures still need attention?

metan-ucw commented 1 year ago

Of course. If a test fails only once in ten, or even once in a hundred runs, there is likely a problem somewhere. The Linux kernel is massively parallel, and in most cases a race condition only shows up with the right timing. Sometimes the probability of hitting the bug is really small, but that does not mean the bug is not there.
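
The point about rare failures can be quantified with a small worked example. Assuming each run fails independently with probability p (an idealization, since real races depend on load and timing), the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The sketch below shows that a bug hit once in a hundred runs appears in only about 63% of 100-run campaigns, and that about 299 runs are needed to see it with 95% confidence. The function names are illustrative only.

```python
import math


def detection_probability(p: float, runs: int) -> float:
    """Chance of at least one failure in `runs` independent runs,
    assuming each run fails with probability p."""
    return 1.0 - (1.0 - p) ** runs


def runs_needed(p: float, confidence: float) -> int:
    """Smallest number of runs whose detection probability reaches `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    print(f"{detection_probability(0.01, 100):.2f}")  # about 0.63
    print(runs_needed(0.01, 0.95))  # 299
```

In other words, a test that passed a few reruns tells you very little about a bug that only triggers once in a hundred runs.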

qqxlt commented 1 year ago

Thanks

qqxlt commented 1 year ago

Thanks

Is there a good method for debugging such problems?

metan-ucw commented 1 year ago

I'm afraid there is no easy manual to follow. The Linux kernel is a complex beast and it takes time to understand its internals. I would start by isolating the exact failure, i.e. figuring out which step went wrong. Once you have an idea of what exactly went wrong, you can start debugging. However, if this is a kernel race condition, it takes years to acquire the skill set needed to pinpoint the problem.
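
One way to start isolating the failure is to rerun the test until it breaks and keep the complete output of the failing run, so the first TBROK message and the step that triggered it can be read in context. A minimal Python sketch follows; the script and function names are hypothetical, and the test command is a placeholder, since the exact way to invoke the memcg test depends on how LTP is installed on your system.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def rerun_until_failure(cmd: List[str], max_runs: int) -> Optional[Tuple[int, int, str]]:
    """Run `cmd` up to `max_runs` times and stop at the first non-zero exit.

    Returns (run number, exit status, combined stdout/stderr) of the first
    failing run, or None if every run exited with status 0.
    """
    for run in range(1, max_runs + 1):
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True)
        if proc.returncode != 0:
            return run, proc.returncode, proc.stdout
    return None


if __name__ == "__main__":
    # Hypothetical usage: python3 rerun_until_failure.py 100 <test command...>
    if len(sys.argv) < 3:
        sys.exit("usage: rerun_until_failure.py MAX_RUNS COMMAND [ARGS...]")
    result = rerun_until_failure(sys.argv[2:], int(sys.argv[1]))
    if result is None:
        print("no failure observed")
    else:
        run, status, output = result
        print(f"run {run} failed with exit status {status}")
        print(output)
```

Keeping only the first failing run's output avoids drowning the interesting messages in logs from the many passing runs.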

qqxlt commented 1 year ago

Thanks