thesofproject / linux

Linux kernel source tree
Other
89 stars 129 forks source link

[FEATURE] Ability to start DMA-trace while DSP is running #3275

Open kv2019i opened 2 years ago

kv2019i commented 2 years ago

As documented in https://github.com/thesofproject/sof-docs/pull/381 and discussed at length in https://github.com/thesofproject/sof-test/pull/802#issuecomment-965700731 , current DMA trace is not designed to support the scenario where DMA tracing is started by user-space (sof-logger) while the DSP is in fact running. The current kernel driver allows to do this, but logs are provided in best-effort manner.

For development purposes, getting some traces is better than not getting any traces. When used for validation, lack of determinism is a problem. A specific problem for validation is how to capture early traces reliable. The sof-logger cannot be started when SOF driver is not loaded. But when SOF driver is loaded, then the DSP might be already be powered on. This leaves a race that cannot be easily handled in user-space.

Improve the kernel tracing interface to guarantee:

FYI @marc-hb @ujfalusi @plbossart @lgirdwood

marc-hb commented 2 years ago

The sof-logger cannot be started when SOF driver is not loaded. But when SOF driver is loaded, then the DSP might be already be powered on. This leaves a race that cannot be easily handled in user-space.

In practice this race is a problem only when the FW logs are so verbose that they fill the entire ring buffer. It's never been a problem for me.

driver needs to hide runtime-suspend transitions of DSP

Which current problem does this solve?

  • if the ringbuffer has wrapped while sof-logger was not running, the returned traces must still be in order

Yes but not urgent, sof-logger already has mitigation for that: it detects and skips garbage and is smart enough to notice timestamps going backwards and report them.

kv2019i commented 2 years ago

@marc-hb wrote:

The sof-logger cannot be started when SOF driver is not loaded. But when SOF driver is loaded, then the DSP might be already be powered on. This leaves a race that cannot be easily handled in user-space.

In practice this race is a problem only when the FW logs are so verbose that they fill the entire ring buffer. It's never been a problem for me.

driver needs to hide runtime-suspend transitions of DSP

Which current problem does this solve?

Let's discuss that. It seems based on discussion on https://github.com/thesofproject/linux/pull/3136 that currently sof-test cases are assuming some data is always in the trace buffer. This was tracked also in https://github.com/thesofproject/sof-test/issues/297

If we'd move to a "double ringbuffer" in the SOF driver, "sof-logger -t" would always return at least some traces and this would be guaranteed if the firmware boot was successful. IOW, if no traces are read, this is a guaranteed sign of failure and can be raised.

This mostly impacts cases where DSP is runtime suspended and resumed during (or between) test cases, and the test cases are restarting sof-logger. With current semantics, if the DSP has suspended/resumed with sof-logger not running continuously, some traces are missed, and it is possible trace log is empty when sof-logger is started.

I don't fully understand the problems involved, but I can see with a double-buffer kernel trace, test case development would be easier.

One non-kernel solution is to improve sof-logger with the ability to keep it running even when driver is not loaded. It could keep polling the availability of kernel trace node and start emitting trace whenever the trace node becomes available. This would allow to get the complete FW trace for a full test plan, including driver load/unload cases. Currently sof-logger must be restarted as the driver must be loaded, but this can be changed.

marc-hb commented 2 years ago

It seems based on discussion on #3136 that currently sof-test cases are assuming some data is always in the trace buffer.

Yes, and for various good and bad reasons this kept passing until #3136.

If we'd move to a "double ringbuffer" in the SOF driver, "sof-logger -t" would always return at least some traces and this would be guaranteed if the firmware boot was successful.

From a validation perspective the most important thing is as you wrote: determinism. You obviously can't assert something non-deterministic. So in any simple situation the logs should be either always empty or always present.

Determinism is made more complicated by power management that powers off the DSP (D3) after a timeout. Tests have little control on this timeout. Also, it's possible (and even recommended by the logger docs) to disable D3 and it would simpler from a test perspective if logging behavior is not totally different when disabling D3 (of course content differences are unavoidable).

So given a choice, I think I prefer logs to be never empty as long as the DSP has booted at least once in the past. Logs obviously provide more information than no logs and this matches other tools like dmesg and journalctl -e or -f.

If we'd move to a "double ringbuffer" in the SOF driver, "sof-logger -t" would always return at least some traces and this would be guaranteed if the firmware boot was successful. IOW, if no traces are read, this is a guaranteed sign of failure and can be raised.

Sounds good, thanks.

One non-kernel solution is to improve sof-logger with the ability to keep it running even when driver is not loaded. It could keep polling the availability of kernel trace node and start emitting trace whenever the trace node becomes available.

Good idea, I'll try to find some time and do that. I'm already familiar with that code. Is there some inotify-like API you would recommend to avoid polling /sys? EDIT: submitted in https://github.com/thesofproject/sof/pull/5027

marc-hb commented 2 years ago

Rescuing from a private email thread a non-intuitive (to me) host_offset behavior explained by @ujfalusi

I stopped disabling D3 (disabling D3 is what sof-docs recommend when using the logger, don't forget that...) and I can reproduce this every single time:

sof-logger -t

# in a different terminal: speakertest -l 1 && sleep 1 && sudo pkill sof-logger

<wait at least 2s for D3>

sof-logger -t # full logs from DSP boot without waking up the DSP,

COMPLETE duplicate of what the first sof-logger already showed

Ctrl-C sof-logger -t # empty

As you can see no filter updated involved. I guess this could be how we always got at least the logs from the previous test so far. Even before/without the filter updates.

Now replace "sleep 1" by "sleep 3" and all sof-logger (except the very first one) return empty.

Which is pretty much expected based on the kernel code:

-- sleep 1 --

-- sleep 3 --

marc-hb commented 2 years ago

Another useful explanation from @ujfalusi