Open david-laight opened 4 years ago
That analysis isn't quite right. I can't work out what time offset it is subtracting from each trace file - the epochs don't seem to line up at all. Looking further, it seems to normalize the timestamps to the first 'unclipped' event. That is just bollocks. If one of the raw event headers has 'clipped' set then all the earlier events should be deleted. (The kernel seems to return a 'junk' page at the start of the raw data.) If I clear the 'clipped' flag I get closer to loading a valid file (all the unwanted trace is pad entries).
Something very strange is going on - see the screenshot. The data for cpu 2 at time 0 should be (almost) 300ms later, but the events have been given timestamps relative to the start of the raw data page, ignoring the intervals of all the 'pad' entries between the start of the raw page and the first actual trace. cpus 1 and 3 have similarly structured files. cpu 0's data starts at the beginning of the 2nd page. (The first page of all 4 traces is all 'pad' entries because the 'clipped' flag is set at the beginning of the 2nd page.)
Trace file for above:
Hi, David,
I'll try to address your questions and discussion, but tl;dr: here https://github.com/google/schedviz/blob/master/server/fs_storage.go#L110, change the NewCollection call to:
collection, err := sched.NewCollection(
	collectionProto.EventSet,
	sched.UsingEventLoaders(sched.FaultTolerantEventLoaders()),
	sched.NormalizeTimestamps(true))
(that is, adding the sched.UsingEventLoaders(sched.FaultTolerantEventLoaders()) collection option) and your collection will load.
Now, to your questions and comments, and some background:
It looks like your per-CPU buffers are really small (20KiB) in your trace, which means that buffer overflow is very likely and the lack of clipping is problematic. However, the good news is that the fault-tolerant loader is generally able to work around the lack of clipping, so you can use it to load your traces. You might also try increasing the per-CPU buffer size (or decreasing the trace duration) until overflow is uncommon. Internally we mostly use 8MiB per-CPU buffers and very seldom overflow even on 2s collections; the time-to-overflow of course depends on the system load and the amount of scheduling thrash, but traces under 10s very rarely overflow at all.
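If it helps, here's a minimal sketch of bumping the buffer before a collection (it assumes the usual tracefs mount at /sys/kernel/debug/tracing and permission to write there; buffer_size_kb is the per-CPU size in KiB):

```go
package main

import (
	"log"
	"os"
)

func main() {
	// buffer_size_kb is the per-CPU ring-buffer size in KiB; 8192 KiB = 8 MiB,
	// which is what we normally collect with. The path assumes the usual
	// debugfs mount; some systems expose it at /sys/kernel/tracing instead.
	const path = "/sys/kernel/debug/tracing/buffer_size_kb"
	if err := os.WriteFile(path, []byte("8192\n"), 0o644); err != nil {
		log.Fatalf("resizing ftrace buffers: %v", err)
	}
}
```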
Lee
I need to load traces that overflow. I intend to run tests that take hours and trigger the trace to stop when there are excessive scheduling delays. Since the test is processing real-time audio, 100ms of trace is a long time! I will use a slightly longer trace, but I'm testing the loading of wrapped traces.
When I've dumped the raw trace files with my own decoder, I don't see any missing events or wakeups mis-sequenced against migrates.
My program looks through all the trace files, finds the timestamp of the first trace after the wrap (typically the first trace in the second 4k page), and takes the latest of those timestamps across all the cpus. I then want to discard all the entries before that timestamp, since the trace after that point is complete on every cpu. If I just change all the unwanted entries themselves to 'pad' ones the trace still fails to load. Zeroing the header flags byte doesn't help. Changing the page header timestamp to a few ns before the first kept trace timestamp and using a single pad to delete all the unwanted traces gives me a trace file that loads.
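Roughly what my program does, as a Go-ish sketch (the helper names and the flat slice of timestamps are just illustrative, not my real code):

```go
// The safe cutoff is the latest, across all cpus, of each cpu's first
// timestamp after the wrap (typically the first event on the second 4k
// page). From that point on every cpu's buffer is complete.
func cutoff(firstAfterWrap []uint64) uint64 {
	var latest uint64
	for _, ts := range firstAfterWrap {
		if ts > latest {
			latest = ts
		}
	}
	return latest
}

// Everything earlier than the cutoff should be discarded (in the raw file
// that means overwriting those entries with pad records).
func keepFrom(timestamps []uint64, cut uint64) []uint64 {
	kept := timestamps[:0]
	for _, ts := range timestamps {
		if ts >= cut {
			kept = append(kept, ts)
		}
	}
	return kept
}
```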
So I'm fairly certain that something is wrong with the logic that timestamps the trace entries and then merges them into a single trace.
I can probably debug it, but the instructions I have for building and running it aren't very informative about what they are actually doing. What do I have to rerun to rebuild the code that is run when the browser uploads a file? I'm guessing I don't need to restart the webserver after such a change.
Did you try the code change I suggested at the start of my response? That works for me.
Is the foo.go code interpreted? I've never looked at any before.
If that change just allows the process state transition through, it is just papering over the cracks in the code that parses the traces. There are no missing (or mis-sequenced) traces in the files I'm using when I decode them with a little program I wrote.
It is compiled. You'll need to rebuild and rerun.
I don't believe it is 'papering over the cracks'. I believe its results are correct, and that it's dropping only those events that aren't consistent with the preponderance of evidence from all the other events.
If you'd like to use your own trace interpreter to generate processed traces to view in SV, this is quite possible; look at how we load traces from ebpf for an example.
I've managed to add some trace and found one of the problems. When I try to delete unwanted events by converting them to 'pad' entries, the time deltas aren't added in. The library code in linux/tools/lib/traceevent/kbuffer-parse.c (which is what I based my code on) assumes the high 27 bits are a time delta for all trace types. The schedviz code, on the other hand, has some strange comments about the time delta of pads meaning something different depending on whether it is zero or non-zero, and it also fails to add in the time delta for pad entries. Always adding in the time delta would simplify the code - especially since I'm 99.999% sure type 31 entries haven't been used for a long, long, long time.
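For reference, this is the header decode I'd expect, sketched in Go from kbuffer-parse.c (the constants are the kernel's ring-buffer type values; the point is that the upper 27 bits get accumulated for everything, pads included):

```go
const (
	typePadding    = 29 // RINGBUF_TYPE_PADDING
	typeTimeExtend = 30 // RINGBUF_TYPE_TIME_EXTEND
	typeTimeStamp  = 31 // RINGBUF_TYPE_TIME_STAMP (absolute; probably never emitted)
)

// advanceTimestamp applies one event header to the running timestamp.
// header is the 32-bit word at the start of each entry: bits 0-4 are the
// type/length field and the upper 27 bits are a time delta in ns. payload0
// is the first 32-bit word after the header, used only by types 30 and 31.
func advanceTimestamp(ts uint64, header, payload0 uint32) uint64 {
	typeLen := header & 0x1f
	delta := uint64(header >> 5)
	switch typeLen {
	case typeTimeExtend:
		// Extended delta: the payload holds the high bits.
		return ts + delta + uint64(payload0)<<27
	case typeTimeStamp:
		// Absolute timestamp, same encoding as the extend.
		return delta + uint64(payload0)<<27
	default:
		// Data entries (type_len 0-28) and pad entries (29): just add the delta.
		return ts + delta
	}
}
```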
Anyway I've now got a program that deletes the early events from traces that have wrapped.
An extra question: if I trace an extra event (and include its format in the tar file), will I be able to enable it as a marker on the 'events' page?
You should be able to include non-sched events and visualize them in the viewer.
The "strange comment" I think you are referring to comes from the kernel: https://github.com/torvalds/linux/blob/b07f636fca1c8fbba124b0082487c0b3890a0e0c/include/linux/ring_buffer.h#L24-L31
The time_delta doesn't appear to be used for padding events when iterating over events in the kernel (this is the code we looked at when writing the SchedViz parser): https://github.com/torvalds/linux/blob/b07f636fca1c8fbba124b0082487c0b3890a0e0c/kernel/trace/ring_buffer.c#L3641-L3649
I was looking at this https://github.com/torvalds/linux/blob/master/tools/lib/traceevent/kbuffer-parse.c#L345 which is intended to be library code that applications can use to read the raw trace.
It doesn't bother to parse absolute time stamps - I suspect they are never used (it skips the wrong length!).
The kernel probably never writes 'pad' entries (except at the end of trace blocks) so knows the time update is irrelevant.
If 'pad' can be used for deleted items then the timestamp probably needs to be processed.
Just FYI, attached is a screenshot of a 'kworker/u8:3' process running without allowing pre-emption for 3.1ms. Unfortunately the name is badly truncated (actually it isn't truncated, it just isn't useful). I think it is scheduled from 'Xorg'.
Since I'm running with threads pinned to cpus (maybe not such a good idea unless they are reserved at boot time) everything is delayed.
I've been getting 'inference error (CPU)' trying to load some traces. Having written a program to read and dump (and adjust) the raw trace data I think I know what is happening.
A process (pid 3857) in the trace (I'll try to attach) has run a few times on cpu2. At (raw) time 39866b80c873 it is migrated to cpu1 (stolen by) and runs. At 39866c19166e cpu1 migrates it back to cpu2 and then enables it. At 39866c1a093f it executes on cpu2. These are the only 'migrates' for pid 3857 in the trace, so must be relevant.
The file upload fails because the process state is 'unknown' on cpu2 @179948522 having gone 'sleeping' on cpu1 @179792871.
Digging through the *.go files I think I understand the problem. The large absolute timestamps are converted to small values in setTimestampNormalize(). This code subtracts the timestamp of the first real trace entry from all the others. However, it is only (AFAICT) looking at the data from a single cpu, so it subtracts different amounts from the timestamps in the different files. This slight difference doesn't usually matter, but when a migrate is done by the cpu the process last ran on, immediately prior to the process being woken, the process's schedule trace can end up with a timestamp earlier than the migrate.
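What I'd expect is for normalization to subtract a single collection-wide base instead of a per-cpu one, something like this sketch (the Event type here is just a stand-in for whatever the loader really uses):

```go
type Event struct{ Timestamp uint64 }

// normalize subtracts one base timestamp - the earliest first-event
// timestamp across *all* cpus - from every event, so cross-cpu ordering
// (a migrate on one cpu before the wakeup on another) is preserved.
func normalize(perCPU map[int][]Event) {
	var base uint64
	first := true
	for _, events := range perCPU {
		if len(events) > 0 && (first || events[0].Timestamp < base) {
			base = events[0].Timestamp
			first = false
		}
	}
	for _, events := range perCPU {
		for i := range events {
			events[i].Timestamp -= base
		}
	}
}
```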
Note that the trace files contain a lot of 'pad' entries where they've been 'adjusted' so that they all start at the same time and only contain valid trace. (Oh, can you also fix ringBufferEvent() so it doesn't reject 'Padding' with a time interval of zero - these happen if a pad is added after a 'time extend' entry.)
trace.tar.gz https://github.com/google/schedviz/files/4035275/trace.tar.gz