lf-edge / eden

Eden is where EVE and Adam get tried and tested:
https://projecteve.dev
Apache License 2.0

GitHub Actions: Integrate telemetry action #984

Closed uncleDecart closed 2 months ago

uncleDecart commented 2 months ago

This action gathers statistics on Eden test execution on GitHub runners, helping us identify bottlenecks.

CC: @milan-zededa

Let's see what we have in the Eden repo.

uncleDecart commented 2 months ago

If this works well we could keep this beyond the current issue troubleshooting.

Yep, I hope it works during failure as well, they should give some outputs :D

milan-zededa commented 2 months ago

I see the error "Error: [Workflow Telemetry] Resource not accessible by integration", although the diagrams are provided...

uncleDecart commented 2 months ago
[two attached images]

Huh, CPU load goes high at the beginning; we are not using many resources.

uncleDecart commented 2 months ago

Seems to be working. Adding syscall information to get better visibility.

giggsoff commented 2 months ago

Seems to be working. Adding syscall information to get better visibility.

Can you please check the Step Trace section? It looks like it contains only one (or two) steps. Are you sure that the other charts cover the whole workflow?

uncleDecart commented 2 months ago

You're absolutely right @giggsoff, let's try this one :D

uncleDecart commented 2 months ago

Seems like it works time-wise, but the steps are still displayed weirdly. Could that be because of reusable actions?

giggsoff commented 2 months ago

Seems like it works time-wise, but the steps are still displayed weirdly. Could that be because of reusable actions?

I can see a related issue here.

uncleDecart commented 2 months ago

So am I being punished for ~over-engineering~ separating things? Oh man. Well, we can create an action without this hierarchy, but the biggest question is still how we can see the impact of nested virtualisation. It should be something like the number of context switches, but for that we technically need access to the host. From the guest machine, maybe sleep time could help?

milan-zededa commented 2 months ago

So am I being punished for ~over-engineering~ separating things? Oh man. Well, we can create an action without this hierarchy, but the biggest question is still how we can see the impact of nested virtualisation. It should be something like the number of context switches, but for that we technically need access to the host. From the guest machine, maybe sleep time could help?

Maybe this could be relevant for us: https://scoutapm.com/blog/understanding-cpu-steal-time-when-should-you-be-worried
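
To make the steal-time idea concrete, here is a minimal Python sketch of how it could be measured from inside the guest, assuming a Linux runner that exposes /proc/stat. The helper names (`parse_proc_stat`, `steal_percent`, `read_proc_stat`) are hypothetical and not part of Eden or of the telemetry action. It also picks up the `ctxt` counter, which is the guest's own context-switch total rather than the host's. Some hypervisors may not report steal at all, so a zero reading is not proof that there is no contention.

```python
from typing import Dict

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait",
              "irq", "softirq", "steal", "guest", "guest_nice")


def parse_proc_stat(text: str) -> Dict[str, int]:
    """Extract the aggregate CPU counters and the context-switch total."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "cpu":
            # Older kernels print fewer columns; zip stops at the shorter one.
            for name, value in zip(CPU_FIELDS, parts[1:]):
                counters[name] = int(value)
        elif parts[0] == "ctxt":
            counters["ctxt"] = int(parts[1])
    return counters


def steal_percent(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Share of CPU time between two samples that was stolen from this guest."""
    # guest and guest_nice are already counted in user and nice, so skip them.
    fields = CPU_FIELDS[:8]
    total = sum(after.get(f, 0) - before.get(f, 0) for f in fields)
    if total <= 0:
        return 0.0
    stolen = after.get("steal", 0) - before.get("steal", 0)
    return 100.0 * stolen / total


def read_proc_stat(path: str = "/proc/stat") -> Dict[str, int]:
    """Take one sample from the running system (Linux only)."""
    with open(path) as f:
        return parse_proc_stat(f.read())
```

One way to use this would be to call `read_proc_stat()` before and after a test step, then compare `steal_percent(before, after)` and `after["ctxt"] - before["ctxt"]` between runner types.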

uncleDecart commented 2 months ago

@milan-zededa I think we should merge this, and then I'll run it on a self-hosted runner in my separate fork so that we can compare numbers and talk to BuildJet about it.