nutanix / libvfio-user

framework for emulating devices in userspace
BSD 3-Clause "New" or "Revised" License

arch linux CI workflow fails #770

Open tmakatos opened 1 year ago

tmakatos commented 1 year ago

https://github.com/nutanix/libvfio-user/actions/runs/6011550391/job/16305137885?pr=769

pacman -Sy --noconfirm \

...

:: Proceed with installation? [Y/n] 
:: Retrieving packages...
Error: The operation was canceled.

@gierens any ideas?

jlevon commented 1 year ago

It looks like it timed out.

gierens commented 1 year ago

If I'm not mistaken, pacman has a download timeout of 10s, which can be disabled by:

[screenshot: SmartSelect_20230831_064604_Samsung Internet.jpg]

Not sure whether the timeout can simply be increased instead, but one could wrap the pacman command in the timeout command in combination with this flag. I can have a look at this later today.
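A minimal sketch of that idea, assuming GNU coreutils `timeout` is available in the container and that the flag in the screenshot is pacman's `--disable-download-timeout`. The helper name `retry_with_timeout` and the attempt and limit values are hypothetical and not taken from the workflow:

```shell
# Hypothetical helper: run a command under a wall-clock limit and retry on failure.
# Usage: retry_with_timeout <attempts> <seconds> <command> [args...]
retry_with_timeout() {
    attempts=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # timeout(1) kills the command once the limit is reached and
        # then returns a non-zero status, which triggers the next attempt
        if timeout "$limit" "$@"; then
            return 0
        fi
        echo "attempt $i of $attempts failed" >&2
        i=$((i + 1))
    done
    return 1
}

# Illustrative invocation only (assumed flag; package list elided as in the log):
# retry_with_timeout 3 120 pacman -Sy --noconfirm --disable-download-timeout ...
```

With something like this, a stalled download would fail fast and be retried rather than consuming the whole job's time budget.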

jlevon commented 1 year ago

It's the entire workflow that's timing out though, I think, if you check the execution times. It shouldn't really take 10 minutes to update (!)

gierens commented 1 year ago

Ah yeah, thanks for the hint, now I see it too in the workflow run. Hm, I'll sync my fork and try to reproduce this on my runner.

gierens commented 1 year ago

I looked into this but I'm not able to reproduce it: https://github.com/gierens/libvfio-user/actions/runs/6066677383 ... ran it multiple times and the arch job always succeeds as the fastest.

What I did notice, however, is that the centos job sometimes comes pretty close to the 10 minute mark (https://github.com/gierens/libvfio-user/actions/runs/6066677383/attempts/1): [screenshot from 2023-09-04 10-13-34]

It also failed one time (https://github.com/gierens/libvfio-user/actions/runs/6066677383/attempts/3): [screenshot from 2023-09-04 10-13-39]. Apparently due to some connection issue: [screenshot from 2023-09-04 10-13-54]

So my best guess is that something similar happened to the arch job on the mentioned PR ... some connection issue to the arch repos and then a timeout. Nothing to worry too much about.

I'm also not convinced that the timeout came from GitHub, since the default GitHub Actions timeout is 6 hours unless configured otherwise: https://nesin.io/blog/github-action-timeout ... and I cannot find any such config in the workflows.

jlevon commented 1 year ago

See the yaml in .github - we set timeout to 10 minutes as that really should be long enough for pre-commit CI.

gierens commented 1 year ago

Ah, yeah, you're right ... my editor plugin for searching doesn't seem to go into hidden directories, that's why I couldn't find it, my bad!
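For reference, a small sketch of a search that does reach the hidden directory, assuming the workflows live under `.github/workflows` (the standard GitHub Actions location) and use the `timeout-minutes` key. The function name `find_workflow_timeouts` is hypothetical:

```shell
# Hypothetical helper: list workflow lines that set a job or step timeout.
# Run from the repository root, or pass the workflow directory explicitly.
find_workflow_timeouts() {
    # grep -r descends into a directory named on the command line,
    # even when that directory is hidden
    grep -rn 'timeout-minutes' "${1:-.github/workflows}"
}
```

Calling `find_workflow_timeouts` with no argument from the repository root prints each matching file, line number, and configured value.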

gierens commented 1 year ago

But then that's actually settled I'd say, just a timeout due to a connection issue.

jlevon commented 1 year ago

We've seen this a few times, but will keep it open and monitor...

gierens commented 1 year ago

I wonder if maybe the arch repo mirror that is chosen in the jobs on your runner is flaky.

jlevon commented 1 year ago

Well, it's GitHub's runner, so not something we'd control.