saveourtool / save-cloud

Cluster-based cloud mechanism for running SAVE framework
https://cosv.gitlink.org.cn
MIT License

Build and test action works for an hour #1863

Closed sanyavertolet closed 1 year ago

sanyavertolet commented 1 year ago

The "Build, test and upload code coverage" action sometimes runs for an extremely long time. This might be either a GitHub problem or a bug on our side. It needs to be investigated.

screenshot

sanyavertolet commented 1 year ago

The last errors found in the never-ending action run: screenshot

kgevorkyan commented 1 year ago

@sanyavertolet if it reproduces again, I suggest also attaching the build report. It can be opened, for example, by clicking the following button, and it contains all the timelines

image

We keep build artifacts for only 3 days; however, this report might still be available, we just need to find this action run.

nulls commented 1 year ago

I suspect gvisor runsc is the reason for the hang. This step was removed. I will continue to monitor.

nulls commented 1 year ago

Got the same issue without gvisor runsc. I also migrated the docker tests from a direct docker client to the docker client in testcontainers.

sanyavertolet commented 1 year ago

@nulls https://scans.gradle.com/s/ku36bjjl7wx3e/tests/:save-orchestrator-common:test/com.saveourtool.save.orchestrator.service.ContainerServiceTest/should%20create%20a%20container%20with%20save%20agent%20and%20test%20resources%20and%20start%20it()?top-execution=1

nulls commented 1 year ago

We use mockserver to mock a response from S3 (the save-agent is downloaded when the component starts). I will try to migrate this to testcontainers; it should have a more reliable mechanism for waiting until the container has started.
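
To illustrate the kind of "wait till container is started" mechanism meant here, below is a minimal sketch in plain Java with no Testcontainers dependency. Everything in it is an assumption made for illustration: `ReadinessWait`, `awaitReady`, and the probe are hypothetical names, not code from save-cloud and not the Testcontainers API. A real probe would check, for example, that the mocked S3 endpoint already answers requests.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of save-cloud or Testcontainers.
final class ReadinessWait {
    private ReadinessWait() { }

    /**
     * Polls the probe until it reports readiness or the timeout elapses.
     * Returns true if the probe succeeded, false on timeout or interruption,
     * so the caller can fail fast instead of hanging.
     */
    static boolean awaitReady(BooleanSupplier probe, Duration timeout, Duration pollInterval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (probe.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the dependency never became ready
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

A test could call `awaitReady(...)` before sending its first request and fail immediately when it returns `false`, so a dependency that never starts ends the test after a bounded time rather than blocking the whole build.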

nulls commented 1 year ago

Build, test and upload code coverage

The hosted runner: GitHub Actions 7 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

More evidence that the build should be split into smaller pieces

sanyavertolet commented 1 year ago

More evidence that the build should be split into smaller pieces

1927

nulls commented 1 year ago

Will close this after a few weeks