rust-lang / crater

Run experiments across parts of the Rust ecosystem!
https://crater.rust-lang.org
627 stars 88 forks source link

Adjust handling of "no space left on device" error #715

Closed Skgland closed 2 months ago

Skgland commented 8 months ago
  1. prefere NoSpace reason over DependsOn
    • the later is nosiy and hides a simple problem in a long log
  2. Change NoSpace from spuriouse to non-spuriouse
    • to keep/move these crates into the retry-regressed-list.txt as the DependsOn failiures, that were really NoSpace failures, appear to be frequently retried

Notes rearding 1. this somewhat mudles the hirarchy of failure reasons, but I didn't think NoSpace should be necessarily be above CompilerError(_) and CompilerDiagnosticChange

Regarding 2. while technically spuriouse I think it is reasonable to treat NoSpace as a non-spuriouse regression for practical reasonse.

This somewhat undose my fix for #700 from #713, but at least it should now be its own group/category.

Skgland commented 8 months ago

An examplary crater run would be https://crater-reports.s3.amazonaws.com/pr-118120/index.html. A lot of regressed: dependencies are due to no space left on device and were retried. While 12 no space left on device regressions hiding under spuriouse-regressions were not.

Skgland commented 8 months ago

rust-lang/rust#116088 looks similar https://crater-reports.s3.amazonaws.com/pr-116088/index.html, but a lot larger. With 123 no space left on device under spurious-regressions and a lot of regressed: dependencies hiding no space left on device regressions. After checking the first 50 entries and then random entries every touchpad scroll distance all of the checked regressed: dependencies regressions are no space left on device related

Skgland commented 8 months ago

https://github.com/rust-lang/rust/pull/116088#issuecomment-1854674193

Let's sieve those out...

made me think that the reason for the retries might be to get the no space left on device errors out of the regressed list to see the actual regression, rather than re-running the spurious failiurs to possibly find hidden regressions?

In that case the first commit should be sufficent as, they would be no space left on device would be hidden under spurious-regressions, and not cluttering up the non-spurious regresssions.

I think my preference would be to re-run (keeping all commits here), to ensure no regressions are over shadowed by no space left on device, especially as re-running appears to rather consitenly resolve them.

Skgland commented 7 months ago

@Mark-Simulacrum you appear the most active on this repo, any feedback regarding this?

  1. I think prefereing NoSpace over DependsOn would be good as otherwise the fix for #700 in #713 is bearly effective, as most no space left on device errors appear to appear while compiling dependencies.

  2. Given the frequency with which they occure I am uncertain whether hiding these errors between the spuriouse regressions is ideal. Currently the NoSpace errors that are classified as DependsOn get retried and usually succeed on the second iteration, giving a second chance to detect regressions. That would be lost if they are moved to spurious regresssions as they will be no longer visible in the regressed list and also will be absent from the retry-regressed-list.txt

Skgland commented 2 months ago

I will open a new PR rather than rewriting this from scratch