rust-lang / rustup

The Rust toolchain installer
https://rust-lang.github.io/rustup/
Apache License 2.0

sporadic CI failures deleting files on windows #1900

Open rbtcollins opened 5 years ago

rbtcollins commented 5 years ago

The Windows CI jobs fail from time to time when deleting files during self upgrade. It's likely a genuine bug, and the normal first step would be a procmon or WPA trace to identify which process is holding a handle open on the file, so we can see whether we are stepping on our own feet or whether it is a virus scanner race that we need more retries for.

High level we need to:

tesuji commented 5 years ago

Here is the log file created by procmon while running the cli-v1 test:

cargo test --release --test cli-v1 -- remove_toolchain_then_add_again run_command rustc_no_default_toolchain update_channel update_on_channel_when_date_has_changed

image

Logfile.PML.zip

kinnison commented 5 years ago

Oooh well done @lzutao -- now we need to see if the log gives us what we need to diagnose. Is there a procmon reader for Linux?

tesuji commented 5 years ago

procmon can also generate XML and CSV output if you prefer.

rbtcollins commented 5 years ago

image

rbtcollins commented 5 years ago

That's STATUS_TOO_MANY_LINKS https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/69643dd3-b518-465d-bb0e-e2e9c5b7875e indeed.

rbtcollins commented 5 years ago

I think I might know what's going on: we're linking the rustup.exe from target into the test case. Presumably we're linking all the aliases too: cargo, rustc, etc.

NTFS has a limit of 1024 hard links per file.

So a mere 100 tests making 10 links each would be enough to come up against this limit. And any laziness in accounting (or equally in test working dir cleanup) would lead to issues.

My recommendation: change the test suite to take a copy of the rustup.exe we're testing with for each test, so that all the links being made are siloed within each test. Then we are going to stay under that threshold.
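A minimal sketch of that idea, not the actual test-suite code: the function name `install_isolated` and the `PROXIES` list are hypothetical, and the list is only an illustrative subset of the aliases. Each test gets its own copy of the built binary, and the alias hard links point at that copy, so the link count on the shared binary in target never grows.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Alias names to create next to the copied binary.
/// Hypothetical, illustrative subset; the real list is longer.
const PROXIES: &[&str] = &["cargo", "rustc", "rustdoc"];

/// Copy `built_exe` into `test_dir`, then hard-link each alias to that
/// private copy. All links accumulate on the per-test copy, never on the
/// shared original, so no single file approaches the NTFS link limit.
fn install_isolated(built_exe: &Path, test_dir: &Path) -> io::Result<PathBuf> {
    fs::create_dir_all(test_dir)?;

    // ".exe" on Windows, empty elsewhere.
    let ext = std::env::consts::EXE_SUFFIX;

    // A real copy: this starts a fresh file with its own link count.
    let local = test_dir.join(format!("rustup{}", ext));
    fs::copy(built_exe, &local)?;

    // Hard links are made against the copy only.
    for name in PROXIES {
        let link = test_dir.join(format!("{}{}", name, ext));
        fs::hard_link(&local, &link)?;
    }
    Ok(local)
}
```

The trade-off is one extra file copy per test in exchange for a link count that is bounded by the number of aliases in a single test, regardless of how many tests run or how lazily their working dirs are cleaned up.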

rbtcollins commented 5 years ago

https://en.wikipedia.org/wiki/NTFS#Hard_links

tesuji commented 5 years ago

Also see #995

rbtcollins commented 4 years ago

I think we can probably close this, though we might want to put a comment in the code base explaining why the binary is copied rather than hard-linked.

workingjubilee commented 3 years ago

@rustbot label: +O-windows