Closed: johnthagen closed this issue 5 years ago.
CC @retep998
I did a timed sample run:
>rustup install beta
info: syncing channel updates for 'beta-x86_64-pc-windows-msvc'
info: latest update on 2018-11-09, rust version 1.31.0-beta.5 (bf00632e3 2018-11-08)
info: downloading component 'rustc'
54.1 MiB / 54.1 MiB (100 %) 6.8 MiB/s ETA: 0 s
info: downloading component 'rust-std'
47.8 MiB / 47.8 MiB (100 %) 6.8 MiB/s ETA: 0 s
info: downloading component 'cargo'
info: downloading component 'rust-docs'
9.1 MiB / 9.1 MiB (100 %) 7.4 MiB/s ETA: 0 s
info: installing component 'rustc'
info: installing component 'rust-std'
info: installing component 'cargo'
info: installing component 'rust-docs'
beta-x86_64-pc-windows-msvc installed - rustc 1.31.0-beta.5 (bf00632e3 2018-11-08)
All of the other components installed in a matter of a couple of seconds, but rust-docs took a staggering 2 minutes and 42 seconds to install! 😱
This is on a Windows 10 machine with a quad-core i5 @ 4GHz and an SSD. All 4 cores were pegged at around 60-80% for the duration of the install.
Based on a CPU usage capture taken during the install of rust-docs, Windows Defender definitely appears to be involved.
Yes, Windows Defender does cause significant slowdowns; however, turning off real-time protection doesn't help very much - it still takes forever. Maybe I should be doing something more than just disabling real-time protection? At this point I'm seriously annoyed - multiple people have opened issues for this and suggested changes such as making the docs optional, merging the small files in the docs into a few large files to accommodate slow file systems, etc. - but nothing has really been done so far except "disable your antivirus".
multiple people have opened issues for this
@ndrewxie Could you please link those issues to this one? I opened this because I thought it hadn't been officially reported yet. Best to link any previous discussions together.
At this point I think the only fair thing to do is make rust-docs installation slow on all other platforms too. That way the predominantly Linux/Mac-using Rust community will finally care about this problem and get it fixed.
At this point I'm seriously annoyed
At this point I think the only fair thing to do is make rust-doc installation slow on all other platforms too.
As a polyglot Windows and Mac user, I do sometimes feel that Windows support lags behind Linux and Mac in subtle ways (like this issue, for example), but I'd like to keep this issue focused on this particular problem and on the concrete ways the situation can be improved.
I'm normally a Linux user, but found myself trying to do some project work on Windows today and my god is everything slow.
Are there concrete reasons why creating lots of small files (which rustc loves doing) is such an expensive thing on Windows? I can see Windows Defender consuming quite a lot of CPU and disk IO, so I'm assuming that has something to do with it.
I'm not super familiar with how Windows handles things like its filesystem and Windows Defender under the hood, but I'd be keen to help out. Besides telling all Windows users to turn off their antivirus, are there any other ways to mitigate this poor performance? It sounds like we're not the only community suffering from poor filesystem performance; people have had the same issues with node_modules in the JavaScript world.
It is also slow if you install Rust in WSL. The filesystem is just very slow on Windows. I would prefer not to install the docs if that were possible.
Another data point from @chriskrycho:
As it stands, Windows builds on CI take ~5⨉ longer than Linux or macOS just to get through the setup step, because NTFS kind of chokes when dealing with a large number of small files
More discussion on this internals thread.
Issue previously reported: #763
Issue to make rust-docs optional: #998
@nrc Appears to be working on profiles to help address this: https://github.com/rust-lang/rustup.rs/issues/998#issuecomment-432831155
The rust-src component is pretty slow to install on Windows as well. Not as bad as the docs, but it takes noticeably longer than any of the other components.
I've been looking at the rustup code to see how it works with files. If I understand correctly, updating a component is performed in a transaction where each original file is moved to a temp folder before the new file is written in its place.
I was wondering whether it would be viable to simply rename the toolchain folder. That would significantly reduce the number of file operations to perform. However, this would only work for toolchain updates, not component installation/removal.
That said, installing a new toolchain (i.e. not updating) is slow as well, so this may be a red herring.
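The per-file transaction described above can be pictured with a minimal sketch. It is not rustup's actual code: the `FileTransaction` type, its methods, and the numbered backup names are hypothetical, and it assumes the backup directory is on the same volume as the files so `fs::rename` can move them. What it does show is why updates cost so many filesystem operations: every replaced file needs a rename out of the way plus a fresh create-and-write, and each of those is another chance for a filter driver such as Defender to step in.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical, simplified transaction: before a file is overwritten, the
/// original is moved into `backup_dir` so the change can be undone.
struct FileTransaction {
    backup_dir: PathBuf,
    // Each entry: (path that was written, backup of the previous file if there was one).
    changes: Vec<(PathBuf, Option<PathBuf>)>,
}

impl FileTransaction {
    fn new(backup_dir: &Path) -> io::Result<Self> {
        fs::create_dir_all(backup_dir)?;
        Ok(FileTransaction { backup_dir: backup_dir.to_path_buf(), changes: Vec::new() })
    }

    /// Move any existing file out of the way (one rename), then write the new one.
    fn write_file(&mut self, path: &Path, contents: &[u8]) -> io::Result<()> {
        let backup = if path.exists() {
            let b = self.backup_dir.join(self.changes.len().to_string());
            fs::rename(path, &b)?;
            Some(b)
        } else {
            None
        };
        self.changes.push((path.to_path_buf(), backup));
        fs::write(path, contents)
    }

    /// Undo changes newest-first: delete what was written, then restore any backup.
    fn rollback(self) -> io::Result<()> {
        for (path, backup) in self.changes.into_iter().rev() {
            let _ = fs::remove_file(&path);
            if let Some(b) = backup {
                fs::rename(&b, &path)?;
            }
        }
        fs::remove_dir_all(&self.backup_dir)
    }

    /// Keep the new files and throw the backups away.
    fn commit(self) -> io::Result<()> {
        fs::remove_dir_all(&self.backup_dir)
    }
}
```

Applied to a component with thousands of small files, this pattern means tens of thousands of individual filesystem calls, which is the cost the rename-the-folder idea tries to avoid.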
A related issue reported on the User's Forum and #1464:
The problem I am seeing is: at the end of the updating done by Rustup it hangs. The CPU and disk usages are both zero. I don’t know what’s happening but it’s not copying files, nor scanning them. And I’ve kept the shell open for hours with no changes
Another data point. Building a "Hello World" executable on Travis CI takes 8x longer on Windows. The vast majority of the time is spent installing rust-docs.
https://travis-ci.org/johnthagen/min-sized-rust/builds/465623377
What if rustup installed rust-src by default and then generated docs on demand? I guess most people aren't using local docs anyway, and if they need them they can install them separately. The main objection to not installing docs by default is that you cannot install them later without an internet connection. Installing rust-src and generating docs from source would solve that issue, and rust-src installs much faster than the docs.
The sources are still 2,307 files. While not as bad as the 15,377 files of the docs, it's still a hefty chunk of time. Really we should just not extract rust-docs until the user asks for the docs. That way people who are offline can still have the docs without remembering to install them ahead of time, and installation and upgrading aren't such a slow process for other users.
Really we should just not extract rust-docs until the user asks for the docs.
@retep998 This seems very reasonable, and I agree with @kryptan that most users will never use rust-docs (unless some IDE makes use of it automatically). For example, I know intellij-rust can navigate into the std source from user code, which is pretty nice.
What would be the workflow envisioned for this? Something like the following?
$ rustup toolchain install X
... (installs rust-docs compressed file, but does not extract)
... (some time later)
$ rustup component add rust-docs
... (this extracts the rust-docs compressed file, so works even without an Internet connection)
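The workflow above could behave roughly like this minimal sketch. Everything in it is hypothetical: `add_component_lazily`, the `.extracted` marker file, and the `download`/`extract` closures stand in for rustup internals and real unpacking code. The point is the ordering: the archive is cached at toolchain install time, extraction happens only when the component is requested (so it works offline), and the marker is written last so an interrupted extraction is retried rather than treated as complete.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical deferred extraction: `archive` was downloaded when the toolchain
/// was installed, but is only unpacked into `install_dir` on request.
fn add_component_lazily<D, E>(
    archive: &Path,
    install_dir: &Path,
    download: D,
    extract: E,
) -> io::Result<&'static str>
where
    D: FnOnce(&Path) -> io::Result<()>,
    E: FnOnce(&Path, &Path) -> io::Result<()>,
{
    let marker = install_dir.join(".extracted");
    if marker.exists() {
        return Ok("already installed");
    }
    let how = if archive.exists() {
        // No network needed: the compressed package is already on disk.
        "extracted from cached archive (works offline)"
    } else {
        download(archive)?;
        "downloaded and extracted"
    };
    fs::create_dir_all(install_dir)?;
    extract(archive, install_dir)?;
    // Written last, so a half-finished extraction is not mistaken for a complete one.
    fs::write(&marker, b"")?;
    Ok(how)
}
```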
Related: @kinnison added a progress bar for the install step in #1593
I think this is a strict duplicate of #904 - the rust-docs component shows the pathology most severely, but it's not unique to rust-docs.
@rbtcollins Even if we had perfect disk access, installing rust-docs would still take a long time. This issue is for taking measures such as not extracting the docs until needed, or even not downloading them at all.
I agree with @retep998: #904 can improve things, but with this many files, I don't think it's a "strict duplicate". It's related, but not a duplicate.
I tested it on a fast NVMe drive: disk usage does not exceed 30%, but Windows Defender maxes out two cores.
@retep998 OK, so if the focus here is on making docs something more just-in-time or whatever, I agree; I was going from the title, which says it's slow. It needn't be more than ~60 seconds from my quick experiments (without getting into esoteric NTFS behaviours or anything like that) - see https://github.com/rust-lang/rustup.rs/issues/904#issuecomment-481010675
Oh, in terms of running a little web server with the docs: offering the docs package as an .iso would be a sensible way to do it. Windows has built-in ISO mounting support these days, so it should be possible to programmatically mount the ISO when 'rustup doc' is called.
@rbtcollins That would make it harder to allow users to bookmark parts of the docs which they find useful, unless we had a rustup httpd service of some kind which seems somewhat problematic for us to manage.
@rbtcollins there are more sensible ways to serve files from an archive, e.g. https://github.com/killercup/static-filez/. ZIP archives would work great. But that's a different issue: some users basically have permanent Internet connections, and they might think there's no point in downloading and storing the documentation if they're never using rustup doc.
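To illustrate why serving docs out of one archive helps compared with thousands of loose files, here is a minimal sketch of a single-blob "pack" with an offset index. This is neither ZIP nor the static-filez format: `DocPack` is a hypothetical stand-in that only shows the principle behind the "merge small files into a few large files" suggestion. Installing it creates one large file, and a local doc server could answer each request with one index lookup and a slice into the blob.

```rust
use std::collections::HashMap;

/// Hypothetical pack: every page is concatenated into `blob`, and `index`
/// maps each relative path to its (offset, length) inside the blob.
struct DocPack {
    blob: Vec<u8>,
    index: HashMap<String, (usize, usize)>,
}

impl DocPack {
    fn build<'a, I>(files: I) -> Self
    where
        I: IntoIterator<Item = (&'a str, &'a [u8])>,
    {
        let mut blob = Vec::new();
        let mut index = HashMap::new();
        for (path, contents) in files {
            // Record where this page starts before appending it.
            index.insert(path.to_string(), (blob.len(), contents.len()));
            blob.extend_from_slice(contents);
        }
        DocPack { blob, index }
    }

    /// Look up one page, as a local doc server might do for each request.
    fn get(&self, path: &str) -> Option<&[u8]> {
        let &(offset, len) = self.index.get(path)?;
        self.blob.get(offset..offset + len)
    }
}
```

In a real format the index would be serialized next to the blob (or at its end, as ZIP does with its central directory) so it can be loaded without scanning the whole file.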
error: component 'rust-docs' for target 'x86_64-unknown-linux-gnu' is required for toolchain 'nightly-x86_64-unknown-linux-gnu' and cannot be removed
Why‽ But if you do choose to install the docs, it would be nice if it didn't take 5 minutes.
Mounting ISOs is problematic because it might require administrative privileges.
EDIT: Sorry for the mostly off-topic post, I confused this with https://github.com/rust-lang/rustup.rs/issues/998.
@kinnison I'm sure that can be engineered past, e.g. with a symlink at the current path pointing to wherever the ISO got mounted, updated when it's remounted.
@lnicola I phrased myself really poorly: yes, serving actual HTTP from a .zip is trivially easier; I was more meaning 'having a format to access locally and efficiently'. Hmm, there is an Explorer view into .zip files these days, but I don't know if e.g. Firefox/Edge can transparently traverse into them. That might be worth examining too.
I agree that always-on-internet users may not wish to have local docs at all :). And obviously I agree that having the install be faster would be good :P. - https://github.com/rust-lang/rustup.rs/pull/1744 will help with that.
@rbtcollins Yeah, #1744 is awesome, thanks for that (and see my edit). As a further step, do you think it would be possible to extract the package on the fly while downloading it?
Even with the large amount of files, for me the package server is probably even slower than the disk access, which can be annoying.
@lnicola see #731 - I put some thoughts on that there; but a high level tl;dr - absolutely it can be done; the current code isn't particularly well structured for it, and there would be rather wider race conditions where nothing is usable (time from moving old files out of the way to moving new files into place). I suspect increasing download concurrency and optimising the IO of extraction will get us to a pretty good place without the complexity of direct net->disk streaming.
@rbtcollins Sorry, I'm probably missing something, but with #1744 I assume things work like this:
Assuming the above is correct (and it might not be), streaming the decompression would happen between the first two steps above, so it shouldn't affect in any way the existing installation and the rollback guarantees.
@lnicola ah yes, we can merge 1 and 2 in that model, if we keep a staging area.
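Combining the staging-area idea with the earlier suggestion of renaming the whole toolchain folder, a minimal sketch might look like this. `install_via_staging` and the `.old` backup suffix are hypothetical, and the sketch assumes `staging` and `target` are on the same volume so each directory rename is a cheap metadata operation rather than a copy. There is still a short window between the two renames where neither installation is in place, which is the kind of race mentioned earlier in the thread.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical staged install: unpack everything into `staging`, then swap
/// whole directories with two renames instead of moving each file individually.
fn install_via_staging(target: &Path, staging: &Path, files: &[(&str, &[u8])]) -> io::Result<()> {
    // 1. Unpack into the staging area; the current installation stays usable.
    if staging.exists() {
        fs::remove_dir_all(staging)?;
    }
    for (rel, contents) in files {
        let dest = staging.join(rel);
        if let Some(parent) = dest.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(&dest, contents)?;
    }
    // 2. Swap directories: old -> "<target>.old", staging -> target.
    let mut backup_name: OsString = target.as_os_str().to_owned();
    backup_name.push(".old");
    let backup = PathBuf::from(backup_name);
    if backup.exists() {
        fs::remove_dir_all(&backup)?; // leftover from an earlier failed attempt
    }
    let had_old = target.exists();
    if had_old {
        fs::rename(target, &backup)?;
    }
    if let Err(e) = fs::rename(staging, target) {
        // Roll back so the previous installation is back in place.
        if had_old {
            fs::rename(&backup, target)?;
        }
        return Err(e);
    }
    // 3. Only now delete the old installation.
    if had_old {
        fs::remove_dir_all(&backup)?;
    }
    Ok(())
}
```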
I'm seeing the same behavior on Debian Linux boxes at my university. It's possible that they are running network antivirus there, but I'm skeptical. It takes 1m or so to download everything else, but 30m for the docs at a few KB/s. (It also took several minutes to extract everything at work, and a few seconds at home.) I don't see this from my home Linux box at all. Maybe this is not really a Windows thing?
No, it is definitely a Windows thing that installing the docs takes forever. The download taking a long time is a completely different, unrelated issue.
Maybe we could keep rust-docs in zip format when installing. This would require IDEs/doc viewers to support the zip format.
Rather than fixing this, Windows should definitely realize that slowing their system down like this isn't acceptable.
It's an issue in Windows, not rustup.
@leo-lb We're significantly more likely to be able to use the OS in a way that the OS author intended, than to change the OS author's mind on their design tradeoffs.
@leo-lb There were many issues that lie definitively within rustup's remit, including the poor syscall behaviour noted at the start of the bug (which is largely mitigated by the patches I have written, some of which are already merged; directories are a remaining exception for now). Those issues affect[ed] Linux too, but show up much less often there (putting ~ on a network share would be one likely way of demonstrating them in wall-clock times).
The only really Windows-specific issue thus far is the poor performance of CloseHandle; if we can find an alternative mitigation for that, great. But the underlying behaviour here, of CloseHandle blocking until outstanding IO requests associated with the process have had their kernel resources released, is a very deep design characteristic of Windows... If we want rustup to be fast this year, we need to work with the kernel that is in use by our users, not with a kernel that Microsoft may or may not alter based on our pointing out that this has a performance impact.
I say that this is really the only Windows-specific issue because there are virus scanners for Linux (albeit few and far between in terms of deployed base), and there are content indexing programs that index user content and watch the file system to index it dynamically - so those impacts should be shared.
It's also worth noting, as I do in the commit message, that third parties can cause arbitrary delays in CloseHandle, which neither Microsoft nor rustup can control, and we mitigate that as well with this patch. https://randomascii.wordpress.com/2018/04/17/making-windows-slower-part-1-file-access/ is a good blog post about some of the ways this can happen. Similar things can happen on Linux as well (e.g. with eBPF-hooked code).
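The deferred-close mitigation can be illustrated with a small sketch using only the standard library. The writer thread creates and fills each file, then hands the `File` to a pool of worker threads; dropping a `File` on a worker is what closes the handle (via `CloseHandle` on Windows), so a slow close no longer stalls the next write. This assumes the cost being hidden is the close itself, as described above. `write_with_deferred_close` and the shared `Mutex<Receiver>` work queue are illustrative choices, not rustup's implementation.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical helper: write `files` into `dir`, closing handles on `workers`
/// background threads. Returns how many handles the workers closed.
fn write_with_deferred_close(dir: &Path, files: &[(&str, &[u8])], workers: usize) -> io::Result<usize> {
    let (tx, rx) = mpsc::channel::<File>();
    let rx = Arc::new(Mutex::new(rx));
    let mut pool = Vec::new();
    for _ in 0..workers.max(1) {
        let rx = Arc::clone(&rx);
        pool.push(thread::spawn(move || {
            let mut closed: usize = 0;
            loop {
                // The guard is a temporary, so the lock is released before the close below.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(file) => {
                        drop(file); // closes the handle (CloseHandle on Windows)
                        closed += 1;
                    }
                    Err(_) => break, // sender dropped and queue drained
                }
            }
            closed
        }));
    }
    for (name, contents) in files {
        let mut f = File::create(dir.join(name))?;
        f.write_all(contents)?;
        tx.send(f).expect("close workers should still be running");
    }
    drop(tx); // lets the workers exit once every queued handle is closed
    let mut total: usize = 0;
    for worker in pool {
        total += worker.join().expect("close worker panicked");
    }
    Ok(total)
}
```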
Windows Defender, which is the principal contributor to this issue, spins up whole arbitrary file-type parsers between system calls; there's nothing to do about that except change Windows Defender, and thus Windows, which has it enabled by default. I understand the problem of using a monolithic system where any program gets access to the whole filesystem without an Anti-Virus, but it's not like we can't do better than monolithic these days.
As a workaround, I would look at extracting the files in a way that lets Windows Defender clearly recognize the relationship between the archived copy and the extracted copy. Windows Defender already scans archives when they are written; if it can identify that a file is being copied out of an archive it has already scanned, it does not need to scan it again.
You might want to use a third-party tool for this, one that is trusted by Windows Defender; that most commonly happens when it has a digital signature with some history of good behavior. Windows might have some command-line tools to extract archives.
To clarify, as rustup executables lack digital signatures, they're considered unknown and potentially suspicious by Windows Defender and other Anti-Viruses, and stricter active scanning rules are applied to them. A file without a digital signature can end up being considered trusted, but it takes more time: an employee needs to attest that the executable is trusted and register its hash in the database. With digital signatures, the process is more routine. Companies build reputation through their digital signature; if they misbehave, the whole signature is marked as suspicious. Digital signatures come in limited supply. Malware is less likely to have access to digital signatures, and newly created ones don't get trusted so early: some new software with no malicious behavior must first appear before the Anti-Virus starts trusting it. Stolen digital signatures are another story: they can cause quite a lot of damage because the Anti-Virus will turn a blind eye for some time, but eventually companies learn to protect their systems against that and such events become rarer.
@leo-lb you may be interested to know that the performance work I'm doing is improving performance when Defender is disabled / excluded on the rustup directory: When I started doing this optimisation work with defender disabled and indexing disabled, rust-docs installation was nearly 40 seconds: I've pulled it down to 10 seconds.
Yes, Defender's overhead is a significant factor as well. There is a work item in the rustup community to have rustup's distribution signed, but I haven't been involved in that, and even with that made completely optimal there was a lot of poor behaviour we had to fix - see above.
@rbtcollins Okay, well good work on that. Rather than signing rustup binaries themselves, you could ship rust docs in .zip format and make use of a command line tool shipped by default in Windows to extract the archive.
Rather than ZIP, I would use CAB files, because there are better-supported tools for them installed by default since early Windows days. See https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-R2-and-2012/hh875545(v=ws.11) and https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/expand
Also, the standard installation mechanism on Windows is MSI, so that would be best: package either Rust as a whole, or only the docs, in an MSI, and then install that MSI.
You have little hope of staying in the FOSS world for your toolchains and build systems if you are using Windows; otherwise you'd be suffering quite big constraints like this performance one. The tools to make CAB or MSI files aren't FOSS. Code-signing certificates are the same: they're an administrative cost.
https://en.wikipedia.org/wiki/WiX is the only FOSS tool that can create MSI Windows Installers; CAB files can be read, but not created, by FOSS tools.
I'm now involved in a discussion with Defender folk; they've asked for traces of the poor behaviour. Please use rustup 1.18.3 or newer, nothing older, as there is no point sending in unoptimised traces IMO.
We only need 3-4 traces from a few different machines where Defender overhead is the problem; I'll be submitting one from a Surface Pro and one from a 2990WX. If someone has a machine with e.g. spinning metal disks, or fewer cores or whatever - something interestingly different - please add the trace ID here.
For us, the use case we're tracing is unpack performance, so I'd suggest running rustup uninstall nightly, then rustup install nightly, waiting for it to start installation, and then copying the contents of ~/.rustup/downloads/* to a temp dir. Then run rustup uninstall nightly again, and copy that temp dir's contents back to the downloads dir.
Finally, start the trace and then run rustup install nightly.
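The tail end of that procedure can be scripted. In this minimal sketch, `copy_files`, `rustup`, and `restore_downloads_for_trace` are hypothetical helpers; it assumes the cached packages are plain files directly inside the downloads directory (~/.rustup/downloads by default, or under RUSTUP_HOME if that is set) and that rustup will reuse them instead of downloading again. The timing-sensitive steps, stashing the downloads while the install is unpacking and starting the trace, are still left to be done by hand.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Copy every regular file directly inside `src` into `dst` (like `cp src/* dst/`).
fn copy_files(src: &Path, dst: &Path) -> io::Result<usize> {
    fs::create_dir_all(dst)?;
    let mut copied = 0;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        if entry.file_type()?.is_file() {
            fs::copy(entry.path(), dst.join(entry.file_name()))?;
            copied += 1;
        }
    }
    Ok(copied)
}

/// Run `rustup <args>` and turn a non-zero exit status into an error.
fn rustup(args: &[&str]) -> io::Result<()> {
    let status = Command::new("rustup").args(args).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(io::ErrorKind::Other, format!("rustup {:?} failed: {}", args, status)))
    }
}

/// After stashing the downloads: uninstall nightly, then put the cached packages
/// back. Next, by hand: start the trace, then run `rustup install nightly`.
fn restore_downloads_for_trace(downloads: &Path, stash: &Path) -> io::Result<usize> {
    rustup(&["uninstall", "nightly"])?;
    copy_files(stash, downloads)
}
```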
Instructions for gathering a trace (note: this captures most metadata about what the system is doing):
@rbtcollins feedback-hub:?contextid=242&feedbackid=953429f7-8755-4abb-ae7b-26cb61729786
I have my .rustup, .cargo, and %TEMP% directories on a non-OS HDD (D:\; C:\ is a SanDisk SDSSDH3 500G): a Seagate Samsung Spinpoint M8 ST1000LM024 (HN-M101MBB/EX2) 1TB 5400 RPM 8MB Cache SATA 6.0Gb/s 2.5" Internal Notebook Hard Drive. Both 'rustc' and 'rust-std' components claimed a roughly constant install rate of 11.7 MiB/s; 'rust-docs' varied between as low as 15 KiB/s and as high as 300 KiB/s.
This is my trace from my 2990WX feedback-hub:?contextid=242&feedbackid=b17bbe6c-dca5-4589-8f2b-367aa517fbad
@CAD97 do you perhaps have the console output? e.g. something like
Closing 15213 deferred file handles 5.3 Kihandles / 14.9 Kihandles ( 36 %) 346 handles/s in 19s ETA: 28s
If you didn't see the above, you were running an older, single-threaded rustup.
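As a quick check on how to read that progress line, its percentage and ETA can be reproduced from the other numbers in it. This sketch assumes "Ki" means 1024 and that the ETA is simply the remaining handle count divided by the current handles-per-second rate, both rounded; `progress` is a hypothetical helper, not rustup's code. With 5.3 Ki (about 5,427) of 15,213 handles closed at 346 handles/s, it gives 36 % and about 28 s, matching the quoted line.

```rust
/// Hypothetical helper: percent complete and remaining seconds for a progress line,
/// assuming ETA = remaining items / current rate, both rounded to whole numbers.
fn progress(done: f64, total: f64, rate_per_s: f64) -> (u32, u32) {
    let percent = (done / total * 100.0).round() as u32;
    let eta_secs = ((total - done) / rate_per_s).round() as u32;
    (percent, eta_secs)
}
```

For example, `progress(5.3 * 1024.0, 15213.0, 346.0)` returns `(36, 28)`.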
As reported on the User's Forum, installing the rust-docs component on Windows 10 is currently very slow compared to other components, even on machines with an SSD and a multi-core processor.