racket / drracket

DrRacket, IDE for Racket
http://www.racket-lang.org/

BSL on Windows #281

Open soegaard opened 5 years ago

soegaard commented 5 years ago

Using BSL on a Windows machine is slow. Two users report this in this thread: https://www.reddit.com/r/Racket/comments/c1ifm8/drracket_performance_makes_it_unusable_when_in/erln4do/

And one in this thread: https://www.reddit.com/r/Racket/comments/7wpdi9/help_drracket_running_insanely_slow_on_windows_10/

alex-hhh commented 5 years ago

Just to save someone reading the entire Reddit thread...

Environment

The Problem

On a Windows 10 machine, while using DrRacket 7.3, running an empty "#lang racket" program by pressing Ctrl-R takes less than 1 second to get to the prompt. Running an empty Beginner Student program takes about 3-4 seconds before DrRacket shows the prompt.

Investigation

I used procmon from the Sysinternals suite and found that in the "#lang racket" case there is no disk activity at all when the empty program runs. In the BSL case, DrRacket looks up files on disk, and the cost is aggravated because Racket seems to probe many nonexistent paths even after the BSL files have been located on disk. I put the log for running an empty BSL program in a spreadsheet, which you can find below. If you filter it by Result "NAME_NOT_FOUND" and "PATH_NOT_FOUND", you'll see all the nonexistent paths that DrRacket checks.

https://docs.google.com/spreadsheets/d/1snGEBOHgIUv5V3cGA3lvfT1sZHbf2b-OOrJDVCjWWw8/edit?usp=sharing

Possible Problem

Looking at the paths that are searched, it seems that Racket constructs some paths and then searches for them across the entire pkgs collection. Here is a snippet from the log, where the "deinprogramm" module appears to be searched for in places like pict-snip, plot, r5rs, and racket-doc, plus many other places not shown (and the same thing happens with other packages).

C:\Program Files\Racket\share\pkgs\pict\deinprogramm
C:\Program Files\Racket\share\pkgs\pict-snip\deinprogramm
C:\Program Files\Racket\share\pkgs\picturing-programs\deinprogramm
C:\Program Files\Racket\share\pkgs\plot\deinprogramm
C:\Program Files\Racket\share\pkgs\r5rs\deinprogramm
C:\Program Files\Racket\share\pkgs\r6rs\deinprogramm
C:\Program Files\Racket\share\pkgs\racket-doc\deinprogramm

Such a search is very inefficient on Windows -- I suspect the same thing happens on Linux, but checking for nonexistent files might be faster there.
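
The search pattern suggested by the trace can be sketched roughly as follows. This is only an illustration of the suspected behavior, not Racket's actual resolver code: `probe-collection` and `example-roots` are hypothetical names, and the relative "share/pkgs" roots are stand-ins for the installation paths in the log.

```racket
#lang racket/base

;; Hypothetical helper, not Racket's actual resolver: probe each package
;; root for a subdirectory named `collection`, stopping at the first hit.
;; Returns the match (or #f) and the list of every path that was probed.
(define (probe-collection pkg-roots collection)
  (let loop ([roots pkg-roots] [probed '()])
    (cond
      [(null? roots) (values #f (reverse probed))]
      [else
       (define candidate (build-path (car roots) collection))
       (if (directory-exists? candidate)
           (values candidate (reverse (cons candidate probed)))
           (loop (cdr roots) (cons candidate probed)))])))

;; Example roots modeled on the trace (assumed, relative paths); on most
;; machines none of them exist, so every probe is a miss.
(define example-roots
  (for/list ([pkg (in-list '("pict" "pict-snip" "plot" "r5rs"))])
    (build-path "share" "pkgs" pkg)))

(define-values (found probed) (probe-collection example-roots "deinprogramm"))
(printf "found: ~a, probes: ~a\n" found (length probed))
```

Each miss costs one filesystem call, so a collection that lives late in the list pays for every package directory ahead of it.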

mflatt commented 5 years ago

@alex-hhh Thanks for the analysis!

I've uploaded an experimental snapshot here: http://www.cs.utah.edu/~mflatt/tmp/racket-7.3.0.8-x86_64-win32.exe

Does this run any faster, or is it about the same?

In the experimental snapshot, the search behind collection-file-path and the module name resolver has a new cache. The cache records the immediate content of directories that are searched for top-level collections. For example, each directory that shows up in @alex-hhh's trace is in the cache, since each is a multi-collection package's directory. Filesystem change events on Windows allow this cache to be reliably invalidated when the filesystem changes. The cache is only used for the search for an outer collection directory (such as the steps shown in the trace).
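
As a rough sketch of the idea (not the code in the snapshot), a cache of immediate directory contents might look like the following. The names `dir-cache`, `dir-entries`, `cached-collection-dir`, and `invalidate!` are hypothetical, and invalidation is manual here, whereas the snapshot relies on filesystem change events.

```racket
#lang racket/base

;; Hypothetical cache: directory path -> names of its immediate entries.
(define dir-cache (make-hash))

;; List a directory at most once; later calls are answered from the cache.
;; A missing directory is cached as having no entries.
(define (dir-entries dir)
  (hash-ref! dir-cache dir
             (lambda ()
               (if (directory-exists? dir)
                   (map path->string (directory-list dir))
                   '()))))

;; Find the first root whose cached listing contains `collection`.
;; Simplification: this does not check that the entry is a directory,
;; and it compares names case-sensitively.
(define (cached-collection-dir pkg-roots collection)
  (for/first ([root (in-list pkg-roots)]
              #:when (member collection (dir-entries root)))
    (build-path root collection)))

;; Manual stand-in for invalidation on a filesystem change.
(define (invalidate! dir)
  (hash-remove! dir-cache dir))
```

After the first lookup, a miss in a package directory becomes a hash lookup plus a list search rather than a filesystem probe.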

With this change on a Windows 10 installation in VirtualBox running on a Mac OS host, the time for

(for ([i (in-range 1000)])
  (collection-file-path "main.rkt" "redex"))

goes from 1.7 seconds to 0.7 seconds. For comparison, on the Mac OS host, that loop takes 0.3 seconds. I picked "redex" because it's listed toward the end in "links.rktd" for both the Windows and Mac OS installations.
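
To reproduce this kind of measurement, a minimal timing harness along these lines should work. `time-lookups` is a hypothetical helper, and the `#:fail` handler is there only so the loop still runs on an installation without "redex".

```racket
#lang racket/base

;; Time `n` repeated lookups of `file` in `collection`.
;; Returns the elapsed wall-clock time in milliseconds.
(define (time-lookups n file collection)
  (define start (current-inexact-milliseconds))
  (for ([i (in-range n)])
    ;; #:fail returns #f instead of raising if the collection is missing
    (collection-file-path file collection #:fail (lambda (msg) #f)))
  (- (current-inexact-milliseconds) start))

(printf "~a ms\n" (time-lookups 1000 "main.rkt" "redex"))
```

Absolute numbers will vary with the machine, the filesystem, and where the collection sits in "links.rktd", so they are only comparable across runs on the same setup.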

However, that difference doesn't translate to a noticeably faster start of Beginning Student in DrRacket on my Windows VM. The start time for me is around 1-1.5 seconds, not 3-4 seconds. Maybe the difference involves the filesystem and this change will help, or maybe the search isn't where the time goes after all.

The new cache does not eliminate the filesystem interaction to start Beginning Student, not by a long shot. After a collection directory is determined, there are several search steps to check for a module file and its ".zo" form. Worse, when a file is not found in a candidate collection directory, the search checks (for historical reasons) for ".ss" variants of the file name in source and ".zo" forms. Also for historical reasons, it checks for ".dll" forms. (The ".dll" part is already gone in Racket CS, and maybe we could transition away from ".ss" support.) Still, if this file-level searching turns out to be an order of magnitude fewer filesystem interactions than the initial collection-directory search, it makes sense to see whether it helps to avoid the filesystem for the collection part.
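
To make the file-level steps concrete, here is an illustrative list of the kinds of candidate paths involved for one module file. It assumes the default "compiled" subdirectory and the usual "name_ext.zo" naming; the helpers `zo-name`, `ss-name`, and `candidate-paths` are hypothetical, the order is not meant to match the real loader, and the ".dll" forms are left out.

```racket
#lang racket/base

;; "foo.rkt" -> "foo_rkt.zo" (assumed naming of compiled files)
(define (zo-name file)
  (string-append (regexp-replace #rx"[.]([^.]*)$" file "_\\1") ".zo"))

;; "foo.rkt" -> "foo.ss" (the legacy source suffix)
(define (ss-name file)
  (regexp-replace #rx"[.]rkt$" file ".ss"))

;; Candidate paths for `file` in `dir`: source and ".zo" forms for both
;; the ".rkt" name and its ".ss" variant.
(define (candidate-paths dir file)
  (define ss (ss-name file))
  (list (build-path dir file)
        (build-path dir "compiled" (zo-name file))
        (build-path dir ss)
        (build-path dir "compiled" (zo-name ss))))

(for ([p (in-list (candidate-paths (build-path "deinprogramm") "main.rkt"))])
  (printf "~a\n" p))
```

Even in this simplified form, each module can account for several filesystem checks per candidate directory.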

alex-hhh commented 5 years ago

Hi @mflatt, I installed the build you provided, and it does not have the "Beginning Student (HTDP)" language available, only the "DeinProgramm" type languages. I used "deinprogramm" as the example in my comment because:

Loading the DeinProgramm beginner language is fast in the build you provided, but it is also fast in the released Racket 7.3 build.

mflatt commented 5 years ago

The Beginning Student language includes a dependency on part of DeinProgramm, so that's probably ok. But I'm puzzled that you don't have Beginning Student available as a language option in the installed snapshot. Meanwhile, the fact that DeinProgramm languages start quickly compared to HtDP languages may be a useful data point.

mfelleisen commented 5 years ago

FWIW, I scanned @alex-hhh's spreadsheet and didn't see any libraries that looked wrong for BSL.

alex-hhh commented 5 years ago

Hi @mflatt, I downloaded and installed the build you provided on the computer where I ran the initial tests. With that build, an empty BSL program evaluates in about 1-2 seconds, versus 3-4 seconds in the 7.3 version.

Also, the BSL language was not available on my home computer because I had disabled the loading of HtDP in the "Installed Tools" preferences. I did this a long time ago and didn't immediately realize the problem.

mflatt commented 5 years ago

Thanks, @alex-hhh, this is good news — although 1-2 seconds certainly leaves room for improvement.

alex-hhh commented 5 years ago

I think the remaining time is spent actually reading in the BSL files and dependencies, which happens every time the program is run -- for comparison, there is no disk I/O at all for an empty "#lang racket" program -- I assume all modules are already in memory in that case.