guibou closed this issue 6 months ago.
I think @wz1000 is our best bet for guessing what's going on.
I would add more traces in Session.hs to figure out exactly where it gets stuck.
@wz1000 I will do that. Thank you for the filename; it helps me a lot.
I've added a bunch of debug log statements in Session.hs and related files, and I'm starting to discover something. In IDE/Session.hs:
```haskell
let targetEnv = (if isBad ci then multi_errs else [], Just henv)
    targetDepends = componentDependencyInfo ci
    res = (targetEnv, targetDepends)
liftIO $ logWith recorder Logger.Debug $ LogDLLLoadError "Foo"
logWith recorder Debug $ LogNewComponentCache res
liftIO $ logWith recorder Logger.Debug $ LogDLLLoadError "Bar"
evaluate $ liftRnf rwhnf $ componentTargets ci
```
Foo is logged, but Bar is not. If I comment out the logWith statement, then Bar is logged and my session continues (it gets stuck on something else later, but let's investigate this one first).
Even if I replace the logWith with a simple liftIO (print res), it gets stuck, and my console only outputs ((, which is the beginning of the tuple contained in res. So it looks like there is a "lazy" lock hidden in the computation of this tuple (either an infinite loop, or something worse hidden behind an unsafePerformIO).
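A small self-contained sketch of why print res can emit (( and then hang: show on a nested tuple is lazy, so the opening parentheses are produced before the first component is forced. The names stuckErrs and res below are hypothetical stand-ins for multi_errs and the real tuple, and the divergence is simulated with the length of an infinite list; this is not the actual HLS code.

```haskell
-- Hypothetical stand-in for multi_errs: it never reaches weak head normal
-- form, because computing the length of an infinite list never returns.
stuckErrs :: [Int]
stuckErrs = replicate (length [1 :: Integer ..]) 0

-- Same shape as res: ((errors, Maybe env), dependency info).
res :: (([Int], Maybe String), String)
res = ((stuckErrs, Just "henv"), "deps")

main :: IO ()
main = do
  -- show is lazy: the first two characters are produced without forcing stuckErrs.
  putStrLn (take 2 (show res))
  -- The other two components can be forced on their own without hanging.
  print (snd res, snd (fst res))
```

With unbuffered output (for example after hSetBuffering stdout NoBuffering), replacing the first putStrLn with print res would write (( and then spin at 100% of one core, which matches the symptom described above.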
The tuple in res has type (([(NormalizedFilePath, ShowDiagnostic, Diagnostic)], Maybe HscEnvEq), DependencyInfo), and the snd and snd . fst components are fine. So the problem is in the [(NormalizedFilePath, ShowDiagnostic, Diagnostic)] list.
```diff
- let targetEnv = (if isBad ci then multi_errs else [], Just henv)
+ let targetEnv = ([], Just henv)
```
This change completely unlocks my HLS, and I even get diagnostics.
I'll apply this as a patch at work so we can move forward. I can investigate a bit more if you give me some guidance.
Edit: isBad ci terminates; the problem is in multi_errs.
Okay, so that does seem like it shouldn't get stuck. Do we have a test case for that? It looks like it's specifically for things that violate the closure property.
Obviously we should also give an error, but it looks like we get stuck in the process of trying to work out the error :joy:
Maybe checkHomeUnitsClosed does not, in fact, always terminate in the presence of weird package dependencies?
Yeah, this might be a GHC bug. Can you print out (hsc_unit_env hscEnv') and (hsc_all_home_unit_ids hscEnv'), please?
Alternatively, looking at the comment on https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Driver/Make.hs#L1756, perhaps it just has bad complexity, and since you have a massive package db, it's doing a quadratic or otherwise huge amount of work?
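To illustrate the complexity hypothesis, here is a sketch (not GHC's checkHomeUnitsClosed) of how a dependency walk that does not remember visited units can do far more than quadratic work. The graph is hypothetical: a chain of n diamonds, where each top unit depends on two middle units that both depend on the next top unit. A naive walk re-explores shared dependencies and makes 2^(n+2) - 3 visits, while a walk with a visited set explores each of the 3n + 1 units once.

```haskell
import qualified Data.Set as Set

-- Hypothetical dependency graph with 3 * levels + 1 units: unit 3k is the
-- top of diamond k and depends on units 3k+1 and 3k+2, which both depend
-- on unit 3(k+1). The last unit, 3 * levels, has no dependencies.
succs :: Int -> Int -> [Int]
succs levels v
  | v >= 3 * levels = []
  | v `mod` 3 == 0 = [v + 1, v + 2]
  | otherwise = [3 * (v `div` 3 + 1)]

-- Naive walk: re-explores shared dependencies every time they are reached.
-- Returns the number of visits, which is 2^(levels + 2) - 3.
naiveVisits :: Int -> Int
naiveVisits levels = go 0
  where
    go v = 1 + sum (map go (succs levels v))

-- Walk with a visited set: each unit is explored once, 3 * levels + 1 in total.
reachableCount :: Int -> Int
reachableCount levels = go [0] Set.empty
  where
    go [] seen = Set.size seen
    go (v : rest) seen
      | v `Set.member` seen = go rest seen
      | otherwise = go (succs levels v ++ rest) (Set.insert v seen)

main :: IO ()
main = print (naiveVisits 10, reachableCount 10)
```

With n = 10 this prints (4093,31). Each extra diamond roughly doubles the naive count, so a large package db with many shared dependencies could plausibly look like a hang even without a true infinite loop.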
Also, printing out closure_errs might be enlightening. It seems likely that the home units are not closed for your project. This means that there are "home units" (units that you've asked HLS to load) A and B, such that A depends on B, and there is a unit C such that A depends on C, C depends on B, and C is not a home unit.
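The closure property described above can be sketched as a reachability check: a non-home unit C violates it if it is reachable from some home unit and can itself reach a home unit. This is a simplified model, not GHC's checkHomeUnitsClosed; the Graph type, the unit names and the helper functions are hypothetical. The walk keeps a set of visited units, so it terminates even when the graph contains a cycle (D and E below).

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical model: a unit name and its direct dependencies.
type Unit = String
type Graph = Map.Map Unit [Unit]

-- Direct dependencies of a unit (none if the unit is unknown).
depsOf :: Graph -> Unit -> [Unit]
depsOf g u = Map.findWithDefault [] u g

-- Every unit reachable from the given starting units (inclusive).
-- The visited set guarantees termination, even on cyclic graphs.
reachableFrom :: Graph -> [Unit] -> Set.Set Unit
reachableFrom g = go Set.empty
  where
    go seen [] = seen
    go seen (u : rest)
      | u `Set.member` seen = go seen rest
      | otherwise = go (Set.insert u seen) (depsOf g u ++ rest)

-- Non-home units that sit between two home units: reachable from a home
-- unit's dependencies, and able to reach a home unit themselves.
closureViolations :: Graph -> Set.Set Unit -> [Unit]
closureViolations g homes =
  [ c
  | c <- Set.toList below
  , not (c `Set.member` homes)
  , any (`Set.member` homes) (Set.toList (reachableFrom g (depsOf g c)))
  ]
  where
    below = reachableFrom g (concatMap (depsOf g) (Set.toList homes))

-- Example: A depends on B and C, C depends on B, and D/E form a cycle.
example :: Graph
example =
  Map.fromList
    [ ("A", ["B", "C", "D"])
    , ("B", [])
    , ("C", ["B"])
    , ("D", ["E"])
    , ("E", ["D"])
    ]

main :: IO ()
main = print (closureViolations example (Set.fromList ["A", "B"]))
```

Here A and B are home units, A depends on C and C depends on B, so the program prints ["C"]; the D/E cycle is visited once and reported as harmless. Making C a home unit as well closes the set and the result becomes [].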
@wz1000, sorry for taking so long to answer.
Apparently I'm not the only one with this issue, so my "weird" setup with nix may not be responsible.
> Yeah, this might be a GHC bug. Can you print out (hsc_unit_env hscEnv') and (hsc_all_home_unit_ids hscEnv'), please?
hsc_unit_env is a UnitEnv, which does not have a Show instance. I've tried to add one, but stopped after adding Show instances for dozens of types. Do you know another way to display it? (I may try to start in ghci and use :force or :print, but I'm afraid the setup will be painful.)
For hsc_all_home_unit_ids, the result is surprising: it is [main,main-457d6b4053974d5f4ce4d2060404b2ae241df152], so apparently I have two home units, and one does not have a hash in its name. Is that expected?
Recall that I'm not using "home units" yet, and my build consists only of -package-id references to Hackage packages, or file references. However, it is possible that I have multiple Main modules; do you think that could be a problem? I'll check.
> Also, printing out closure_errs might be enlightening.
An infinite loop is hidden in closure_errs too. I'll try to dig a bit and see if I can extract some information.
Note: a few lines above, in the call closure_errs = checkHomeUnitsClosed (hsc_unit_env hscEnv') (hsc_all_home_unit_ids hscEnv') pkg_deps, I can observe that pkg_deps is finite. However, it does not contain any reference to main, only to main-457d6b4053974d5f4ce4d2060404b2ae241df152.
> hsc_unit_env is a UnitEnv which does not have Show
You want to use pprUnitEnvGraph from GHC.
Some news: removing the main entry from pkg_deps also fixes the lock.
Thanks @carbolymer! Thanks to your reproducer, I was able to diagnose the issue and put up a fix in #4109.
Thank you @carbolymer, and thank you @wz1000!
First, I apologize for the lack of reproducibility in this issue. I'm a bit lost here and I need your help. I'll try to gather as much information as possible, but for now I don't have much.
Since we upgraded HLS at work from 2.4.0.0 to 2.6.0.0, HLS gets "stuck" on some (but not all) of our projects.
By stuck, I mean that HLS starts, gathers the build flags thanks to our hie.yaml/cradle:bios:shell command, and then nothing else happens. The CPU runs at 100% (of one core) and the latest entry in the log is:
Your environment
We are using Linux. I'm debugging this on NixOS, but users have observed the same problem on Ubuntu.
GHC 9.6.4 (but the same problem occurs with 9.8.1), installed from nixpkgs. HLS is installed from nixpkgs too.
The project is built with a custom build system using nix, and we use a cradle:bios:shell command to output the build flags. Note that the command works fine, so HLS is not blocked on it.
A few notes about our build system.
Our build system is based on nix, but that's orthogonal to our problem (I think), because what it mostly does is populate the environment with the correct version of GHC and provide flags for HLS.
Note that we have one peculiarity that may be responsible for the problem: we have a "main project" and multiple subprojects. ghc-pkg check only warns about missing haddock entries. None of our subprojects work correctly with HLS; they all get stuck.
HLS bisect
I've bisected HLS (with tag 2.6.0.0 as "bad" and 2.4.0.0 as "good") and identified the following as the possibly offending commit:
Note that we are not using the multi-unit argument syntax in our project, but my next step (after skimming this commit to see if anything in it could lead to the "live lock") will be to try converting our codebase to the multi-unit syntax.
Steps to reproduce
No simple reproducer yet, sorry. The main codebase is private and includes client IP that we unfortunately cannot easily share. I'm working on a simpler reproducer.
Expected behaviour
Actual behaviour
Debug information
Here is the content of the log, using --debug --log ... (some names have been replaced by foo or bar because this is a private "client" project):
Our HIE_BIOS file looks like:
Note that if I remove the -package-db argument (and all the -package-id flags that are not in a "default" package db), HLS starts immediately and works perfectly, apart from failing because most of my imports cannot be resolved. When I removed only the -package-id flags and kept the -package-db, HLS got stuck again. So it seems that something in my package db is blocking HLS.