Open ugexe opened 5 years ago
On a local Windows VM it gets stuck during precompilation when installing zef:
C:\Users\ugexe\zef>set RAKUDO_LOG_PRECOMP=1
C:\Users\ugexe\zef>perl6 -I. bin\zef install C:\Users\ugexe\zef --debug
===> Fetching: C:\Users\ugexe\zef
Fetching with plugin: Zef::Service::FetchPath+{<anon|1>}
===> Fetching [OK]: C:\Users\ugexe\zef to C:\Users\ugexe/.zef/tmp\1567304861.7128.1365\zef\zef_1567304861
===> Extracting: C:\Users\ugexe\zef
Extracting with plugin: Zef::Service::FetchPath+{<anon|1>}
===> Extraction [OK]: C:\Users\ugexe\zef to C:\Users\ugexe/.zef/store\zef_1567304861
===> Dependencies: NativeCall, Test
===> Filtering: zef:ver<0.7.4>:auth<github:ugexe>:api<0>
===> Filtering [OK] for zef:ver<0.7.4>:auth<github:ugexe>:api<0>
===> # SKIP: No need to build zef:ver<0.7.4>:auth<github:ugexe>:api<0>
===> Testing: zef:ver<0.7.4>:auth<github:ugexe>:api<0>
Testing with plugin: Zef::Service::Shell::prove+{<anon|1>}
t/00-load.t ....................... ok
t/distribution-depends-parsing.t .. ok
t/identity.t ...................... ok
t/utils-filesystem.t .............. ok
All tests successful.
Files=4, Tests=16, 46 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU)
Result: PASS
===> Testing [OK] for zef:ver<0.7.4>:auth<github:ugexe>:api<0>
===> Installing: zef:ver<0.7.4>:auth<github:ugexe>:api<0>
Installing Zef::Fetch for zef
Installing Zef::Service::Shell::tar for zef
Installing Zef::Build for zef
Installing Zef::Service::Shell::PowerShell::download for zef
Installing Zef::Service::Shell::prove for zef
Installing Zef::Extract for zef
Installing Zef::Utils::FileSystem for zef
Installing Zef::Service::Shell::p5tar for zef
Installing Zef::Repository::Ecosystems for zef
Installing Zef::Test for zef
Installing Zef::Service::Shell::PowerShell::unzip for zef
Installing Zef::Service::Shell::DistributionBuilder for zef
Installing Zef::Service::Shell::Test for zef
Installing Zef::Service::Shell::wget for zef
Installing Zef::Utils::SystemInfo for zef
Installing Zef::Config for zef
Installing Zef::Repository::MetaCPAN for zef
Installing Zef::Repository for zef
Installing Zef::Distribution for zef
Installing Zef for zef
Installing Zef::Service::Shell::LegacyBuild for zef
Installing Zef::Distribution::DependencySpecification for zef
Installing Zef::Client for zef
Installing Zef::Service::Shell::git for zef
Installing Zef::CLI for zef
Installing Zef::Utils::SystemQuery for zef
Installing Zef::Distribution::Local for zef
Installing Zef::Utils::URI for zef
Installing Zef::Service::Shell::PowerShell for zef
Installing Zef::Repository::LocalCache for zef
Installing Zef::Service::FetchPath for zef
Installing Zef::Service::Shell::curl for zef
Installing Zef::Service::TAP for zef
Installing Zef::Service::InstallPM6 for zef
Installing Zef::Identity for zef
Installing Zef::Report for zef
Installing Zef::Install for zef
Installing Zef::Service::P6CReporter for zef
Installing Zef::Service::Shell::unzip for zef
Precompiling AB55A923EA5219C0C4EEC86259A425788E26F74D (Zef)
# never progresses beyond this point
commit 3ed7e46 seems to be where the deadlock issue starts.
commit 19e075f has no deadlock, but has a new error elsewhere in the CI testing process -- https://ci.appveyor.com/project/ugexe/zef/builds/27150039
@timo ?
With MVM_SPESH_DISABLE=1 the deadlock goes away.
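A comparison like the one above (same command, with and without MVM_SPESH_DISABLE=1) can be scripted so that a hang shows up as a timeout instead of a stuck console. This is only a rough sketch: the helper name `run_with_env` and the timeout value are made up for illustration, and the install command in the comment is the one from the log earlier in this thread.

```python
import os
import subprocess
import sys

def run_with_env(cmd, extra_env, timeout_s):
    """Run cmd with extra environment variables set.

    Returns "ok" on exit code 0, "failed" on any other exit code,
    and "timeout" if the process did not finish within timeout_s seconds.
    """
    env = dict(os.environ)
    env.update(extra_env)
    try:
        # subprocess.run kills the child itself when the timeout expires.
        proc = subprocess.run(cmd, env=env, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if proc.returncode == 0 else "failed"

# Hypothetical usage (600 seconds is an arbitrary cut-off):
#   install = ["perl6", "-I.", "bin/zef", "install", ".", "--debug"]
#   print(run_with_env(install, {}, 600))
#   print(run_with_env(install, {"MVM_SPESH_DISABLE": "1"}, 600))
```

If the first call reports "timeout" and the second reports "ok", that matches the behaviour described here. Note that only the direct child is killed on timeout, so a stuck precompilation child process may still need to be cleaned up by hand.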
@timo this appears to be related to your work (linked earlier)
So without feedback I’m becoming inclined to revert the commits in question. If someone has a good reason why I shouldn’t and also has a hunch on how to fix this then speak up.
sorry, i didn't notice that i was pinged on github, i'll have a closer look now
i don't see a way for the commit in question to cause deadlock problems; can you hook up a debugger to the deadlocked process and print every thread's backtrace? ideally turning off the JIT in the environment so that backtraces don't get corrupted. that should give a lot of helpful information
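For anyone following along, the request above could look roughly like this. It is a sketch under assumptions: it assumes gdb is installed and able to attach to the stuck process (for example on a MinGW or Linux build; an MSVC build would need a different debugger), and the helper names are invented for illustration. MVM_JIT_DISABLE=1 is the MoarVM environment variable that turns the JIT off.

```python
import os

def debug_env(base=None):
    """Return a copy of the environment with MoarVM's JIT switched off."""
    env = dict(os.environ if base is None else base)
    env["MVM_JIT_DISABLE"] = "1"
    return env

def gdb_backtrace_cmd(pid):
    """Build the argv for a non-interactive gdb run that attaches to pid
    and prints a backtrace for every thread."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]

# Hypothetical usage: start the install with debug_env() as its environment,
# wait for it to hang, find the PID of the stuck process, then run e.g.
#   subprocess.run(gdb_backtrace_cmd(pid))
```

The environment has to be set before the hanging process is started; changing it afterwards has no effect on a process that is already running.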
I could probably figure out how to do all of that for a precompiling process on Windows. However, I suspect it will be easier for you to suggest moar changes for me to make and test, e.g. switching these two lines https://github.com/MoarVM/MoarVM/blob/3faf1985d277fc53c111464b73aac87f27158e7d/src/moar.c#L394-L397 (I have no reason to believe that is wrong; it's just that mutexes can cause deadlocks, so maybe it should be scrutinized)
https://github.com/MoarVM/MoarVM/commit/c5a600799bdd5797b22c3e555f920e6f4c52adee - please try this, among other things it moves initializing that mutex up, but mostly disables anything related to the whole subsystem
Can you rebase this so it works with perl Configure.pl --gen-moar=c5a600799bdd5797b22c3e555f920e6f4c52adee --gen-nqp --backends=moar ?
https://ci.appveyor.com/project/ugexe/zef/builds/27603566#L74
@timo ping -- I don't have write access to MoarVM so I can't push up a rebase that I can pass to the Configure.pl command in the appveyor config.
If I cherry-pick https://github.com/MoarVM/MoarVM/commit/c5a600799bdd5797b22c3e555f920e6f4c52adee and resolve conflicts, the deadlock goes away.
diff --git a/src/gc/orchestrate.c b/src/gc/orchestrate.c
index 42a52f45b..96ee115ce 100644
--- a/src/gc/orchestrate.c
+++ b/src/gc/orchestrate.c
@@ -469,7 +469,7 @@ static void run_gc(MVMThreadContext *tc, MVMuint8 what_to_do) {
data[0] = MVM_load(&tc->instance->gc_seq_number);
data[1] = start_time / 1000;
data[2] = (start_time - tc->instance->subscriptions.vm_startup_time) / 1000;
- data[3] = (end_time - start_time) / 1000;
+ //data[3] = (end_time - start_time) / 1000;
data[4] = gen == MVMGCGenerations_Both;
data[5] = tc->gc_promoted_bytes;
data[6] = MVM_load(&tc->instance->gc_promoted_bytes_since_last_full);
This seems to fix it?
Maybe weird line endings?
CI tests are good on moar HEAD now
Thank you, @ugexe!
OK, I'll reopen it in case someone has an idea how to write a test for this.
If you stumble upon this ticket a year later, please just close it.
The Problem
zef no longer finishes CI testing on appveyor. It seems to be related to runtime module loading in some way. See -- https://ci.appveyor.com/project/ugexe/zef/builds/27095833#L5190 (which has been rerun multiple times, always stopping in the same spot)
Expected Behavior
rakudo doesn't deadlock when running zef's CI testing on appveyor, like https://ci.appveyor.com/project/ugexe/zef/builds/26433562
Actual Behavior
zef deadlocks while trying to use runtime-loaded code
Steps to Reproduce
Copy zef's .appveyor.yml to a repository and have appveyor process it.
Environment
2019.07.1