haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal
Other
1.61k stars 690 forks source link

Deadlock during "cabal install" using parallel compilation #1476

Open kolmodin opened 11 years ago

kolmodin commented 11 years ago

This is using the newly released cabal-1.18.0. I built it using a clean user package-db, using GHC-7.6.2.

Using parallel compilation and sandbox, cabal fails in building the packages and instead hogs 100% CPU.

$ git clone https://github.com/kolmodin/spdy
$ cd spdy

$ grep jobs ~/.cabal/config 
jobs: $ncpus

$ nproc # number of cpus
32

$ cabal --version
cabal-install version 1.18.0
using version 1.18.0 of the Cabal library 

$ cabal update
$ cabal sandbox init
$ cabal install --only-dependencies 
Resolving dependencies...
Notice: installing into a sandbox located at
/usr/local/google/home/kolmodin/code/spdy/.cabal-sandbox
Configuring ansi-terminal-0.6...
Configuring asn1-types-0.2.0...
Configuring base-unicode-symbols-0.2.2.4...
Configuring binary-0.7.1.0...
Configuring base64-bytestring-1.0.0.1...
Configuring byteable-0.1.1...
Downloading byteorder-1.0.4...
Configuring cereal-0.3.5.2...
Configuring cipher-rc4-0.1.2...
Configuring data-default-class-0.0.1...
Configuring date-cache-0.3.0...
Configuring dlist-0.5...
Downloading entropy-0.2.2.2...
Configuring byteorder-1.0.4...
Downloading file-embed-0.0.4.9...
Configuring nats-0.1...
Configuring primitive-0.5.0.1...
Configuring entropy-0.2.2.2...
Configuring random-1.0.1.1...
Downloading largeword-1.0.5...
Configuring stm-2.4.2...
Configuring stringsearch-0.3.6.4...
Configuring file-embed-0.0.4.9...
Downloading syb-0.4.1...
Configuring tagged-0.4.5...
Configuring transformers-0.3.0.0...
Configuring largeword-1.0.5...
Configuring unix-compat-0.4.1.1...
Downloading text-0.11.3.1...
Configuring word8-0.0.3...

After this, nothing happens. The cabal process is stuck in some loop, using all cpu it can get.

If I explicitly specify fewer CPUs, it still fails.

$ cabal install --only-dependencies --jobs=2
Resolving dependencies...
Notice: installing into a sandbox located at
/usr/local/google/home/kolmodin/tmp/spdy/.cabal-sandbox
Configuring ansi-terminal-0.6...
Configuring base-unicode-symbols-0.2.2.4...

A package with fewer dependencies also fails, even with few CPUs.

$ cabal install cabal-install --jobs=2 --package-db=clear --package-db=global
23Skidoo commented 11 years ago

Thanks! This is very interesting. Which OS/architecture are you on?

kolmodin commented 11 years ago

OS is Goobuntu, essentially Ubuntu 12.04. The machine is x86_64.

It did manage a few times to install cabal-install, so it seems it's not 100% reproducable.

I tried with ghc 7.6.3, same thing. I'd like to try to reproduce with ghc 7.7 as well, but some dependencies said they don't support base-4.7 and I didn't have time to setup local versions to try with.

23Skidoo commented 11 years ago

Thanks, will try to reproduce.

23Skidoo commented 11 years ago

So far failed to reproduce on an 8-core EC2 instance. Will try on a 16-core one.

23Skidoo commented 11 years ago

Failed to reproduce on a 16-core instance running Ubuntu 12.04/x86_64.

23Skidoo commented 11 years ago

@kolmodin

I have a feeling that this is a GHC RTS bug that only happens on systems with a large number of cores. Unfortunately, the largest EC2 instance I can rent has only 16 cores, which apparently isn't enough to trigger this behaviour.

If you have other ideas on how to reproduce this, I'll be happy to follow your advice.

I'd like to try to reproduce with ghc 7.7 as well, but some dependencies said they don't support base-4.7 and I didn't have time to setup local versions to try with.

IIRC it's only HTTP. All you need to do is relax the version bound in the .cabal file.