raphael-group / hatchet

HATCHet (Holistic Allele-specific Tumor Copy-number Heterogeneity) is an algorithm that infers allele and clone-specific CNAs and WGDs jointly across multiple tumor samples from the same patient, and that leverages the relationships between clones in these samples.
BSD 3-Clause "New" or "Revised" License

Gurobi solver: buffer overflow #116

Closed: jbedo closed this issue 2 years ago

jbedo commented 2 years ago

I'm having trouble getting `hatchet check-solver` to run. It seems to hit a buffer overflow error:

  File "/nix/store/53gmkyrmi5rksbc5bs9l7bwh29lgcvr7-HATCHet-0.4.9/lib/python3.9/site-packages/hatchet/bin/HATCHet.py", line 696, in execute
    raise RuntimeError(error("The following command failed: \n\t\t{}\nwith {}\n{}".format(cmd, buffer, msg)))
RuntimeError: The following command failed: 
        /nix/store/53gmkyrmi5rksbc5bs9l7bwh29lgcvr7-HATCHet-0.4.9/lib/python3.9/site-packages/hatchet/solve /nix/store/53gmkyrmi5rksbc5bs9l7bwh29lgcvr7-HATCHet-0.4.9/lib/python3.9/site-packages/hatchet/data/sample -f  -e 6 -j 1 -p 5 -u 0.03 -r 6700 -M 0 -v 2 -c 28:1:1 -n 2 -o /data/scratch/projects/punim1616/tmp/nix-build-bionix-HATCHet.drv-0/tmpapvi_kio/results.diploid.n2
with ['\x1b[95m\x1b[1m[21:08:42]### Parsing and checking input arguments\t\x1b[0m', '\x1b[92m[21:08:42]## \tInput prefix:  /nix/store/53gmkyrmi5rksbc5bs9l7bwh29lgcvr7-HATCHet-0.4.9/lib/python3.9/site-packages/hatchet/data/sample', 'Input SEG:  /nix/store/53gmkyrmi5rksbc5bs9l7bwh29lgcvr7-HATCHet-0.4.9/lib/python3.9/site-packages/hatchet/data/sample.seg', 'Input BBC:  /nix/store/53gmkyrmi5rksbc5bs9l7bwh29lgcvr7-HATCHet-0.4.9/lib/python3.9/site-packages/hatchet/data/sample.bbc', 'Number of clones:  2', 'Clonal copy numbers:  { 28 [Cluster] : 1|1 [CN] }', 'Help message:  0', 'Maximum number of copy-number states:  -1', 'Maximum integer copy number:  6', 'Number of jobs:  1', 'Number of seeds:  5', 'Minimum tumor-clone threshold:  0.03', 'Maximum resident memory:  -1', 'Time limit:  -1', 'Maximum number of iteratios:  10', 'Random seed:  6700', 'Solving mode:  Coordinate-descent + exact ILP', 'Verbose:  2', 'Output prefix:  /data/scratch/projects/punim1616/tmp/nix-build-bionix-HATCHet.drv-0/tmpapvi_kio/results.diploid.n2', 'Diploid threshold:  0.1', 'Base:  1', 'Force amp-del:  1\t\x1b[0m', '\x1b[95m\x1b[1m[21:08:42]### Reading the input SEG file\t\x1b[0m', '\x1b[95m\x1b[1m[21:08:42]### Scale the read-depth ratios into fractional copy numbers using the provided copy numbers\t\x1b[0m', '\x1b[95m\x1b[1m[21:08:42]### Compute allele-specific fractional copy numbers using BAF\t\x1b[0m', '\x1b[95m\x1b[1m[21:08:42]### Starting coordinate descent algorithm on 5 seeds\t\x1b[0m', '\x1b[92m[21:08:42]## Coordinate Descence {\t\x1b[0m', '*** buffer overflow detected ***: terminated', '']

Unexpected error during solve. Please run `hatchet check-solver` to ensure that the solver is working correctly.
srun: error: spartan-bm010: task 0: Exited with exit code 1
salloc: Relinquishing job allocation 32773970
salloc: Job allocation 32773970 has been revoked.

I don't think this is a Gurobi licensing issue, since it's a buffer overflow rather than the expected exception. Any suggestions for debugging?

Full log attached: log.txt

vineetbansal commented 2 years ago

Hi @jbedo, sorry you're having problems with the solver. The error message above (and the log you've attached) are the result of running the `solve` command. What do you see when you run `hatchet check-solver` as suggested? That output might be instructive.

EDIT: OK, I see that this is the output of `hatchet check-solver` itself. Can you try running `hatchet check-solver` after first setting the environment variable `HATCHET_COMPUTE_CN_SOLVER` to another Pyomo-supported solver, such as `cbc`, `glpk`, or even just `gurobi` (see the docs)? I think this should help us narrow down the problem.
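The suggestion above amounts to running the same check once per candidate solver. A minimal Python sketch, assuming the `hatchet` executable is on `PATH` and that `hatchet check-solver` picks up `HATCHET_COMPUTE_CN_SOLVER` from the environment as described here. The helper names `solver_env` and `check_solvers` are hypothetical, and the three solvers listed are only the ones mentioned in this thread, not an exhaustive list of what HATCHet accepts.

```python
import os
import subprocess
from typing import Callable, Dict, Iterable, Mapping, Optional


def solver_env(solver: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: copy an environment and set HATCHET_COMPUTE_CN_SOLVER."""
    env = dict(os.environ if base is None else base)
    env["HATCHET_COMPUTE_CN_SOLVER"] = solver
    return env


def check_solvers(
    solvers: Iterable[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Dict[str, bool]:
    """Hypothetical helper: run `hatchet check-solver` once per solver.

    Returns {solver: True if the command exited with status 0}. `run` is
    injectable so the loop can be exercised without HATCHet installed.
    """
    results: Dict[str, bool] = {}
    for solver in solvers:
        try:
            proc = run(["hatchet", "check-solver"], env=solver_env(solver))
        except FileNotFoundError:
            # The `hatchet` executable is not on PATH at all.
            results[solver] = False
            continue
        results[solver] = proc.returncode == 0
    return results


# Example (assumes HATCHet is installed; output from each run goes to the terminal):
# check_solvers(["cbc", "glpk", "gurobi"])
```

Testing each solver separately shows whether the failure is confined to one backend or affects all of them, which is the narrowing-down step being asked for.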

jbedo commented 2 years ago

I haven't tried setting `HATCHET_COMPUTE_CN_SOLVER` to `gurobi`, but CBC works fine both with `check-solver` and with a real sample I've put through it.

jbedo commented 2 years ago

I believe I've solved this: although HATCHet was compiled and linked against the Gurobi libraries, `gurobipy` was not available in the Python environment. Adding `gurobipy` allowed `hatchet check` to succeed.
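To rule out the cause found here, you can ask the Python interpreter that runs HATCHet whether it can see `gurobipy`. A minimal sketch using only the standard library. The helper names are hypothetical, and a positive result only means the module is importable; it says nothing about whether the Gurobi license is valid.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` is importable by this interpreter."""
    return importlib.util.find_spec(name) is not None


def report_gurobipy() -> str:
    # Report the interpreter path as well: gurobipy must be installed in the
    # same environment that runs HATCHet, not just somewhere on the system.
    status = "found" if module_available("gurobipy") else "NOT found"
    return f"gurobipy {status} for {sys.executable}"


print(report_gurobipy())
```

Run it with the same `python` that HATCHet is installed into. A "NOT found" result matches the situation described above.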

miachom commented 2 years ago

Hi @jbedo, could you please let me know how you ran HATCHet with the CBC solver? I am running HATCHet v1.1.2 with CBC, but it has been running for more than 72 hours since reaching the step "Running diploid with 3 clones". Other than this, I don't see any errors in the log.

jbedo commented 2 years ago

I didn't do anything special; CBC is just pretty slow for this application. It really wasn't usable on our human WGS samples, whereas Gurobi is significantly faster.