virajbdeshpande / AmpliconArchitect

AmpliconArchitect (AA) is a tool that identifies one or more connected genomic regions with simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next-generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GRCh37 reference sequence, together with one or more regions of interest. Please "watch" this repository for upcoming improvements in runtime, accuracy, and annotations for the GRCh38 human reference genome.

MOSEK: Failed to solve to optimality #139

Closed. shwong-tw closed this issue 1 year ago.

shwong-tw commented 1 year ago

Hi,

Thanks for making this software available to the community. I've done several rounds of AA (using different seed intervals) and it usually runs successfully.

In a recent run, some of the samples encountered an optimization issue at the MOSEK step on specific amplicons (see the error message below). As a result, AA reported an error and terminated the run. For the problematic amplicons, the cycles and graph .txt files were empty, the edges_cnseg and edges .txt files were only sometimes empty, and no amplicon log was generated.

Would it be possible to add an error-handling mechanism so that, when MOSEK optimization fails, AA reports a warning and skips that amplicon instead of terminating the entire AA run?
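For illustration only, the kind of per-amplicon error handling requested here could look like the following Python 3 sketch. It is not AA's code: `SolverFailure`, `run_all_amplicons`, and the `reconstruct` callback are hypothetical names standing in for AA's per-amplicon work (graph building, the MOSEK call, cycle decomposition).

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("aa_sketch")


class SolverFailure(Exception):
    """Hypothetical exception for a solver run that does not reach optimality."""


def run_all_amplicons(amplicons, reconstruct):
    """Run `reconstruct` on each amplicon; log and skip any that fail.

    `amplicons` maps an amplicon name to its interval list. `reconstruct` is a
    placeholder for the per-amplicon work and is NOT an AA function.
    """
    completed, skipped = {}, []
    for name, intervals in amplicons.items():
        try:
            completed[name] = reconstruct(name, intervals)
        except (SolverFailure, OSError) as err:  # IOError is an alias of OSError on Python 3
            # Record the failure and continue instead of aborting the whole run.
            log.warning("Skipping %s: %s", name, err)
            skipped.append(name)
    return completed, skipped


def _demo_reconstruct(name, intervals):
    # Simulate one amplicon whose optimization fails.
    if name == "amplicon2":
        raise SolverFailure("Failed to solve to optimality")
    return len(intervals)


done, skipped = run_all_amplicons(
    {"amplicon1": ["chr1:100-200"], "amplicon2": ["chr8:500-900"]},
    _demo_reconstruct,
)
print(done, skipped)
```

Catching only the expected failure types (a solver failure or an I/O error) keeps genuine programming errors visible instead of silently skipping them.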

Thank you very much!

versions

AmpliconArchitect version: 1.3.r5, Python version: 2.7.18

software log:

INFO:MOSEK:Beginning MOSEK call
DEBUG:MOSEK:Problem
DEBUG:MOSEK: Name :
DEBUG:MOSEK: Objective sense : min
DEBUG:MOSEK: Type : CONIC (conic optimization problem)
DEBUG:MOSEK: Constraints : 74
DEBUG:MOSEK: Cones : 20
DEBUG:MOSEK: Scalar variables : 101
DEBUG:MOSEK: Matrix variables : 0
DEBUG:MOSEK: Integer variables : 0
DEBUG:MOSEK:
DEBUG:MOSEK:Optimizer started.
DEBUG:MOSEK:Presolve started.
DEBUG:MOSEK:Linear dependency checker started.
DEBUG:MOSEK:Linear dependency checker terminated.
DEBUG:MOSEK:Eliminator started.
DEBUG:MOSEK:Freed constraints in eliminator : 0
DEBUG:MOSEK:Eliminator terminated.
DEBUG:MOSEK:Eliminator - tries : 1 time : 0.00
DEBUG:MOSEK:Lin. dep. - tries : 1 time : 0.00
DEBUG:MOSEK:Lin. dep. - number : 0
DEBUG:MOSEK:Presolve terminated. Time: 0.00
DEBUG:MOSEK:Problem
DEBUG:MOSEK: Name :
DEBUG:MOSEK: Objective sense : min
DEBUG:MOSEK: Type : CONIC (conic optimization problem)
DEBUG:MOSEK: Constraints : 74
DEBUG:MOSEK: Cones : 20
DEBUG:MOSEK: Scalar variables : 101
DEBUG:MOSEK: Matrix variables : 0
DEBUG:MOSEK: Integer variables : 0
DEBUG:MOSEK:
DEBUG:MOSEK:Optimizer - threads : 16
DEBUG:MOSEK:Optimizer - solved problem : the primal
DEBUG:MOSEK:Optimizer - Constraints : 14
DEBUG:MOSEK:Optimizer - Cones : 20
DEBUG:MOSEK:Optimizer - Scalar variables : 60 conic : 60
DEBUG:MOSEK:Optimizer - Semi-definite variables: 0 scalarized : 0
DEBUG:MOSEK:Factor - setup time : 0.00 dense det. time : 0.00
DEBUG:MOSEK:Factor - ML order time : 0.00 GP order time : 0.00
DEBUG:MOSEK:Factor - nonzeros before factor : 32 after factor : 33
DEBUG:MOSEK:Factor - dense dim. : 0 flops : 2.97e+02
DEBUG:MOSEK:ITE PFEAS DFEAS GFEAS PRSTATUS POBJ DOBJ MU TIME
DEBUG:MOSEK:0 1.3e+00 1.2e+05 2.2e+05 0.00e+00 2.195393368e+05 -1.610204003e+01 1.0e+00 0.00
DEBUG:MOSEK:1 3.5e-01 3.2e+04 1.1e+05 -1.00e+00 2.194465010e+05 -1.007799447e+02 2.7e-01 0.00
DEBUG:MOSEK:2 1.9e-01 1.7e+04 8.4e+04 -1.00e+00 2.193044224e+05 -2.103986336e+02 1.5e-01 0.00
DEBUG:MOSEK:3 4.4e-02 4.0e+03 4.0e+04 -9.99e-01 2.181624443e+05 -1.055821946e+03 3.4e-02 0.00
DEBUG:MOSEK:4 4.1e-03 3.7e+02 1.2e+04 -9.97e-01 2.038622896e+05 -1.129830442e+04 3.2e-03 0.00
DEBUG:MOSEK:5 9.8e-04 8.8e+01 5.6e+03 -9.54e-01 1.543394909e+05 -4.279955337e+04 7.6e-04 0.00
DEBUG:MOSEK:6 4.1e-04 3.7e+01 2.7e+03 -6.84e-01 8.847228754e+04 -5.783290378e+04 3.2e-04 0.00
DEBUG:MOSEK:7 9.4e-05 8.5e+00 4.8e+02 -1.19e-01 1.464186063e+04 -3.771123125e+04 7.3e-05 0.00
DEBUG:MOSEK:8 1.7e-05 1.5e+00 3.3e+01 8.14e-01 5.215348623e+03 -4.594639316e+03 1.3e-05 0.00
DEBUG:MOSEK:9 2.4e-06 2.1e-01 1.8e+00 9.52e-01 9.469611652e+02 -4.758253551e+02 1.8e-06 0.00
DEBUG:MOSEK:10 4.4e-07 4.0e-02 1.4e-01 9.94e-01 4.802469218e+02 2.140237206e+02 3.4e-07 0.00
DEBUG:MOSEK:11 5.7e-08 5.2e-03 6.8e-03 9.99e-01 3.742639204e+02 3.396874954e+02 4.4e-08 0.00
DEBUG:MOSEK:12 4.6e-09 4.2e-04 1.6e-04 1.00e+00 3.606341366e+02 3.578392626e+02 3.6e-09 0.00
DEBUG:MOSEK:13 1.9e-10 1.7e-05 1.3e-06 1.00e+00 3.594780197e+02 3.593626589e+02 1.5e-10 0.00
DEBUG:MOSEK:14 9.9e-12 8.5e-07 1.4e-08 1.00e+00 3.594304559e+02 3.594248051e+02 7.3e-12 0.00
DEBUG:MOSEK:15 9.9e-12 8.5e-07 1.4e-08 1.00e+00 3.594304559e+02 3.594248051e+02 7.3e-12 0.01
DEBUG:MOSEK:16 9.9e-12 8.5e-07 1.4e-08 1.00e+00 3.594304559e+02 3.594248051e+02 7.3e-12 0.01
DEBUG:MOSEK:Optimizer terminated. Time: 0.01
DEBUG:MOSEK:
DEBUG:MOSEK:
DEBUG:MOSEK:Interior-point solution summary
DEBUG:MOSEK: Problem status : UNKNOWN
DEBUG:MOSEK: Solution status : UNKNOWN
DEBUG:MOSEK: Primal. obj: 3.5943045590e+02 nrm: 4e+00 Viol. con: 4e-08 var: 0e+00 cones: 0e+00
DEBUG:MOSEK: Dual. obj: 3.5942480509e+02 nrm: 1e+05 Viol. con: 0e+00 var: 1e-03 cones: 0e+00
ERROR:MOSEK:Error when using MOSEK: Failed to solve to optimality. Solution status SolutionStatus.Unknown
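In the iteration table of this log, iterations 14 to 16 report identical feasibility and objective values before MOSEK stops with problem and solution status UNKNOWN, which looks like the interior-point method stalled rather than crashed. A rough way to flag that pattern when scanning many AA logs is sketched here in Python. The function names and the three-iteration window are arbitrary choices, and the parser only assumes the `DEBUG:MOSEK:` line format seen in this log.

```python
import re

# Iteration rows look like "DEBUG:MOSEK:14 9.9e-12 8.5e-07 ... 0.00".
ITER_LINE = re.compile(r"MOSEK:(\d+)\s+(.*)$")
STATUS_LINE = re.compile(r"Solution status\s*:\s*(\w+)")


def iteration_rows(log_lines):
    """Return (iteration number, list of column strings) for each iteration row."""
    rows = []
    for line in log_lines:
        m = ITER_LINE.search(line)
        if m:
            rows.append((int(m.group(1)), m.group(2).split()))
    return rows


def stalled(rows, window=3):
    """True if the last `window` iterations are identical, ignoring the TIME column."""
    if len(rows) < window:
        return False
    tails = {tuple(values[:-1]) for _, values in rows[-window:]}
    return len(tails) == 1


def solution_status(log_lines):
    """Return the first 'Solution status : X' value found, or None."""
    for line in log_lines:
        m = STATUS_LINE.search(line)
        if m:
            return m.group(1)
    return None


SAMPLE_LOG = [
    "DEBUG:MOSEK:ITE PFEAS DFEAS GFEAS PRSTATUS POBJ DOBJ MU TIME",
    "DEBUG:MOSEK:14 9.9e-12 8.5e-07 1.4e-08 1.00e+00 3.594304559e+02 3.594248051e+02 7.3e-12 0.00",
    "DEBUG:MOSEK:15 9.9e-12 8.5e-07 1.4e-08 1.00e+00 3.594304559e+02 3.594248051e+02 7.3e-12 0.01",
    "DEBUG:MOSEK:16 9.9e-12 8.5e-07 1.4e-08 1.00e+00 3.594304559e+02 3.594248051e+02 7.3e-12 0.01",
    "DEBUG:MOSEK: Solution status : UNKNOWN",
]

rows = iteration_rows(SAMPLE_LOG)
print(stalled(rows), solution_status(SAMPLE_LOG))
```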

cluster log:

[MOSEK:INFO] Beginning MOSEK call
[MOSEK:ERROR] Error when using MOSEK: Failed to solve to optimality. Solution status SolutionStatus.Unknown
Traceback (most recent call last):
  File "/home/programs/AmpliconArchitect-master/src/AmpliconArchitect.py", line 322, in
    bamFileb2b.interval_filter_vertices(ilist, amplicon_name=amplicon_name, runmode=args.runmode)
  File "/home/programs/AmpliconArchitect-master/src/bam_to_breakpoint.py", line 2142, in interval_filter_vertices
    res = mosek_solver.call_mosek(n, m, asub, aval, coeff_c, coeff_f, coeff_g, const_h)
  File "/home/programs/AmpliconArchitect-master/src/mosek_solver.py", line 60, in call_mosek
    filename = save_mosek_input(n, m, asub, aval, coeff_c, coeff_f, coeff_g, const_h)
  File "/home/programs/AmpliconArchitect-master/src/mosek_solver.py", line 208, in save_mosek_input
    with open(filename, "w") as f:
IOError: [Errno 13] Permission denied: 'mosekinput-1.json'
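Note that this traceback ends in an `IOError` raised from `save_mosek_input`, which was trying to write `mosekinput-1.json` (a relative path, so the current working directory) and got `Permission denied`. That uncaught exception appears to be what actually stopped the run, so running AA from a writable directory may avoid this particular crash. As a hedged sketch of the defensive pattern, here is a hypothetical helper (not AA's implementation) that falls back to the system temporary directory when the preferred location is not writable:

```python
import json
import logging
import os
import tempfile

log = logging.getLogger("aa_sketch")


def save_debug_json(payload, filename, preferred_dir="."):
    """Write `payload` as JSON, trying `preferred_dir` first, then the temp dir.

    Returns the path written, or None if no location was writable.
    Hypothetical helper for illustration; not AA's save_mosek_input.
    """
    for directory in (preferred_dir, tempfile.gettempdir()):
        path = os.path.join(directory, filename)
        try:
            with open(path, "w") as f:
                json.dump(payload, f)
            return path
        except OSError as err:  # IOError is an alias of OSError on Python 3
            log.warning("Could not write %s: %s", path, err)
    return None
```

Returning `None` instead of raising lets the caller decide whether a missing debug dump is fatal.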

jluebeck commented 1 year ago

Hi, can you share the version of Mosek you are using? If it is version 8, please upgrade to Mosek version 9 or 10 (via pip or conda). This error may also occur if the bam file is from targeted sequencing (e.g. Circle-Seq), as opposed to whole-genome sequencing.
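To confirm which MOSEK version the Python environment actually picks up, one option is to read the installed package metadata. This Python 3.8+ sketch assumes MOSEK was installed with pip or conda under the distribution name `Mosek`; adjust `dist_name` if your environment differs. The version 9 cutoff mirrors the advice above.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a version string such as '9.2.49' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def installed_mosek_version(dist_name="Mosek"):
    """Return the installed MOSEK version tuple, or None if not found.

    Assumes the distribution name 'Mosek'; this is an assumption, not a guarantee.
    """
    try:
        return parse_version(version(dist_name))
    except PackageNotFoundError:
        return None


def needs_upgrade(ver):
    """True if MOSEK is missing or older than major version 9."""
    return ver is None or ver[0] < 9


if __name__ == "__main__":
    ver = installed_mosek_version()
    print("MOSEK version:", ver, "upgrade recommended:", needs_upgrade(ver))
```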

Thanks, Jens

shwong-tw commented 1 year ago

HI Jens,

Sorry for the missing info. I ran AA with the Docker option, and the MOSEK version it used was 9.2.49. The data is from whole-genome sequencing, and a previous AA run on the same sample (with other seed intervals) was successful. Please let me know if further information would be helpful. Thanks a lot!

Cheers, Siao-Han

jluebeck commented 1 year ago

Thanks for clarifying Siao-Han,

How are you providing seed intervals to the tool? What is the source of the copy number estimates given to the seeding regions? Are you using AmpliconSuite-pipeline to generate the seed regions? If not, how are you selecting the seeds? It's possible AmpliconArchitect is being given some problematic regions of the genome.

If you are able to provide

I would be happy to take a look.

Thanks, Jens

shwong-tw commented 1 year ago

Hi Jens,

Thanks for the prompt reply. I did select the seed regions with my own criteria (hence the different runs on the same sample). They are simply different copy-number cutoffs, without any filtering by interval size. If there are rules of thumb (e.g. size) that indicate bad intervals, I can check whether any of my seed intervals match them.

Edit: I've just checked, and the smallest intervals in the failed samples were around 50 kbp. However, other samples that finished successfully contain smaller intervals (down to 25 kbp).
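The interval-size check described above can be scripted. This Python sketch reads the first three BED columns and reports the smallest interval plus any below a cutoff. The 50 kbp default is purely illustrative, and, as the comparison above shows, interval size alone did not separate the failed runs from the successful ones here.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive

    @property
    def size(self) -> int:
        return self.end - self.start


def read_bed(lines) -> List[Interval]:
    """Parse BED lines, skipping blank, comment, track, and browser lines."""
    out = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]  # extra columns (e.g. CN) are ignored
        out.append(Interval(chrom, int(start), int(end)))
    return out


def size_report(intervals, min_size=50_000):
    """Return (smallest size, intervals shorter than min_size); illustrative cutoff."""
    if not intervals:
        return None, []
    smallest = min(iv.size for iv in intervals)
    too_small = [iv for iv in intervals if iv.size < min_size]
    return smallest, too_small


example = ["chr8\t127000000\t127025000", "chr8\t128000000\t128100000\tCN=6", "# comment", ""]
smallest, small = size_report(read_bed(example))
print(smallest, small)
```

For a real file, pass an open file handle, e.g. `read_bed(open("tumor.bed"))`.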

The exact command would be argstring=" --bam /home/bam_dir/tumor.bam --bed /home/bed_dir/tumor.bed --out /home/output/H059-5DFS_tumor7 --downsample 0 --ref GRCh37" /home/run_aa_script.sh
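When trying several seed BED files on the same BAM, a small driver can build the same `argstring` and call the wrapper script once per BED, recording failures instead of stopping. This is a hypothetical Python sketch: `build_argstring`, `run_aa`, and `run_batch` are illustrative names. It assumes `run_aa_script.sh` reads `argstring` from its environment and splits it on whitespace, as the command above implies, so arguments containing whitespace are rejected.

```python
import os
import subprocess


def build_argstring(bam, bed, out_prefix, ref="GRCh37", downsample=0):
    """Assemble the AA argument string used in the command above."""
    parts = ["--bam", bam, "--bed", bed, "--out", out_prefix,
             "--downsample", str(downsample), "--ref", ref]
    for p in parts:
        # Assumed: the wrapper word-splits argstring, so spaces would break paths.
        if any(ch.isspace() for ch in p):
            raise ValueError(f"whitespace not supported in argument: {p!r}")
    return " ".join(parts)


def run_aa(argstring, script="/home/run_aa_script.sh"):
    """Run the wrapper with `argstring` set in its environment; return the exit code."""
    env = dict(os.environ, argstring=argstring)
    return subprocess.run([script], env=env).returncode


def run_batch(jobs, script="/home/run_aa_script.sh"):
    """Run each job (a dict of build_argstring kwargs); collect non-zero exits."""
    failed = []
    for job in jobs:
        code = run_aa(build_argstring(**job), script)
        if code != 0:
            failed.append((job["out_prefix"], code))
    return failed
```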

And an example log is (hopefully) attached. Example.log

But to be honest, please don't spend too much time looking into the log. All I hope for is that AA can finish the run; I wouldn't mind if one or two amplicons were skipped or left unresolved.

Thanks a lot!

Cheers, Siao-Han

jluebeck commented 1 year ago

Hi Siao-Han,

Thanks for clarifying. We cannot guarantee that AA will work when deployed on any custom set of intervals - they may contain unfiltered repeat elements and other problematic parts of the genome. A standardized method for selecting these intervals is available in AmpliconSuite-pipeline, which uses databases of these known problematic regions and other criteria to establish a reliable set of seed regions where focal amplifications may exist.
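AmpliconSuite-pipeline's own filtering is not reproduced here, but the general idea of screening custom seeds against a list of known problematic regions can be sketched in Python as an interval-overlap filter. The blocklist, the half-open `(chrom, start, end)` tuples, and the 50% coverage threshold are all assumptions for illustration.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open BED coordinates


def covered_bases(seed: Interval, blocked: List[Interval]) -> int:
    """Bases of `seed` covered by the union of `blocked` intervals."""
    chrom, s, e = seed
    # Clip overlapping blocked intervals to the seed, then merge them.
    pieces = sorted((max(s, bs), min(e, be)) for bc, bs, be in blocked
                    if bc == chrom and bs < e and be > s)
    total, cur_start, cur_end = 0, None, None
    for ps, pe in pieces:
        if cur_end is None or ps > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = ps, pe
        else:
            cur_end = max(cur_end, pe)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def filter_seeds(seeds: List[Interval], blocked: List[Interval],
                 max_fraction: float = 0.5) -> List[Interval]:
    """Keep seeds whose blocked fraction is at most `max_fraction` (illustrative)."""
    kept = []
    for seed in seeds:
        size = seed[2] - seed[1]
        if size > 0 and covered_bases(seed, blocked) / size <= max_fraction:
            kept.append(seed)
    return kept


seeds = [("chr1", 0, 100), ("chr1", 200, 300)]
blocked = [("chr1", 10, 40), ("chr1", 30, 70), ("chr1", 250, 260)]
print(filter_seeds(seeds, blocked))
```

Merging overlapping blocked intervals before summing avoids double-counting bases covered by more than one blocklist entry.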

Thanks, Jens

shwong-tw commented 1 year ago

I understand, thanks for the feedback :)