raphael-group / superdendrix

SuperDendrix, an algorithm for identifying differential dependencies and associated genomic features.
BSD 3-Clause "New" or "Revised" License

Key error in superdendrix.py #1

Closed: esebesty closed this issue 3 years ago

esebesty commented 3 years ago

I started analyzing a DepMap dataset with SuperDendrix, and everything works until the last step. I ran the superdendrix.py script with the following command line, based on your example:

python3 superdendrix.py -t 2 -T achilles-effect-sixsigma-filtered-2c.txt -Tc "UBE2M (9040)" -m ccle-mutation-oncokb.txt -p 10 -cp 10 -d "negative" -k 3 -nm ccle-mutation-oncokb-randmat/ -rs 2021 -x -curve -o output.txt -v 3 > log1.txt 2> log2.txt

I'm getting the following error:

Traceback (most recent call last):
  File "/Users/esebesty/Work/repos/superdendrix/src/superdendrix.py", line 637, in <module>
    run(get_parser().parse_args(sys.argv[1:]))
  File "/Users/esebesty/Work/repos/superdendrix/src/superdendrix.py", line 263, in run
    p_z, p_module, p_best_single, p_cov, p_act_cov, p_act_sample,p_x,p_y,p_mo = optimize_model(genes, samples, w, p_samples_to_genes, args, seed, logger) #change args.k
  File "/Users/esebesty/Work/repos/superdendrix/src/superdendrix.py", line 495, in optimize_model
    m.addConstr(y[p] <= quicksum(x[g] for g in samples_to_genes[p]))
  File "src/gurobipy/gurobi.pxi", line 3349, in gurobipy.quicksum
  File "/Users/esebesty/Work/repos/superdendrix/src/superdendrix.py", line 495, in <genexpr>
    m.addConstr(y[p] <= quicksum(x[g] for g in samples_to_genes[p]))
KeyError: 'EED_I_MUT'

It looks like this happens when loading the first random matrix. For the sample in question, there is indeed no EED_I_MUT feature in the original mutation list, but the feature is present in the random matrix.

Is there a problem with my command-line parameters, or is this a possible bug in superdendrix?
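The traceback shows the KeyError being raised by the lookup `x[g]`, so the random matrix lists a feature for which no variable exists in `x`. A minimal diagnostic sketch for spotting such features before running the solver is shown below. It assumes each row of the matrix is a sample ID followed by whitespace-separated feature names (as in the mutation file in this thread); the helper names `parse_matrix` and `unknown_features` are hypothetical and not part of SuperDendrix.

```python
def parse_matrix(lines):
    """Parse rows of 'sample feature feature ...' into {sample: set(features)}.

    Assumes whitespace-separated fields with the sample ID first.
    """
    samples_to_features = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        samples_to_features[fields[0]] = set(fields[1:])
    return samples_to_features


def unknown_features(samples_to_features, known_features):
    """Return {sample: sorted features that are absent from known_features}."""
    known = set(known_features)
    missing = {}
    for sample, features in samples_to_features.items():
        extra = sorted(features - known)
        if extra:
            missing[sample] = extra
    return missing


# Toy data for illustration only (not the real inputs from this issue).
feature_list = ["KRAS_A_MUT", "TP53_O_MUT"]
random_matrix = parse_matrix([
    "ACH-000001\tKRAS_A_MUT\tEED_I_MUT",
    "ACH-000002\tTP53_O_MUT",
])
report = unknown_features(random_matrix, feature_list)
print(report)  # {'ACH-000001': ['EED_I_MUT']}
```

Any feature reported here would trigger the same kind of KeyError if the model variables were built only from `feature_list`.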

pty0111 commented 3 years ago

You need to provide a list of mutation features as an input to superdendrix.py for the argument -gf. Could you try adding a text file where each line is a list of mutations?

I just realized that this is missing from our example command line. I have updated the example accordingly.

esebesty commented 3 years ago

Thanks for the suggestion. However, I added the -gf argument, and although the EED_I_MUT feature is present in the feature list, I'm still getting the same error.

pty0111 commented 3 years ago

Can you double-check whether EED_I_MUT is actually present in the mutation data (ccle-mutation-oncokb.txt)? If EED_I_MUT is not in the mutation data, it should not be included in the feature list.
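One way to keep the feature list consistent with the mutation data is to derive it from the mutation file itself. The sketch below is only an illustration under two assumptions: each row of the mutation data is a sample ID followed by whitespace-separated features, and the feature list holds one feature per line. The function name `features_in_mutation_data` is hypothetical and not part of SuperDendrix.

```python
def features_in_mutation_data(lines):
    """Collect every feature that occurs in at least one sample row.

    Assumes the first field of each row is the sample ID and the
    remaining whitespace-separated fields are feature names.
    """
    features = set()
    for line in lines:
        fields = line.split()
        features.update(fields[1:])  # fields[0] is the sample ID
    return sorted(features)


# Toy rows for illustration only (not the real ccle-mutation-oncokb.txt).
mutation_rows = [
    "ACH-000954\tEED_I_MUT\tEGFR_O_MUT",
    "ACH-000001\tEGFR_O_MUT\tKRAS_A_MUT",
]
feature_lines = features_in_mutation_data(mutation_rows)
print("\n".join(feature_lines))
```

Writing `feature_lines` to a text file, one entry per line, would give a list that contains only features actually present in the mutation data.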

esebesty commented 3 years ago

Yes, it's present.

snakemake/results - [main●] » grep EED_I_MUT ccle-mutation-oncokb.txt
ACH-000954  ABL2_O_MUT  ANKRD11_O_MUT   AR_O_MUT    ARID1A_I_MUT    ARID1A_O_MUT    ARID1B_O_MUT    ARID2_O_MUT ARID4B_O_MUT    ARID5B_O_MUT    ATM_O_MUT   ATRX_O_MUT  AXIN2_O_MUT BACH2_I_MUT BAP1_O_MUT  BCL11B_O_MUT    BCL9_O_MUT  BCOR_I_MUT  BCORL1_O_MUT    BRAF_O_MUT  BRD4_O_MUT  CASP8_I_MUT CASP8_O_MUT CBL_O_MUT   CIC_O_MUT   CTCF_O_MUT  CTNNB1_O_MUT    CUX1_O_MUT  DAXX_O_MUT  DICER1_O_MUT    DIS3_O_MUT  EED_I_MUT   EGFR_O_MUT  EP300_I_MUT EP400_O_MUT EPAS1_O_MUT EPHA7_I_MUT ERBB2_A_MUT ERBB2_O_MUT ERBB3_O_MUT ERBB4_O_MUT ERCC2_O_MUT ESR1_O_MUT  EZH1_O_MUT  FH_O_MUT    FLT1_O_MUT  FOXA1_O_MUT GRIN2A_O_MUT    GTF2I_O_MUT INPP4B_O_MUT    INPPL1_O_MUT    IRF8_O_MUT  JAK1_I_MUT  KMT2A_I_MUT KMT2B_O_MUT KMT2C_O_MUT KMT2D_O_MUT KRAS_A_MUT  LATS1_O_MUT MAP2K2_O_MUT    MAP3K1_I_MUT    MET_O_MUT   MRE11A_O_MUT    MSH6_I_MUT  MTOR_O_MUT  NF1_O_MUT   NFE2L2_O_MUT    NOTCH3_O_MUT    NPM1_I_MUT  NTRK1_O_MUT PAX5_O_MUT  PDGFRB_O_MUT    PIK3CA_A_MUT    PIK3CB_O_MUT    PIK3CD_O_MUT    PLCG1_O_MUT PMS2_I_MUT  POLD1_I_MUT POLD1_O_MUT PPM1D_O_MUT PPP2R1A_O_MUT   RAD21_I_MUT RBM10_O_MUT RBM15_O_MUT RELN_O_MUT  RNF43_I_MUT ROS1_O_MUT  RTEL1_O_MUT SAMHD1_O_MUT    SETD3_O_MUT SMAD4_O_MUT SMARCA4_I_MUT   SMARCA4_O_MUT   SOX9_O_MUT  SPRED1_I_MUT    TBX3_O_MUT  TERT_O_MUT  TET1_O_MUT  TET2_O_MUT  TGFBR2_O_MUT    TMPRSS2_O_MUT   TNFRSF14_O_MUT  TP53_O_MUT  TP53BP1_O_MUT   TSC1_O_MUT  TSC2_O_MUT  WT1_O_MUT   ZFHX3_I_MUT
snakemake/results - [main●] » grep EED_I_MUT ccle-mutation-feature-list.txt
EED_I_MUT

pty0111 commented 3 years ago

Just to confirm: in your first comment you mentioned that EED_I_MUT is not in the original mutation list but is present in the random matrix. Did you mean instead that it is not present in the random matrix?

If the feature is indeed present in all three (the random matrix, the mutation data, and the mutation list), would you be able to send us these inputs so I can look into the issue further?

esebesty commented 3 years ago

Yes, it is present in the original mutation file, the feature list and the random matrices.

tmp/superdendrix » grep -r EED_I_MUT * | awk '{print $1}'
ccle-mutation-feature-list.txt:EED_I_MUT
ccle-mutation-full-randmat/9.tsv:ACH-000584
ccle-mutation-full-randmat/8.tsv:ACH-000984
ccle-mutation-full-randmat/3.tsv:ACH-000936
ccle-mutation-full-randmat/2.tsv:ACH-001274
ccle-mutation-full-randmat/0.tsv:ACH-000910
ccle-mutation-full-randmat/1.tsv:ACH-000657
ccle-mutation-full-randmat/5.tsv:ACH-000771
ccle-mutation-full-randmat/4.tsv:ACH-000957
ccle-mutation-full-randmat/6.tsv:ACH-001552
ccle-mutation-full-randmat/7.tsv:ACH-000984
ccle-mutation-full.txt:ACH-000954

I'll send you some sample data, thanks!
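Note that `grep` matches substrings, so a feature whose name merely contains the search string would also be reported. A rough Python equivalent of the `grep | awk` check that matches whole feature names only is sketched below. It assumes rows of the form sample ID followed by whitespace-separated features; `samples_with_feature` and the feature name `NEED_I_MUT` are hypothetical and used only for illustration.

```python
def samples_with_feature(lines, feature):
    """Return the sample IDs whose row contains `feature` as an exact token."""
    hits = []
    for line in lines:
        fields = line.split()
        # fields[0] is the sample ID; compare whole tokens, not substrings.
        if fields and feature in fields[1:]:
            hits.append(fields[0])
    return hits


# Toy rows for illustration only (not the real random matrices).
rows = [
    "ACH-000954\tEED_I_MUT\tEGFR_O_MUT",
    "ACH-000002\tNEED_I_MUT\tTP53_O_MUT",  # substring match only
    "ACH-000003\tKRAS_A_MUT",
]
exact_hits = samples_with_feature(rows, "EED_I_MUT")
print(exact_hits)  # ['ACH-000954']
```

Running this over each input file shows which samples carry the feature in that file, which is the per-sample information the KeyError depends on.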