raphael-group / THetA

Tumor Heterogeneity Analysis (THetA) and THetA2 are algorithms that estimate the tumor purity and clonal/subclonal copy number aberrations directly from high-throughput DNA sequencing data. This repository includes the updated algorithm, called THetA2.
http://compbio.cs.brown.edu/projects/theta/

Getting an "X" in my integer copy number prediction #17

Open · jsmedmar opened this issue 7 years ago

jsmedmar commented 7 years ago

I'm running THetA2 on the CNVkit output of a batch run with one target and many normals.

Why is there an "X" in my integer copy number prediction? The output looks like this:

```
# *.n2.results

#NLL    mu  C   p*
365769128.951   0.0030184841461,0.996981515854  1329:1816:2099:2375:25328:5456:33:9054:537:9538:3649:2567:1:115:1390:33:3568:2637:831:1211:X:7534:1814:3773:837:6639:2792:2:3735:946:19827:4141:33:2644:9242:5553:41359:4995:39503:2241:960:1222:86:2570:2481:12901:2294:2008:7258:24356:147:16662:171:1823:3389:3356:745:36082:2247:12667:131211:7846:599:38650:4551:1978:3169:X:856:2293:1787:3922:1552:461:5393:4989:6192:1165:1347:819:1543:73909:1153:13584:1769:5056:65:413:1132:270:221:1200:482:3374:11604:2079:2746:99:3519:7721:1614:1629:870:148:916:2864:X:642:16597:10213:2236:720:272:231:759:3208:609:2:6592:7513:1159:2:1102:329:1933:918:25:4923:0:0   0.0245194761859,0.0111053827179,0.0226261809895,0.0113236589054,0.00262521711754,0.0062205925347,0.0168622943627,0.000938436749688,0.0112433253548,0.0049430138142,0.00567322579836,0.0159640239086,0.00123285995294,0.000679453576372,0.00677140478578,0.0185317607147,0.00184909750718,0.00163993488162,0.0181740144775,0.0129284888566,X,0.00780890592681,0.00564058584458,0.000391067502875,0.0176979512031,0.00206437478359,0.00231510433866,0.00206116244724,0.00929109237479,0.0066675618807,0.00205504513748,0.00128763077003,0.0148233559492,0.00191833615247,0.00287376816522,0.00115112468927,0.00428681083003,0.00776589481809,0.00409443868042,0.0013936653625,0.0115424017803,0.0012665944595,0.000927100027809,0.0082577183567,0.0113147438544,0.00936021544654,0.00356656419233,0.00166501914343,0.106071977353,0.0126223524804,0.0357612375498,0.00518099017508,0.00652264186386,0.000755809468961,0.00667406100099,0.00278276766788,0.00355207281826,0.00373985617837,0.0170016600794,0.00525167927577,0.0135998616849,0.0089455191582,0.00353891632606,0.0240361575881,0.00801900641241,0.0102508964652,0.0137954726793,X,0.00709791917785,0.0199640529789,0.0111132606534,0.00203255586107,0.00209122609574,0.00148126427842,0.00670774246359,0.00155131328544,0.00577614527864,0.0025357807663,0.00879577929644,0.00169777969171,0.0100756345246,0.00766057886332,0.00310719966991,0.00140796575709,0.00146684266172,0.00157214669173,0.00045143252005,0.0101882032923,0.000821317385521,0.000867559807602,0.00343605144862,0.00547268359549,0.00169861766867,0.00209826981996,0.0108246704204,0.0120672415831,0.00455392316972,0.0330021339833,0.00291792568679,0.00960327499133,0.0107065453217,0.0162090722366,0.00604172987413,0.00668852260196,0.00645611835419,0.00296850755033,X,0.00771900530686,0.00688103821535,0.00740995986084,0.00973389129215,0.000746277546224,0.00112772391281,0.00270561522118,0.00228143166272,0.00232754156554,0.00246178682917,0.00686652160372,0.000683253434851,0.005450997777,0.0032434984531,0.0137126665384,0.00342664774348,0.0038193208153,0.000400707467732,0.0200766952697,0.0258666361191,0.00204105443402,5.52066138046e-05,0.000310519472417
```

My command is:

```shell
RunTHetA \
    path_to.interval_count \
    -d . \
    --FORCE \
    --MIN_FRAC 0.005 \
    --N 2
```

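A quick way to see where the X values sit is to split the results row and list their positions. In the row above, C and p* each contain three X entries, and they fall at the same positions (the 21st, 68th and 107th of 130 intervals), so X appears to flag whole intervals rather than a single column. The sketch below is a minimal stdlib Python version. `parse_results_row` and `x_positions` are hypothetical helper names, and the sketch assumes the four columns are tab-separated, as the header suggests. The example row is a shortened excerpt of the output above.

```python
from typing import List, Tuple


def parse_results_row(row: str) -> Tuple[float, List[float], List[str], List[str]]:
    """Split one data row of a THetA2 .results file into NLL, mu, C and p*.

    Assumption: columns are tab-separated; C is colon-separated,
    while mu and p* are comma-separated.
    """
    nll, mu, c_field, p_field = row.rstrip("\n").split("\t")
    mu_values = [float(x) for x in mu.split(",")]
    return float(nll), mu_values, c_field.split(":"), p_field.split(",")


def x_positions(values: List[str]) -> List[int]:
    """Return the 0-based indices of entries reported as "X"."""
    return [i for i, v in enumerate(values) if v == "X"]


# Shortened excerpt of the row above: four intervals of C and p*.
row = (
    "365769128.951\t0.0030184841461,0.996981515854\t"
    "831:1211:X:7534\t"
    "0.0181740144775,0.0129284888566,X,0.00780890592681"
)
nll, mu, c_values, p_values = parse_results_row(row)
print(x_positions(c_values), x_positions(p_values))  # [2] [2]
```

To run it on a whole file, skip the header line (it starts with `#`) and pass each remaining line to `parse_results_row`.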
egeulgen commented 3 years ago

I second this question. @jsmedmar, have you been able to find an answer?

jsmedmar commented 3 years ago

I just looked at my code, and it seems I ended up setting the copy number to 0 whenever the entry was not an integer:

```python
import pandas as pd
from os.path import join

# Parse the integer copy number prediction, see:
# https://github.com/raphael-group/THetA/blob/master/doc/MANUAL.txt#L23
# THetA2 returned a non-integer value ("X") in the integer CN column when run
# with a batch of normals. That's the reason for `e.isdigit()`; see:
# https://github.com/raphael-group/THetA/issues/17
# `outdir`, `tumor_name`, `ncn` and `tcn` are defined in the surrounding code.
theta_results = join(outdir, tumor_name + ".n2.results")
results = pd.read_csv(theta_results, delimiter="\t")
ncn["zint_cpnumber"] = 2  # every interval in `ncn` gets copy number 2
# Any entry that is not a plain digit string (e.g. "X") becomes 0.
tcn["zint_cpnumber"] = [
    int(e) if e.isdigit() else 0 for e in results["C"][0].split(":")
]
```
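Mapping X to 0 has a side effect: those intervals become indistinguishable from genuine zero copy numbers, and the C column above does contain real 0 entries at its end. If downstream code needs to tell them apart, one option is to keep X as a missing value. The sketch below does that in plain Python. `parse_int_copy_numbers` is a hypothetical helper, and treating X as "no integer estimate for this interval" is an assumption, since this thread does not say what X means.

```python
from typing import List, Optional


def parse_int_copy_numbers(c_field: str) -> List[Optional[int]]:
    """Parse a colon-separated THetA2 C field, keeping "X" as None.

    Assumption: "X" means no integer copy number was reported for that
    interval, so it is kept as missing instead of being coerced to 0.
    """
    parsed: List[Optional[int]] = []
    for entry in c_field.strip().split(":"):
        if entry.isdigit():
            parsed.append(int(entry))
        elif entry == "X":
            parsed.append(None)
        else:
            # Fail loudly on anything unexpected rather than hiding it as 0.
            raise ValueError(f"unexpected copy number entry: {entry!r}")
    return parsed


values = parse_int_copy_numbers("1211:X:7534:0:0")
print(values)  # [1211, None, 7534, 0, 0]
```

If the result goes into a pandas column such as `tcn["zint_cpnumber"]`, a list containing `None` can be stored with pandas' nullable integer dtype, e.g. `pd.array(values, dtype="Int64")`, which shows the missing entries as `<NA>`.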