mgalardini / pyseer

SEER, reimplemented in python 🐍🔮
http://pyseer.readthedocs.io
Apache License 2.0

WG mode should print out the lineage column header for selected variants #162

mgalardini closed this issue 2 years ago

mgalardini commented 2 years ago

Running this command line test:

$ python ../pyseer-runner.py --vcf variants.vcf.gz --phenotypes subset.pheno --wg enet --lineage-clusters lineage_clusters.txt --sequence-reweighting --alpha 1 --cor-filter 0.25
[...]
Finding and printing selected variants
variant af      filter-pvalue   lrt-pvalue      beta    notes
FM211187_184_G_A        4.00E-02        2.01E-01                3.40E-01        NA      bad-chisq
FM211187_293_G_A        2.00E-02        2.54E-01                -6.69E-01       NA      bad-chisq
FM211187_869_C_T        1.00E-01        7.84E-03                -1.63E+00       NA      bad-chisq
FM211187_926_G_A        2.00E-02        2.54E-01                -8.91E-05       NA      bad-chisq
FM211187_1981_G_A       9.40E-01        1.13E-01                2.66E-02        NA      bad-chisq
FM211187_2032_C_A       2.00E-02        2.54E-01                -6.78E-01       NA      bad-chisq
FM211187_2865_C_T       2.00E-02        2.54E-01                -4.09E-05       NA      bad-chisq
FM211187_2943_T_C       1.20E-01        2.06E-02                1.44E+00        NA      bad-chisq
FM211187_3982_C_A       4.00E-02        2.01E-01                1.39E-01        NA      bad-chisq
FM211187_6054_C_T       6.00E-02        1.13E-01                4.74E-03        NA      bad-chisq
FM211187_6139_A_G       2.00E-02        2.54E-01                -4.40E-02       NA      bad-chisq
FM211187_7799_C_T       4.00E-02        2.01E-01                2.46E-04        NA      bad-chisq
FM211187_8872_A_G       2.40E-01        2.51E-01                -1.29E-02       NA
FM211187_10838_C_T      5.20E-01        1.64E-01                -4.50E-01       NA
FM211187_11527_T_C      4.80E-01        1.64E-01                -2.77E-01       NA
FM211187_11559_T_G      4.00E-02        2.01E-01                8.47E-05        NA      bad-chisq
FM211187_11633_A_G      2.00E-02        2.54E-01                -2.81E-03       NA      bad-chisq
FM211187_11762_G_T      6.00E-02        1.13E-01                3.16E-03        NA      bad-chisq
FM211187_12304_C_CTTATA 2.00E-02        3.71E-01                7.90E-01        NA      bad-chisq
FM211187_13550_G_A      4.00E-02        2.01E-01                9.41E-06        NA      bad-chisq
FM211187_13781_C_T      2.00E-02        2.54E-01                -1.02E-04       NA      bad-chisq
FM211187_14044_G_A      3.00E-01        1.06E-01                5.63E-01        NA
[...]

The header has no lineage column, yet every row carries an extra NA value for it. The two are decided by different conditions: the lineage value is printed for each variant whenever lineage_dict is non-empty (https://github.com/mgalardini/pyseer/blob/badea9104e390fb64e695e57dae1802ea2f8b9f5/pyseer/utils.py#L94), while the lineage header is only added if the --lineage option is used. I will make a patch shortly.
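
One way to rule out this kind of header/row mismatch is to compute a single flag once and let it drive both the header and the rows. The following is only a minimal sketch of that idea, not the actual patch: `build_header`, `build_row`, and the contents of `lineage_dict` are hypothetical names and data made up for illustration, and pyseer's real functions may be structured differently.

```python
# Hypothetical sketch: these helpers are illustrative, not pyseer's API.

def build_header(print_lineage):
    """Return the tab-separated header, with 'lineage' only when requested."""
    columns = ["variant", "af", "filter-pvalue", "lrt-pvalue", "beta"]
    if print_lineage:
        columns.append("lineage")
    columns.append("notes")
    return "\t".join(columns)


def build_row(fields, lineage, notes, print_lineage):
    """Return one tab-separated row, gated by the same flag as the header."""
    values = list(fields)
    if print_lineage:
        # Variants with no associated lineage are reported as NA.
        values.append(lineage if lineage is not None else "NA")
    values.append(",".join(notes))
    return "\t".join(values)


# Made-up lineage assignments; a non-empty dict switches the column on.
lineage_dict = {"sample_1": 0, "sample_2": 1}
print_lineage = len(lineage_dict) > 0

header = build_header(print_lineage)
row = build_row(
    ["FM211187_184_G_A", "4.00E-02", "2.01E-01", "", "3.40E-01"],
    None,
    ["bad-chisq"],
    print_lineage,
)
print(header)
print(row)
```

Because the header and every row read the same `print_lineage` value, they always have the same number of columns, whichever way the flag is set.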