This PR updates the threshold used to partition Pseudomonas gene expression data into PAO1 and PA14 compendia.
The threshold is based on the distribution of median accessory gene expression: we expect PAO1 samples to have high median expression of PAO1 accessory genes and zero median expression of PA14 accessory genes. To determine the threshold, we look at the distribution of median accessory gene expression in samples labeled in SRA as PAO1 vs non-PAO1.
The main change is in 0_decide_threshold.ipynb, which explores where to place the threshold to separate the PAO1 (grey) from the non-PAO1 (blue) samples.
The notebook does the same for separating PA14 vs non-PA14 samples.
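As a rough illustration of the partitioning idea described above, here is a minimal sketch. It is not the code from the notebook: the function names, the toy gene identifiers, the single shared cutoff, and the value 25.0 are all hypothetical and chosen only to show the logic of comparing each sample's median accessory expression against a threshold.

```python
from statistics import median


def median_accessory_expression(sample_expression, accessory_genes):
    """Median expression of the given accessory genes in one sample.

    sample_expression: dict mapping gene id -> expression value.
    accessory_genes: gene ids that are accessory for one strain.
    Genes missing from the sample are skipped.
    """
    values = [sample_expression[g] for g in accessory_genes if g in sample_expression]
    return median(values) if values else 0.0


def assign_compendium(pao1_median, pa14_median, threshold):
    """Assign a sample to a compendium using one hypothetical cutoff.

    A sample is called PAO1 if only its PAO1 accessory median exceeds the
    threshold, PA14 if only its PA14 accessory median does, and is left
    unassigned otherwise.
    """
    pao1_high = pao1_median > threshold
    pa14_high = pa14_median > threshold
    if pao1_high and not pa14_high:
        return "PAO1"
    if pa14_high and not pao1_high:
        return "PA14"
    return "unassigned"


# Toy data: gene ids and expression values are made up for illustration.
pao1_genes = ["pao1_acc_1", "pao1_acc_2", "pao1_acc_3"]
pa14_genes = ["pa14_acc_1", "pa14_acc_2"]
sample = {
    "pao1_acc_1": 120.0,
    "pao1_acc_2": 80.0,
    "pao1_acc_3": 95.0,
    "pa14_acc_1": 0.0,
    "pa14_acc_2": 0.0,
}

THRESHOLD = 25.0  # hypothetical value, not the one chosen in this PR

pao1_median = median_accessory_expression(sample, pao1_genes)
pa14_median = median_accessory_expression(sample, pa14_genes)
label = assign_compendium(pao1_median, pa14_median, THRESHOLD)
print(label)
```

In this toy case the sample has high PAO1 accessory expression and no PA14 accessory expression, so it falls on the PAO1 side of the cutoff. Samples where both or neither median clears the cutoff are left unassigned rather than forced into a compendium; whether the PR handles those samples this way is an assumption of the sketch.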
More details can be found in the README.