hepcat72 / vcfSampleCompare

Filter and rank variant call files (VCF) based on comparative evidence ratios between groups of samples.
GNU General Public License v3.0

Handle the case where all observation ratios are the same between the 2 groups #15

Closed hepcat72 closed 5 years ago

hepcat72 commented 5 years ago

In my test case, all samples had the same AO/DP ratio (1). The output looked like this:

Chromosome  3163974 .   C   G   1   0   92  1   0   92      Caulobacter_spp,NA1000      Caulobacter_crescentus_strain_CB15  
Chromosome  1901511 .   T   C   1   0   38  1   0   38      Caulobacter_spp,Caulobacter_crescentus_strain_CB15      NA1000  

They were correctly given the score of 0, but the observation ratios were not filled in.
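
For anyone trying to reproduce this, here is a minimal Python sketch of how the degenerate case could be detected up front. It is not the tool's own code: the function names are hypothetical, it assumes FreeBayes-style AO and DP keys in the FORMAT column, and the VCF line is hand-made for illustration. It computes each sample's AO/DP ratio and checks whether they are all identical, in which case no grouping can separate the samples.

```python
from fractions import Fraction


def observation_ratios(vcf_line, sample_names):
    """Return {sample: AO/DP} for one tab-delimited VCF data line.

    Assumes AO and DP appear in the FORMAT column (hypothetical helper,
    not part of vcfSampleCompare).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column names the per-sample keys
    ratios = {}
    for name, cell in zip(sample_names, fields[9:]):
        values = dict(zip(keys, cell.split(":")))
        ao = values.get("AO", ".")
        dp = values.get("DP", ".")
        if ao == "." or dp in (".", "0"):
            continue  # skip samples with no call or zero depth
        # AO is comma-separated for multi-allelic sites; use the first alt only
        ratios[name] = Fraction(int(ao.split(",")[0]), int(dp))
    return ratios


def all_ratios_equal(ratios):
    """True when every sample with data has the same observation ratio."""
    return len(set(ratios.values())) <= 1


# Hand-made example line: every sample has AO == DP, so every ratio is 1
samples = ["sampleA", "sampleB", "sampleC"]
line = "Chromosome\t3163974\t.\tC\tG\t100\t.\t.\tGT:DP:AO\t1:5:5\t1:93:93\t1:179:179"
ratios = observation_ratios(line, samples)
print(all_ratios_equal(ratios))  # True
```

Exact fractions are used instead of floats so that ratios such as 5/5 and 93/93 compare as equal without rounding concerns.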

hepcat72 commented 5 years ago

This appears to be fixed in the current version (as in, the ORs are filled in). However, the sample groups are smaller, and I'm not sure whether that is the best behavior. Both results are "wrong" in that the best groups have no logic behind them, because there is only 1 group possible. The record should be filtered, but when the user supplies --nofilter, what should be output? Should there be a way to output no groups when groups are dynamically generated? Should 1 group be populated and the other left unpopulated? Or should it keep doing what it's doing, with the expectation that the groups won't make sense since the score is the worst possible score?

hepcat72 commented 5 years ago

It looks like the groups are smaller due to how createMaxDynamicSampleGroupPair is designed to work. It has a "known issue" noted in the comments: if the score between the 2 groups is the same, the group that gets added to is arbitrary. However, it will not grow the max group as long as the score is below the --separation-gap. It will create the min group regardless of the score. It just won't add to a group that is already below the score.

I can test whether this is a case of the code not behaving as designed by setting -a 0 --force. If the groups are then not created to include all samples in the case shown above, there is something to fix. Otherwise, it's just an issue to document.

hepcat72 commented 5 years ago

OK, it grew the group:

Chromosome  3163974 .   C   G   1   0   0   0.75    1   0   0   0.75    1;1 1   Caulobacter_spp,Caulobacter_crescentus_strain_CB15  1,1 5/5,93/93   NA1000  1   179/179
Chromosome  1901511 .   T   C   1   0   0   0.1 1   0   0   0.1 1;1 1   Caulobacter_spp 1   1/1 Caulobacter_crescentus_strain_CB15,NA1000   1,1 51/51,63/63

So this is not an issue that needs to be fixed. I will look over the dynamic group creation notes and see whether anything needs to be added to document this behavior.