Closed leoisl closed 1 year ago
What small fixes need to be done?
Wasn't it the changelog and examples
yes, + precompiled binaries, conda recipe update, and the release itself. Conda might auto update seeing we have a new release, but I am not sure the auto update will work. I would actually prefer not to have this version on conda, as it is not stable enough yet I guess
Conda won't trigger on pre-releases.
Cool!
I've found an issue when running with the tiny example we have for this release. Not an issue with the changes in this release I believe, so I think we can proceed with the release. Description of the issue follows:
The tiny example describes new alleles that pandora denovo
should find, and one sample should genotype to one allele, and the other sample to the new allele. For the example gene GC00010897
, the new allele is a T
in position 44, and this is what we should find.
GC00010897 44 . C T . . SVTYPE=SNP;GRAPHTYPE=SIMPLE GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF 1:0,11:0,16:0,11:0,16:0,23:0,32:1,0:-220.34,-8.03511:212.304 0:22,0:18,0:22,0:18,0:44,0:37,0:0,1:-2.87264,-270.207:267.334
New pandora
version finds this:
GC00010897 1 . ATGCAGATACGTGAACAGGGCCGCAAAATTCAGTGCATCCGCACCGTGTACGACAAGGCCATTGGCCGGGGTCGGCAGACGGTCATTGCCACACTGGCCCGCTATACGAC ATGCAGATACGTGAACAGGGCCGCAAAATTCAGTGCATCCGCATCGTGTACGACAAGGCCATTGGCCGGGGTCGGCAGACGGTCATTGCCACACTGGCCCGCTATACGAC . . VC=PH_SNPs;GRAPHTYPE=SIMPLE GT:MEAN_FWD_COVG:MEAN_REV_COVG:MED_FWD_COVG:MED_REV_COVG:SUM_FWD_COVG:SUM_REV_COVG:GAPS:LIKELIHOOD:GT_CONF 1:19,20:21,23:26,26:28,27:331,369:372,422:0.235294,0.111111:-221.046,-198.019:23.0274 0:30,26:27,23:32,32:30,29:515,471:465,428:0,0.166667:-228.672,-276.438:47.766
Notes:
C -> T
at pos 44, but the new version outputs this as a 110-bp PH_SNP
, where in truth is actually just a single SNP. So the new version is not being as compact as it should;For some reason, on the previous version, we just have a 1-bp bubble describing the SNP to be genotyped in the PRG string, flanked by long linear sequences:
ATGCAGATACGTGAACAGGGCCGCAAAATTCAGTGCATCCGCA 5 C 6 T 5 CGTGTACGACAAGGCCATTGGCCGGGGTCGGCAGACGGTCATTGCCACACTGGCCCGCTATACGAC
In the new version, we have a long 110-bp bubble:
5 TTeGAGTAAAACAATCCCCCGCGCTTATATAAGCGCGTTGATATTTTTAATTATTAACAAGCAACATCATGCTAATACAGACATACAAGGAGATCATCTCTCTTTGCCTGTTTTTTATTATTTCAGGAGTGTAAACACATTTTCCG 6 TTGAGTAAAACAATCCCCCGCGCTTATATAAGCGCGTTGATATTTTTAGTTATTAACAAGCAACATCATGCTAATACAGACATACAAGGAGATCATCTCTCTTTGCCTGTTTTTTATTATTTCAGGAGTGTAAACACATTTTCCG 5
make_prg
versions haven't changed (AFAIK), and the discovered pandora denovo alleles for this gene are exactly the same. Although the graphs are conceptually the same, i.e. represent the same information, one allows for a very compact description of the variants, while the other not. However, coupled with an issue on calculating likelihoods of PH_SNPs
in pandora
, we could actually get different results. From this execution, there are two tasks we should carry out:
make_prg
and understand why it created different graphs from presumably same input. From make_prg
perspective, however, this is not a bug - both graphs represent the same information. However, we can miss variants due to this different representation of the PRG and the issue described in (1).I believe this is not an issue to drprg though, as it does not use make_prg update
.
Did I miss some task that we should do?
Do you agree we can release even with this known issue?
Which would be the path you'd recommend to solve or investigate this?
This has to be a make prg issue right? Given the discovered allele was a single base? Might be a bug/regression in update
? I don't really know where you v1.0.0 of make_prg is coming from though. I did see some strange allele additions back when I was using update
in drprg remember - but I'm using this version of make prg https://github.com/mbhall88/drprg/blob/533f7b36084dbb269d511e9146f5cadae021a7b1/justfile#L7. It surely can't be a pandora issue as pandora doesn't build PRGs...
This has to be a make prg issue right?
I think so...
I don't really know where you v1.0.0 of make_prg is coming from though.
Neither I at this point. I think from https://github.com/leoisl/make_prg/tree/update_1_0_0_pre_release , but I am not sure. Anyway both mine and your version are outdated. make_prg
is partially merged, we are working with 2 different versions that are not the most updated one.
This is too messy to manage and really an issue. As I de-prioritised make_prg
due to new projects with higher priorities (tbpore
, roundhound
, mof
and now pandora
), I am re-prioritising make_prg
now and just resuming other tasks once we have make_prg
merged and released. Otherwise this will become even messier and the cost to fix will be higher.
FYI my container refer to this commit: https://github.com/leoisl/make_prg/commit/b064dcdf4da72df3228cbc993018ac50da6105f0 . But this does not matter much, as we will be releasing new make_prg
version in the next couple of days, so we will all sync to use the new version
BTW for this https://github.com/rmcolq/pandora/issues/310#issuecomment-1322417386 In the second enumeration, item 1, where you talk about the SNP signal being diluted , that's why Brice has his fancy model for gramtools which handles this. Porting it to Pandora is on the road map somewhere.
I'd ignore the gcp issue , the real question is why a single snp is suddenly becoming a long thing
Done
Merged into main, but still need to do some small fixes and make the release