tavinathanson opened 8 years ago
Also related: tracking provenance. cohorts tracks Python package versions in PROVENANCE files inside the cache directory. I'm not sure how to pull out the software versions that epidisco used.
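Here is a minimal sketch of recording which Python package versions a run used. The JSON layout is hypothetical and is not cohorts' actual PROVENANCE format. The sketch only sees packages installed in the current Python environment, so the versions epidisco used would have to be captured wherever epidisco actually runs.

```python
import json
from importlib import metadata  # Python 3.8+


def collect_package_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Missing packages are recorded instead of raising, so a provenance
    record can still be produced for a partial environment.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def provenance_json(packages):
    """Serialize the version map as JSON (hypothetical provenance format)."""
    return json.dumps(collect_package_versions(packages), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Example package names only; pass whatever the analysis actually imports.
    print(provenance_json(["varcode", "topiary", "isovar"]))
```

Writing the resulting string next to the cached outputs would let two runs be compared later.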
@tavinathanson: there's definitely a huge overlap between epidisco and cohorts. epidisco does neoepitope prediction as well, but the versioning conflicts you mentioned might become an issue.
When we were discussing this issue during the hackathon, we thought it would be great to leave the heavy lifting to epidisco and have the option to easily build a cohorts analysis on top of it. Re-mapped BAM files, HLA typing results and VCF files can all be consumed by cohorts if we simply mount the NFS that biokepi writes its results into. However, I don't think the neoepitope predictions epidisco spits out will be that useful for the RCC project overall, given the iterative, exploratory nature of these checkpoint studies.
I'd agree with this: get the large compute portions (alignment, VCF, etc.) from epidisco, use those as inputs, and let cohorts continue as-is to create the effects and annotations. This will make it easier to re-run effect annotation when bugs arise, or to re-run isovar with different parameter settings.
Probably a separate biokepi issue: how to handle dependencies that aren't explicitly specified. For example, the vaxrank version was recently bumped to 0.2.5, which requires varcode>=0.5.1. However, varcode 0.5.8 has a particular bug fix in it; what's the best way to specify this?
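One common approach is to declare the stricter floor (e.g. varcode>=0.5.8) directly in the downstream project's own requirements. pip's resolver must then satisfy both vaxrank's varcode>=0.5.1 and that floor, which in practice means the stricter one wins. The sketch below shows that logic in miniature. It is an illustration, not how pip is implemented, and its version parser is a simplification that only handles plain numeric releases (no pre-release tags, unlike PEP 440).

```python
def parse_version(version):
    """Parse a plain dotted release like '0.5.8' into a comparable tuple.

    Trailing zeros are dropped so '0.5' and '0.5.0' compare equal.
    Simplification: pre-release/dev tags such as '0.6.0rc1' are not supported.
    """
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def tightest_lower_bound(floors):
    """Given several '>=' floors from different requirers, return the strictest."""
    return max(floors, key=parse_version)


def satisfies(installed, floor):
    """True if the installed version is at or above the floor."""
    return parse_version(installed) >= parse_version(floor)


if __name__ == "__main__":
    # vaxrank 0.2.5 asks for varcode>=0.5.1; the project needs the 0.5.8 fix.
    floor = tightest_lower_bound(["0.5.1", "0.5.8"])
    print(floor, satisfies("0.5.1", floor))
```

Because the stricter floor is declared by the project itself rather than inherited from vaxrank, it survives future vaxrank bumps that might loosen or leave unchanged their own varcode constraint.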
That certainly seems like the easiest thing for now: namely, using epidisco for everything that cohorts doesn't do.
Recent thoughts from @hammer (correct me if I'm paraphrasing incorrectly): epidisco for anything often generated, cohorts for anything exploratory.
Our current strategy doesn't quite fit that description, since we're currently doing, e.g., neoantigen calling in cohorts. Our reasoning for doing that in cohorts is to be able to look at various intermediates that I don't believe are easily accessible from epidisco in its current form.
epidisco for anything automatically generated
From @hammer:
Cohorts and Epidisco share some functionality. It would be nice to centralize the discussion of how we'll ensure they generate consistent results and ultimately separate concerns.
Currently my biggest concern is that Epidisco uses vaxrank but cohorts uses topiary. hammerlab/vaxrank#31 might solve this issue.
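As one lightweight step toward checking consistency, assuming both pipelines record a {package: version} map for each run (a hypothetical format; neither tool's record layout is specified here), the two maps can be diffed. This only surfaces version differences. It cannot show that vaxrank and topiary produce the same predictions.

```python
def diff_versions(left, right):
    """Compare two {package: version} records.

    Returns {package: (left_version, right_version)} for every package whose
    version differs, including packages present in only one record (the
    missing side is reported as None).
    """
    diffs = {}
    for name in sorted(set(left) | set(right)):
        a, b = left.get(name), right.get(name)
        if a != b:
            diffs[name] = (a, b)
    return diffs


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    epidisco_run = {"varcode": "0.5.8", "vaxrank": "0.2.5"}
    cohorts_run = {"varcode": "0.5.1"}
    print(diff_versions(epidisco_run, cohorts_run))
```

A non-empty result flags exactly the packages whose results might legitimately diverge between the two pipelines.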