Add FAQ - Githubissues

yannickwurm commented 7 years ago

Hello, does the current McCortex perform more reliably than the stable Cortex release? Or might there be major undiscovered problems? Thanks, Yannick

noporpoise commented 7 years ago

We're not aware of any issues with McCortex.

McCortex has several advantages over Cortex:

- compiles on Mac with clang or gcc
- the Makefile pipeline is simpler to run / interrupt / continue, and can run commands in parallel with e.g. `make -j5`
- many of the commands are multithreaded, e.g. `-t 2`
- bubble calling with multiple samples uses much less memory
- can genotype samples on third-party VCFs (i.e. on calls that couldn't be made with McCortex/Cortex)
- low-memory genotyping using the uncleaned graph (reduces genotyping mistakes)
- uses bwa instead of stampy to map variants (much faster)
- tests run against each commit with Travis CI, so you'll know when I've broken something
- you don't have to install any dependencies

McCortex links are useful for calling large events and have been used to find ~10,00bp indels in bacteria samples. They do not help when calling SNPs and have a very high memory requirement for mammalian genomes.

The breakpoint caller is better suited to large events and the bubble caller to smaller ones.

On a bacterial dataset, we found McCortex (without links) is only slightly more sensitive than Cortex (+3% SNPs, +25% indels). Parameters such as kmer size, mapping quality cutoff and bubble/breakpoint caller choices are equally important. We'll hopefully publish some comparisons shortly.

yannickwurm commented 7 years ago

That sounds awesome - excellent job with Travis. Does this also include integration tests (simulated reads to expected VCF file)? Cheers!

noporpoise commented 7 years ago

Yes. `make test` runs the unit tests (pass `MAXK=X` to run for different max-kmer values). `cd tests; ./run` runs the integration tests. Both are run on Travis.

`tests/pipeline` is one such integration test: it uses two samples plus a reference (600bp genome), simulates reads with sequencing error and runs the McCortex pipeline. If the VCF output doesn't match the expected VCF, the test fails.
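
To illustrate the shape of that kind of test, here is a minimal Python sketch. It is not the actual `tests/pipeline` code, which drives the McCortex binaries. The function names, the substitution-only error model and the comparison rule (ignore header lines, compare only CHROM/POS/REF/ALT) are all assumptions made for illustration.

```python
import random

def random_genome(length, rng):
    """Return a random DNA sequence of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def simulate_reads(genome, read_len, n_reads, error_rate, rng):
    """Sample fixed-length reads uniformly and add substitution errors."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                # Replace with a different base to model a sequencing error
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads

def vcf_records(text):
    """Extract sorted (CHROM, POS, REF, ALT) tuples, skipping header lines."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        records.append((fields[0], int(fields[1]), fields[3], fields[4]))
    return sorted(records)

def vcfs_match(observed, expected):
    """Assumed pass/fail rule: both VCFs contain the same variant records."""
    return vcf_records(observed) == vcf_records(expected)
```

In a full test the simulated reads would be written to disk and passed through the pipeline, and `vcfs_match` would then be applied to the pipeline's output and the stored expected VCF.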

These tests catch major regressions, but won't catch small changes in sensitivity / specificity. There are lots of aspects of McCortex that need to be tested and all the tests have to run within the Travis CI time limit (we're using the free tier!). Tests also have to use low memory. Therefore most tests use small genomes (<1kbp).
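
Picking up small changes in sensitivity would need a score rather than a pass/fail comparison. A minimal sketch of such a score, assuming variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples and that the true variants are known from the simulation. It reports precision rather than specificity, since true negatives are not well defined for variant calls; that substitution is my assumption, not something McCortex's tests do.

```python
def sensitivity_and_precision(called, truth):
    """Score a set of called variants against the known true variants."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Sensitivity: fraction of true variants that were called
    sensitivity = true_positives / len(truth) if truth else 0.0
    # Precision: fraction of calls that are true variants
    precision = true_positives / len(called) if called else 0.0
    return sensitivity, precision

# Illustrative data only: four simulated variants, three calls, two correct
truth = {
    ("chr1", 10, "A", "T"),
    ("chr1", 50, "G", "C"),
    ("chr1", 200, "C", "CAT"),
    ("chr1", 300, "T", "G"),
}
called = {
    ("chr1", 10, "A", "T"),
    ("chr1", 50, "G", "C"),
    ("chr1", 99, "T", "TA"),
}
sens, prec = sensitivity_and_precision(called, truth)
```

Tracking these numbers across commits on larger simulated genomes would show gradual drift that an exact-match test on a <1kbp genome cannot.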

mcveanlab / mccortex

Add FAQ #44