mcveanlab / mccortex

De novo genome assembly and multisample variant calling
https://github.com/mcveanlab/mccortex/wiki
MIT License

Add FAQ #44

Closed: yannickwurm closed this issue 7 years ago.

yannickwurm commented 7 years ago

Hello, does the current McCortex perform more reliably than the stable Cortex release? Or might there be major undiscovered problems? Thanks, Yannick

noporpoise commented 7 years ago

We're not aware of any issues with McCortex.

McCortex has several advantages over Cortex:

McCortex links are useful for calling large events and have been used to find ~10,00bp indels in bacterial samples. They do not help with calling SNPs, and they have a very high memory requirement for mammalian genomes.

The breakpoint caller is better suited to large events and the bubble caller to smaller ones.

On a bacterial dataset, we found that McCortex (without links) is only slightly more sensitive than Cortex (+3% SNPs, +25% indels). Parameters such as k-mer size, the mapping quality cutoff and the choice of bubble or breakpoint caller matter just as much. We'll hopefully publish some comparisons shortly.
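"Sensitivity" here means the fraction of true variants a caller recovers, usually reported separately for SNPs and indels. The sketch below shows one way to compute it against a known truth set. It is not how the comparison above was done: the (chrom, pos, ref, alt) tuple representation, the helper names and the toy variants are all hypothetical, and exact tuple matching assumes both sets were normalized the same way.

```python
from collections import Counter

def variant_type(ref: str, alt: str) -> str:
    """Classify a biallelic variant as 'SNP', 'indel' or 'other' (e.g. MNPs)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"

def sensitivity_by_type(truth, calls):
    """Fraction of true variants that were called, per variant type.

    truth, calls: iterables of (chrom, pos, ref, alt) tuples (hypothetical format).
    """
    called = set(calls)
    total = Counter()
    found = Counter()
    for v in truth:
        t = variant_type(v[2], v[3])
        total[t] += 1
        if v in called:
            found[t] += 1
    return {t: found[t] / total[t] for t in total}

# Toy data, invented for illustration only.
truth = [("chr1", 10, "A", "G"), ("chr1", 50, "C", "T"),
         ("chr1", 120, "AT", "A"), ("chr1", 300, "G", "GCC")]
calls = [("chr1", 10, "A", "G"), ("chr1", 50, "C", "T"), ("chr1", 120, "AT", "A")]
print(sensitivity_by_type(truth, calls))  # {'SNP': 1.0, 'indel': 0.5}
```

Reporting SNPs and indels separately matters because, as the numbers above suggest, the two can differ a lot between callers.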

yannickwurm commented 7 years ago

That sounds awesome. Excellent job with Travis! Does this also include integration tests (simulating reads and checking the output against an expected VCF file)? Cheers!

noporpoise commented 7 years ago

Yes. "make test" runs the unit tests (pass MAXK=X to run them for a different maximum k-mer value), and "cd tests; ./run" runs the integration tests. Both are run on Travis.
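For anyone scripting this locally, here is a minimal sketch of the same commands driven from Python. It assumes you start in the repository root; test_commands and run_all are hypothetical helpers (not part of McCortex), and the MAXK values 31 and 63 are only examples, so check which values your checkout supports.

```python
import subprocess
from typing import List, Sequence, Tuple

Command = Tuple[List[str], str]  # (argv, working directory)

def test_commands(maxk_values: Sequence[int] = ()) -> List[Command]:
    """Build the unit- and integration-test commands described above.

    With no maxk_values, 'make test' runs once with the default MAXK.
    """
    cmds: List[Command] = []
    if maxk_values:
        for k in maxk_values:
            cmds.append((["make", "test", f"MAXK={k}"], "."))
    else:
        cmds.append((["make", "test"], "."))
    # Integration tests: the equivalent of `cd tests; ./run`.
    cmds.append((["./run"], "tests"))
    return cmds

def run_all(maxk_values: Sequence[int] = (), dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv, cwd in test_commands(maxk_values):
        print(f"(cd {cwd} && {' '.join(argv)})")
        if not dry_run:
            # check=True raises CalledProcessError if a test run fails.
            subprocess.run(argv, cwd=cwd, check=True)

run_all((31, 63))  # dry run: only prints the commands
```

Defaulting to a dry run lets you inspect the command list before kicking off a potentially slow build.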

One such integration test, tests/pipeline, uses two samples plus a reference (a 600bp genome), simulates reads with sequencing errors and runs the McCortex pipeline. If the output VCF doesn't match the expected VCF, the test fails.
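The pass/fail check can be pictured as a set comparison between expected and actual VCF records. This is a minimal sketch under assumptions, not the comparison tests/pipeline actually performs: it keys records on CHROM, POS, REF and ALT only, ignoring ID, QUAL, FILTER, INFO and genotype columns, and vcf_records/compare_vcfs are hypothetical names.

```python
from typing import Iterable, Set, Tuple

Record = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)

def vcf_records(lines: Iterable[str]) -> Set[Record]:
    """Extract (CHROM, POS, REF, ALT) from VCF data lines, skipping '#' header lines."""
    records: Set[Record] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        records.add((chrom, int(pos), ref, alt))
    return records

def compare_vcfs(expected_lines: Iterable[str], actual_lines: Iterable[str]):
    """Return (missing, unexpected) record sets; the test passes if both are empty."""
    expected = vcf_records(expected_lines)
    actual = vcf_records(actual_lines)
    return expected - actual, actual - expected
```

Usage: open both files and pass their lines, e.g. compare_vcfs(open(expected_path), open(actual_path)), then fail the test if either returned set is non-empty. Reporting missing and unexpected records separately makes it easier to tell lost sensitivity from new false positives.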

These tests catch major regressions but won't catch small changes in sensitivity or specificity. Many aspects of McCortex need testing, and all the tests have to run within the Travis CI time limit (we're using the free tier!) and use little memory. That's why most tests use small genomes (<1kbp).
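To illustrate why small genomes keep these tests cheap, here is a toy read simulator. It is not the simulator McCortex's tests use: simulate_reads is a hypothetical helper, and forward-strand-only sampling, uniform start positions and substitution-only errors are simplifying assumptions.

```python
import random
from typing import List

BASES = "ACGT"

def simulate_reads(genome: str, n_reads: int, read_len: int,
                   error_rate: float, seed: int = 0) -> List[str]:
    """Sample reads uniformly from the forward strand and add substitution errors.

    Each base is replaced by a different random base with probability error_rate.
    """
    if read_len > len(genome):
        raise ValueError("read_len is longer than the genome")
    rng = random.Random(seed)  # seeded so the test input is reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = list(genome[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in BASES if b != base])
        reads.append("".join(read))
    return reads

# A random 600bp genome at 30x coverage needs only 180 reads of 100bp (180 * 100 / 600 = 30).
genome = "".join(random.Random(1).choice(BASES) for _ in range(600))
reads = simulate_reads(genome, n_reads=180, read_len=100, error_rate=0.01)
```

At this scale the whole input is about 18kbp of sequence, so each test run needs very little time or memory.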