iqbal-lab / cortex

reference free variant assembly

Replace stampy with minimap2 #27

Closed bricoletc closed 3 years ago

bricoletc commented 3 years ago

@iqbal-lab could you give me access to the cortex repo, so that in future I can push code to, e.g., a minimap2 branch? In the meantime, this PR is for reviewing my changes.

Code

Checks

I have validated that I get the same VCF when running py_cortex_api (which wraps cortex's independent workflow) with stampy and with minimap2, on cortex's demo files (example1).
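
For anyone wanting to repeat this kind of check, here is a minimal sketch of comparing two VCFs by their records only. It is not necessarily how the comparison above was done. It assumes the header lines (those starting with `#`) may legitimately differ between runs, for example if they record dates or command lines, and the function names and the tiny VCF strings are hypothetical.

```python
def vcf_records(text: str) -> list[tuple[str, ...]]:
    """Return the data records of a VCF as tuples of fields, skipping header lines."""
    records = []
    for line in text.splitlines():
        # Skip blank lines, '##' meta lines and the '#CHROM' column header.
        if not line.strip() or line.startswith("#"):
            continue
        records.append(tuple(line.split("\t")))
    return records


def same_calls(vcf_a: str, vcf_b: str) -> bool:
    """True if both VCFs contain the same records in the same order."""
    return vcf_records(vcf_a) == vcf_records(vcf_b)


# Hypothetical VCF contents: same call, different header metadata.
vcf_stampy = "##fileformat=VCFv4.2\n##mapper=stampy\n#CHROM\tPOS\tID\tREF\tALT\nref\t100\t.\tA\tG\n"
vcf_minimap2 = "##fileformat=VCFv4.2\n##mapper=minimap2\n#CHROM\tPOS\tID\tREF\tALT\nref\t100\t.\tA\tG\n"
# Hypothetical VCF with a different ALT allele at the same site.
vcf_other = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nref\t100\t.\tA\tT\n"

print(same_calls(vcf_stampy, vcf_minimap2))
print(same_calls(vcf_stampy, vcf_other))
```

Comparing parsed records rather than raw bytes keeps the check from failing on header differences alone.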

TODOs

bricoletc commented 3 years ago

Logging here a validation of using minimap2. The evaluation uses varifier on one Staphylococcus aureus dataset from Martin (with the truth genome assembled using pilon):

I found that the 'good' minimap2 settings (-k9 -w4) give the following precision/recall:
  * require mapq > 40: "Precision_edit_dist": 0.99996956, "Recall_edit_dist": 0.50726812
  * require mapq > 0: "Precision_edit_dist": 0.99998191, "Recall_edit_dist": 0.83881474
  * previous cortex, using stampy and mapq > 40: "Precision_edit_dist": 0.98863584, "Recall_edit_dist": 0.83029847

On this dataset, minimap2 with mapq > 0 does a little better than stampy in both metrics.

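
To make the two thresholds concrete, here is a minimal sketch of a strict "mapq greater than N" filter over SAM alignment lines. It is only an illustration: where and how cortex applies its threshold is not shown in this thread, and the function names and SAM lines below are hypothetical. The one fact relied on is that MAPQ is the fifth tab-separated field of a SAM alignment line.

```python
def passes_mapq(sam_line: str, min_exclusive: int) -> bool:
    """True if a SAM alignment line has MAPQ strictly greater than min_exclusive."""
    fields = sam_line.rstrip("\n").split("\t")
    return int(fields[4]) > min_exclusive  # MAPQ is the 5th SAM column


def kept_read_names(lines: list[str], min_exclusive: int) -> list[str]:
    """Names of the reads that survive the filter; '@' header lines are ignored."""
    return [
        line.split("\t")[0]
        for line in lines
        if not line.startswith("@") and passes_mapq(line, min_exclusive)
    ]


# Hypothetical SAM lines (tab-separated); names and values are made up.
sam_lines = [
    "@SQ\tSN:ref\tLN:1000",
    "flank1\t0\tref\t100\t60\t30M\t*\t0\t0\t*\t*",
    "flank2\t0\tref\t400\t12\t30M\t*\t0\t0\t*\t*",
    "flank3\t0\tref\t700\t0\t30M\t*\t0\t0\t*\t*",
]

kept_40 = kept_read_names(sam_lines, 40)  # strict filter
kept_0 = kept_read_names(sam_lines, 0)    # only drops MAPQ 0 alignments
print(kept_40, kept_0)
```

In this toy input the stricter threshold discards the MAPQ 12 alignment that the looser one keeps, which is the kind of loss that would lower recall.
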
bricoletc commented 3 years ago

Logging some more validation work using 14 Illumina Plasmodium falciparum samples with matched PacBio assemblies (the previous comment used a Staphylococcus aureus dataset):

stampy: recall: 0.4490 precision: 0.9118
minimap2: recall: 0.4443 precision: 0.9258

These are averages, across the samples, of the metrics computed using varifier. This confirms the change works fine: with minimap2, recall is marginally lower and precision is slightly higher.
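
As a quick arithmetic check, the differences between the two mappers can be computed directly from the averages reported above. This sketch uses only those four numbers; the variable names are illustrative.

```python
# Averaged varifier metrics as reported in this comment.
reported = {
    "stampy": {"recall": 0.4490, "precision": 0.9118},
    "minimap2": {"recall": 0.4443, "precision": 0.9258},
}

# minimap2 minus stampy, rounded to the 4 decimal places of the reported values.
delta = {
    metric: round(reported["minimap2"][metric] - reported["stampy"][metric], 4)
    for metric in ("recall", "precision")
}

for metric, change in delta.items():
    print(f"{metric}: {change:+.4f}")
```

So minimap2 loses about 0.005 of recall and gains about 0.014 of precision on these samples.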

iqbal-lab commented 3 years ago

hurrah