peterjc / thapbi-pict

Tree Health and Plant Biosecurity Initiative - Phytophthora ITS1 Classifier Tool
https://thapbi-pict.readthedocs.io/
MIT License

Discussion for soil_nematodes example #539

Closed by peterjc 1 year ago

peterjc commented 1 year ago

This worked example was originally added in #347, before multi-marker support existed, and it still lacks a discussion of the classifier assessment.

https://thapbi-pict.readthedocs.io/en/latest/examples/soil_nematodes/index.html

$ for F in summary/*.assess.onebp.tsv ; do echo; echo $F; cut -f 1-5,9,11 $F | head -n 2 | tsv ; done

summary/D3Af-D3Br.assess.onebp.tsv
#Species TP FP FN  TN  F1   Ad-hoc-loss
OVERALL  29 11 247 449 0.18 0.899

summary/JB3-JB5GED.assess.onebp.tsv
#Species TP FP FN  TN  F1   Ad-hoc-loss
OVERALL  3  0  273 108 0.02 0.989

summary/NF1-18Sr2b.assess.onebp.tsv
#Species TP FP FN  TN  F1   Ad-hoc-loss
OVERALL  38 51 238 953 0.21 0.884

summary/SSUF04-SSUR22.assess.onebp.tsv
#Species TP FP FN  TN  F1   Ad-hoc-loss
OVERALL  30 4  246 168 0.19 0.893

summary/pooled.assess.onebp.tsv
#Species TP  FP FN  TN   F1   Ad-hoc-loss
OVERALL  100 66 176 1162 0.45 0.708

Note that we currently assess the controls against the same list of 23 species for all markers.
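As a sanity check on the table above, the F1 and ad-hoc loss columns can be recomputed from the TP/FP/FN counts. This is a minimal sketch, assuming the standard F1 definition and taking ad-hoc loss to be (FP+FN)/(TP+FP+FN); that formula is an assumption on my part, although it does reproduce the values shown. The function names are illustrative and not part of the tool.

```python
# Confusion-matrix counts copied from the OVERALL rows above: (TP, FP, FN, TN).
COUNTS = {
    "D3Af-D3Br": (29, 11, 247, 449),
    "JB3-JB5GED": (3, 0, 273, 108),
    "NF1-18Sr2b": (38, 51, 238, 953),
    "SSUF04-SSUR22": (30, 4, 246, 168),
    "pooled": (100, 66, 176, 1162),
}


def f1_score(tp, fp, fn):
    """Standard F1: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def adhoc_loss(tp, fp, fn):
    """Assumed definition: fraction of non-TN calls that are wrong."""
    denom = tp + fp + fn
    return (fp + fn) / denom if denom else 0.0


for marker, (tp, fp, fn, tn) in COUNTS.items():
    print(
        f"{marker}\tF1={f1_score(tp, fp, fn):.2f}"
        f"\tloss={adhoc_loss(tp, fp, fn):.3f}\tTP+FN={tp + fn}"
    )
```

TP+FN comes out as 276 for every marker and for the pooled results, which is what you would expect when the same expected species list is applied to all of them.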

It appears that the JB3-JB5GED marker is too narrow to cover all 23 species, and that NF1-18Sr2b in particular has a false positive problem (it apparently struggles at species level within Globodera, Steinernema, and Xiphinema).
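To see which species drive the false positives for a marker, the per-species rows of an assessment table can be ranked by FP. This is a rough sketch, assuming the TSV has per-species rows after OVERALL with the `#Species` and `FP` column names shown above; `rank_false_positives` is a hypothetical helper, and the sample rows are made up for illustration.

```python
import csv
import io


def rank_false_positives(handle, top=5):
    """Return (species, FP) pairs with FP > 0, highest first, skipping OVERALL."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [
        (row["#Species"], int(row["FP"]))
        for row in reader
        if row["#Species"] != "OVERALL" and int(row["FP"]) > 0
    ]
    # Sort by descending FP, then by name so ties are deterministic.
    rows.sort(key=lambda pair: (-pair[1], pair[0]))
    return rows[:top]


# Hypothetical rows for illustration only; not taken from the real output.
example = io.StringIO(
    "#Species\tTP\tFP\tFN\tTN\n"
    "OVERALL\t2\t5\t1\t10\n"
    "Species A\t1\t3\t0\t5\n"
    "Species B\t1\t0\t1\t3\n"
    "Species C\t0\t2\t0\t2\n"
)
worst = rank_false_positives(example)
print(worst)
```

For a real file, open it with `open(path, newline="")` and pass the handle in the same way.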

peterjc commented 1 year ago

Note this example sequences one marker at a time.

Cross reference #425: we have no good way to specify expected species lists per sample AND per marker.

Given the current code, the example would need to call assess once per marker (with different marker files set up for each), and probably assess the pooled results again separately.

peterjc commented 1 year ago

Why are we lacking any reference sequences for Laimaphelenchus penardi? The authors report recovering it for NF1-18Sr2b and D3Af-D3Br with an NCBI RefSeq reference sequence available.

Update: We can use EU306346.1 and AY593918.1 Laimaphelenchus penardi as references for NF1-18Sr2b, but they get no matches.

The authors say in their S3 table that they used Laimaphelenchus KX580741.1, KX580740.1 and KF881746.1. These have the D3FA left primer, but not the D3BR right primer.

Update: Lowering the threshold gives an ASV with 227 copies matching KF998578.1 Laimaphelenchus deconincki in D3Af-D3Br, and an ASV with 123 copies matching EU306346.1 Laimaphelenchus penardi in NF1-18Sr2b.