phytest-devs / phytest-temporal-signal-example


Cleaning of data #1

Open rhysinward opened 1 year ago

rhysinward commented 1 year ago

Within your pipeline, I'm curious how you selected sequences for pruning to ensure that your data met your prescribed requirements. Would it be possible to add or describe how you selected your sequences, i.e. how you went from raw data to quality-controlled data?

Thanks a lot!

P.S. There is a typo in the command: phytest ice_viruses_tests.py -a data/ice_viruses.fasta -t data/ice_viruses.fasta.treefile

I believe the flag should be -s rather than -a.

Wytamma commented 1 year ago

Hey @rhysinward, thanks for finding the typo (https://github.com/phytest-devs/phytest-temporal-signal-example/commit/4e9308c039911a4f954e20a72328e566c66e750a)!

In this example we follow the TempEst tutorial, in which the authors use a root-to-tip regression (RTTR) to find outlier samples. The use case for phytest is an automated pipeline, where a manual RTTR wouldn't be done for every analysis. By setting the test values to what we expect, we can ensure that the pipeline fails if there are outliers. You would then go back to the samples and perform a formal investigation to identify the outliers (e.g. A.BrantGoose.1.1917 is lab contamination in this case).
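
To make the idea concrete, here is a minimal sketch of a threshold check on a root-to-tip regression. It is plain Python and does not use phytest's own API. The function names, the 0.5 threshold, and the toy dates and distances are all hypothetical, chosen only to show how one mislabelled sample can make an automated check fail.

```python
from statistics import mean


def root_to_tip_r_squared(dates, distances):
    """R^2 of a least-squares fit of root-to-tip distance against sampling date."""
    mean_date, mean_dist = mean(dates), mean(distances)
    sxx = sum((x - mean_date) ** 2 for x in dates)
    syy = sum((y - mean_dist) ** 2 for y in distances)
    sxy = sum((x - mean_date) * (y - mean_dist) for x, y in zip(dates, distances))
    return sxy ** 2 / (sxx * syy)


def assert_temporal_signal(dates, distances, min_r_squared):
    """Fail (as a pipeline test would) when the clock-like signal is too weak."""
    r2 = root_to_tip_r_squared(dates, distances)
    assert r2 >= min_r_squared, f"R^2 {r2:.3f} is below {min_r_squared}"
    return r2


# Hypothetical toy data: distance grows steadily with sampling year.
clean_dates = [1990, 1995, 2000, 2005, 2010]
clean_distances = [0.010, 0.015, 0.020, 0.025, 0.030]

# The same data plus one sample labelled 1917 whose distance looks modern,
# which is the pattern a contaminated sample would produce.
outlier_dates = [1917] + clean_dates
outlier_distances = [0.030] + clean_distances

assert_temporal_signal(clean_dates, clean_distances, min_r_squared=0.5)  # passes

try:
    assert_temporal_signal(outlier_dates, outlier_distances, min_r_squared=0.5)
except AssertionError as err:
    print("Check failed, investigate the samples:", err)
```

The check only says that something is wrong. Working out which sample is responsible is still the manual investigation step described above.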