Open esteinig opened 4 years ago
I will make a small script for this in the nanopath package, so we can sample from host and pathogen fastq files.
This will also be useful to assess the human decontamination on the client side.
Added compose CLI:
Usage: nanopath utils compose [OPTIONS]

  Compose artificial mixtures by sampling from read files

Options:
  -c, --composition PATH  JSON file, composition configuration
  -o, --output PATH       Output reads file path
  -r, --reads INTEGER     Total reads to sample for the mixture
  -s, --shuffle           Shuffle output reads
  --help                  Show this message and exit.
Example JSON file @ test/data/compose.json
{
    "human": {
        "file": "/data/nanopath/test/human.fq",
        "proportion": 0.90
    },
    "saureus": {
        "file": "/data/nanopath/test/saureus.fq",
        "proportion": 0.05
    },
    "kpneumoniae": {
        "file": "/data/nanopath/test/kpneumoniae.fq",
        "proportion": 0.05
    }
}
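For reference, a rough sketch of how sampling from a composition like this could work. This is not the actual ArtificialMixture code in nanopath/utils.py; the function names (`read_fastq`, `compose_mixture`, `write_fastq`) are hypothetical, and it assumes plain four-line FASTQ records and input files small enough to hold in memory.

```python
import random


def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples from a four-line FASTQ file."""
    with open(path) as handle:
        while True:
            record = [handle.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:  # empty header line means end of file
                return
            yield tuple(record)


def compose_mixture(config, total_reads, shuffle=False, seed=None):
    """Sample reads without replacement from each source by its proportion.

    `config` mirrors the JSON layout: {name: {"file": ..., "proportion": ...}}.
    """
    total = sum(entry["proportion"] for entry in config.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"Proportions sum to {total}, expected 1.0")

    rng = random.Random(seed)
    mixture = []
    for name, entry in config.items():
        records = list(read_fastq(entry["file"]))
        # Rounding per source means the final count can differ slightly
        # from total_reads for awkward proportions.
        n = round(total_reads * entry["proportion"])
        if n > len(records):
            raise ValueError(
                f"{name}: requested {n} reads, file has {len(records)}"
            )
        mixture.extend(rng.sample(records, n))

    if shuffle:
        rng.shuffle(mixture)
    return mixture


def write_fastq(records, path):
    """Write record tuples back out as four-line FASTQ."""
    with open(path, "w") as handle:
        for record in records:
            handle.write("\n".join(record) + "\n")
```

Usage would be along the lines of `write_fastq(compose_mixture(json.load(open("compose.json")), 1000, shuffle=True), "out.fq")`. Fixing the seed keeps the sampled test set reproducible between runs.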
Create a data set for testing:
The reads were basecalled with different models; for real validation we need to rebasecall with the same model. This data set is meant for on-the-fly testing instead.
ArtificialMixture class in nanopath/utils.py needs better logging.
Shuffled test data set at test/data/compare.fq
Added test Kraken2 output from test/data/compose.fq in f70bd6cee1bac97aabd4792e59f9629256e25133:
test/data/test.report
test/data/test.reads
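To check the classified proportions against the composition file, something like the sketch below could work. It assumes test/data/test.report uses the standard six-column, tab-separated Kraken2 report layout (percent, clade reads, taxon reads, rank code, taxid, indented name); `read_kraken_report` is a hypothetical helper, not part of nanopath.

```python
def read_kraken_report(path, rank="S"):
    """Return {taxon name: percent of reads in clade} for one rank code.

    Assumes the standard six-column Kraken2 report; other layouts
    would need different column indices.
    """
    percentages = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 6:
                continue  # skip blank or unexpected lines
            percent, _clade_reads, _taxon_reads, rank_code, _taxid, name = fields
            if rank_code == rank:
                # Names are indented by taxonomic depth, so strip them.
                percentages[name.strip()] = float(percent)
    return percentages
```

The species-level percentages can then be compared by eye or programmatically with the proportions in compose.json, keeping in mind that unclassified and misclassified reads will shift the numbers.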
Reminder to rebasecall all with standardized HAC models
We should think about validation data sets; maybe just mix them up synthetically? We can get human Fast5 from the reference genome consortium, and we could inject a bunch of different pathogen reads at predetermined proportions into the reads. Might need to rebasecall all to standardize. Maybe at the moment some MRSA and Klebsiella in human will suffice.