We need a test data set that exercises as many features of the pipeline as possible and imitates data from a diagnostic clinic setting. It could include the following:
Representative pathogenic species for each of the following taxa:
[ ] bacteria (2-5 species)
[ ] fungi (2-5 species)
[ ] nematodes (1-2 species)
[ ] insects (1-2 species)
Between 1 and 100 samples for each species, using both long- and short-read sequencing. Most species can have 1-5 samples, but it would be nice to have one species with around 100 samples.
Columns with fake data that imitate a diagnostic clinic setting:
[ ] date received
[ ] client
[ ] an ID representing a single "submission" of multiple samples, i.e. the same ID for all samples submitted by a given client at one time
[ ] host taxon (any taxonomic level)
[ ] GPS coordinates
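
As a starting point, the fake columns above could be generated with a small script. This is only a sketch: the column names (`sample_id`, `date_received`, `client`, `submission_id`, `host_taxon`, `latitude`, `longitude`), the client names, and the host taxa are all made-up placeholders, not the pipeline's actual sample sheet format.

```python
import csv
import io
import random
from datetime import date, timedelta

# All values below are invented for illustration only.
CLIENTS = ["Client A", "Client B", "Client C"]
HOST_TAXA = ["Solanum tuberosum", "Rosaceae", "Pinus", "Triticum aestivum"]


def make_fake_metadata(n_submissions=5, seed=0):
    """Return a list of dicts, one per sample, with fake clinic metadata."""
    rng = random.Random(seed)  # seeded so the test data is reproducible
    rows = []
    for sub in range(1, n_submissions + 1):
        # One client and one received date per submission
        client = rng.choice(CLIENTS)
        received = date(2024, 1, 1) + timedelta(days=rng.randrange(365))
        submission_id = f"SUB{sub:04d}"
        # Each submission contains 1-5 samples sharing the same submission ID
        for i in range(1, rng.randint(1, 5) + 1):
            rows.append({
                "sample_id": f"{submission_id}-{i:02d}",
                "date_received": received.isoformat(),
                "client": client,
                "submission_id": submission_id,
                "host_taxon": rng.choice(HOST_TAXA),
                "latitude": round(rng.uniform(-90, 90), 5),
                "longitude": round(rng.uniform(-180, 180), 5),
            })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(make_fake_metadata()))
```

The host taxa are deliberately a mix of ranks (species, genus, family), since the host taxon can be at any taxonomic level.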
There should be a report for each client as well as a report containing all samples.
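
One way to picture the report grouping is shown below. It is a rough sketch, assuming the metadata has a `client` column; the function name `split_for_reports` and the report keys are hypothetical and do not reflect how the pipeline currently assigns samples to reports.

```python
from collections import defaultdict


def split_for_reports(rows):
    """Group sample rows into one report per client plus one with everything."""
    by_client = defaultdict(list)
    for row in rows:
        by_client[row["client"]].append(row)
    # "all_samples" is a placeholder key for the combined report
    reports = {"all_samples": list(rows)}
    for client, client_rows in by_client.items():
        reports[f"client_{client}"] = client_rows
    return reports


if __name__ == "__main__":
    example = [
        {"sample_id": "S1", "client": "Client A"},
        {"sample_id": "S2", "client": "Client B"},
        {"sample_id": "S3", "client": "Client A"},
    ]
    for name, members in split_for_reports(example).items():
        print(name, [m["sample_id"] for m in members])
```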