Closed sujaikumar closed 1 year ago
For the reduced BUSCO set, it should be enough to use just the smallest sets (eukaryota, bacteria and archaea), with the placement file filtered accordingly. The reduced sets in the BLAST databases would ideally correspond to a couple of very small, naturally or artificially contaminated assemblies/read sets in the same directory, so that the unit tests can be run with < 1 Mb of sequence data.
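A quick way to check the "< 1 Mb" target for a candidate test file could look like the sketch below. It is only an illustration: `count_bases` and `MAX_BASES` are hypothetical names, and reading "1 Mb" as 1,000,000 sequence characters is an assumption.

```python
import io

# Assumption: "< 1 Mb" is read here as fewer than 1,000,000 sequence characters.
MAX_BASES = 1_000_000


def count_bases(handle):
    """Return the total number of sequence characters in a FASTA stream."""
    total = 0
    for line in handle:
        line = line.strip()
        # Skip blank lines and header lines; count everything else as sequence.
        if line and not line.startswith(">"):
            total += len(line)
    return total


# Tiny in-memory example standing in for a real test assembly.
example = io.StringIO(">contig1\nACGTACGT\n>contig2\nGGCC\n")
n = count_bases(example)
print(n, n < MAX_BASES)  # prints: 12 True
```

For a real file, the same function can be called on an open file handle, e.g. `with open(path) as fh: count_bases(fh)`.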
@sujaikumar Can you add the instructions for creating the databases, either via a PR or here? We need to include them for the v1 release. Thank you.
For unit testing the pipeline, create taxon-restricted subsets of
Put these in /lustre/scratch123/tol/resources and synchronise with S3, so that others can access them too
Create instructions for re-creating the FULL databases and the sub-sampled databases
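As a starting point for the sub-sampling instructions, a taxon-restricted subset of a FASTA file could be produced along these lines. This is a minimal sketch, not the pipeline's actual method: `read_fasta`, `subset_by_taxon`, the sequence-to-taxid mapping and the example taxon IDs are all hypothetical and used only for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def subset_by_taxon(handle, seq_to_taxid, keep_taxids):
    """Yield only the records whose sequence ID maps to a taxon ID in keep_taxids.

    seq_to_taxid is an assumed dict of sequence ID -> taxon ID; how that
    mapping is obtained depends on the database and is not covered here.
    """
    for header, seq in read_fasta(handle):
        # The sequence ID is taken as the first whitespace-delimited token.
        seq_id = header.split(None, 1)[0] if header.strip() else ""
        if seq_to_taxid.get(seq_id) in keep_taxids:
            yield header, seq


# Illustrative data only: three records, two assigned to taxon 9606.
fasta = io.StringIO(">s1 desc\nACGT\nAC\n>s2\nGG\n>s3\nTT\n")
mapping = {"s1": 9606, "s2": 562, "s3": 9606}
for header, seq in subset_by_taxon(fasta, mapping, {9606}):
    print(f">{header}\n{seq}")
```

Records without an entry in the mapping are dropped, which keeps the subset conservative; whether that is the right behaviour for each database would need to be decided when the instructions are written.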