sanger-tol / blobtoolkit

Nextflow pipeline for BlobToolKit for Sanger ToL production suite
https://pipelines.tol.sanger.ac.uk/blobtoolkit
MIT License
Unit testing - sample databases #14

Closed (sujaikumar closed this issue 1 year ago)

sujaikumar commented 2 years ago

For unit testing the pipeline, create taxon-restricted subsets of

Put these in /lustre/scratch123/tol/resources and synchronise with S3 so others can access them too

Create instructions for re-creating the FULL databases and the sub-sampled databases

rjchallis commented 2 years ago

For the reduced BUSCO set, it should be enough to use just the smallest sets (eukaryota, bacteria and archaea, with the placement file filtered accordingly). The reduced sets in the BLAST databases would ideally correspond to a couple of very small naturally (or artificially) contaminated assemblies/read sets in the same directory, so the unit tests can be run with < 1 Mb of sequence data.
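
As a rough illustration of the taxon-restricted subsetting idea, here is a minimal Python sketch. It is not the pipeline's own tooling: the helper names `parse_fasta` and `subset_by_taxon`, the sequence-ID-to-taxid mapping, and the toy records are all hypothetical, and it assumes the sequence ID is the first whitespace-delimited token of each FASTA header. It keeps only records from the chosen taxa and stops adding records once the total would reach the size budget.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subset_by_taxon(
    records: Iterable[Tuple[str, str]],
    seq_to_taxid: Dict[str, int],
    keep_taxids: Set[int],
    max_bases: int = 1_000_000,
) -> List[Tuple[str, str]]:
    """Keep records from the chosen taxa, with total length below max_bases."""
    kept: List[Tuple[str, str]] = []
    total = 0
    for header, seq in records:
        # Assumption: the sequence ID is the first token of the header.
        seq_id = header.split()[0]
        if seq_to_taxid.get(seq_id) not in keep_taxids:
            continue
        # Skip any record that would bring the total up to the budget.
        if total + len(seq) >= max_bases:
            continue
        kept.append((header, seq))
        total += len(seq)
    return kept


# Toy input (hypothetical records; 9606 = Homo sapiens, 562 = Escherichia coli).
fasta_lines = """>seq1 example
ACGT
ACGT
>seq2
GGGG
>seq3
TTTTTT
""".splitlines()
mapping = {"seq1": 9606, "seq2": 562, "seq3": 9606}

subset = subset_by_taxon(parse_fasta(fasta_lines), mapping, {9606}, max_bases=10)
for header, seq in subset:
    print(f">{header}\n{seq}")
```

In this toy run only `seq1` is kept: `seq2` belongs to an excluded taxon, and `seq3` would push the total past the 10-base budget. For real databases the mapping would come from whatever accession-to-taxid source the database ships with, and the budget would be set to the < 1 Mb target mentioned above.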

priyanka-surana commented 1 year ago

@sujaikumar Can you add the instructions for creating the databases, either via PR or here? We need to include this for the v1 release. Thank you.