drpatelh opened this issue 2 years ago
It would be nice to host the uncompressed data somewhere on S3 so we can link to individual reference and input files. One option would have been to upload them to GitHub, as we do for nf-core, but the FastQ files are too large (~100 MB).
Agreed that the Sentieon Quickstart package is too large for this case.
Maybe we can reuse the test datasets from the Sarek pipeline (https://github.com/nf-core/test-datasets/tree/sarek)? In particular, the Sarek dataset includes trimmed dbSNP, Mills, and known-indel VCFs along with small FastQ files.
Description of feature
We are currently using a minimal SARS-CoV-2 test dataset, which is sufficient to test the pipeline, but we don't have dbSNP or indel files for this reference.
It would be good to have an additional test profile (-profile test_germline) that uses the test data Sentieon created for their Quick Start docs. This is a small germline variant-calling dataset derived from part of NA12878/HG001.