Closed: mr-c closed this issue 4 years ago.
Please have a look at these datasets:
ONT:
Escherichia coli 1.2G
SRR11475550 (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR11475550)
Caenorhabditis elegans 3.3G
SRR11456709 (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR11456709)
PACBIO:
Escherichia coli PacBio RS II 1.3G
SRR8494908 (https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR8494908)
Thank you @ruanjue for the response. I'm happy to personally validate using data of that size, but for Debian we can't use such large data. Is there a smaller dataset that you recommend?
For both the original recommendation and the smaller dataset, how exactly should we invoke the tools and is there a specific expected result?
This resource may be useful to you: https://bssw.io/items/what-is-cse-software-testing
I'm afraid these are the smallest datasets I know of. Otherwise, we would have to simulate test data.
Okay. How should we use this data to test the wtdbg2 tools?
Let's talk about SRR8494908.
# download raw data
prefetch SRR8494908
# convert to gzipped FASTA
fastq-dump --fasta --gzip SRR8494908.sra
# assemble
wtdbg2.pl -o ecoli -t 8 -x rs -g 4.5m SRR8494908.fasta.gz
# the final contigs
ls ecoli.cns.fa
prefetch and fastq-dump can be found in https://github.com/ncbi/sra-tools.
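For an automated test, the steps above could be wrapped in shell functions. This is a minimal sketch, not something shipped with wtdbg2: the function names require_tools and run_ecoli_assembly are hypothetical, and the commands inside run_ecoli_assembly are copied unchanged from the comment above. In particular, it assumes prefetch leaves SRR8494908.sra in the current directory; depending on the sra-tools version and configuration, the downloaded file may be placed elsewhere.

```shell
# Hypothetical helper: fail (non-zero) if any of the named tools is not on PATH.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical wrapper around the download/convert/assemble steps from this thread.
# Assumption: prefetch puts SRR8494908.sra in the current directory.
run_ecoli_assembly() {
    require_tools prefetch fastq-dump wtdbg2.pl || return 1
    prefetch SRR8494908 || return 1
    fastq-dump --fasta --gzip SRR8494908.sra || return 1
    wtdbg2.pl -o ecoli -t 8 -x rs -g 4.5m SRR8494908.fasta.gz || return 1
    # the final contigs should now exist and be non-empty
    [ -s ecoli.cns.fa ]
}
```

Calling require_tools on its own also lets a test harness skip cleanly when the SRA tools are not installed, instead of failing halfway through the download.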
Should ecoli.cns.fa have a specific size or md5sum checksum?
The contents of the result file may vary.
How could we test the contents of the file to determine if wtdbg2.pl is functioning correctly?
By aligning the contigs against the reference E. coli genome. However, I think automatically testing correctness is complex, so the best option is to skip that check.
Hmm... The entire point is to have a test that we can run automatically. I don't think a reference genome alignment is unreasonable.
If you had unit tests for parts of functionality, that would be an acceptable alternative.
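One way to automate the reference-alignment check is to align the contigs with a tool that writes PAF output (for example `minimap2 -x asm5 reference.fa ecoli.cns.fa > ecoli.paf`) and then measure how much of the reference is covered. This is a sketch under assumptions not made in this thread: minimap2, the reference file name and the function name ref_coverage_from_paf are all illustrative choices. Overlapping alignments are merged so no reference base is counted twice, but reference sequences that receive no alignment at all are left out of the total.

```shell
# Hypothetical helper: print the fraction of reference bases covered by
# alignments in a PAF file, e.g. one produced by
#   minimap2 -x asm5 reference.fa ecoli.cns.fa > ecoli.paf
# Overlapping alignment blocks are merged so no base is counted twice.
# Reference sequences with no alignments are NOT counted in the total.
ref_coverage_from_paf() {
    tab=$(printf '\t')
    # PAF columns 6-9: target name, target length, target start, target end
    cut -f 6-9 "$1" | sort -t "$tab" -k1,1 -k3,3n | awk -F '\t' '
        $1 != name {
            # new reference sequence: close the previous merged interval
            if (name != "") covered += cur_end - cur_start
            name = $1; total += $2
            cur_start = $3 + 0; cur_end = $4 + 0
            next
        }
        ($3 + 0) > cur_end {
            # gap: close the current interval and start a new one
            covered += cur_end - cur_start
            cur_start = $3 + 0; cur_end = $4 + 0
            next
        }
        ($4 + 0) > cur_end { cur_end = $4 + 0 }  # overlap: extend the interval
        END {
            if (name != "") covered += cur_end - cur_start
            if (total > 0) printf "%.4f\n", covered / total
            else print "0.0000"
        }'
}
```

A test could then compare the result against a threshold, for instance `awk -v c="$(ref_coverage_from_paf ecoli.paf)" 'BEGIN { exit !(c >= 0.95) }'`; the 0.95 cut-off is a placeholder that would need to be calibrated on real runs.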
An easy way is to check the file size: ecoli.cns.fa is about 4.6 MB, so let's set an acceptable range for it of 4.4 to 4.8 MB.
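The file-size check suggested above could look roughly like this. It is a sketch: check_cns_size is a hypothetical name, and it assumes "MB" means 10^6 bytes, so the accepted range is 4,400,000 to 4,800,000 bytes (adjust the bounds if MiB was intended).

```shell
# Hypothetical smoke test: check that a wtdbg2 consensus file exists,
# is non-empty, and has a size within the range suggested in this thread.
# Assumption: 1 MB = 1,000,000 bytes.
check_cns_size() {
    file="$1"
    min=4400000
    max=4800000
    if [ ! -s "$file" ]; then
        echo "FAIL: $file is missing or empty" >&2
        return 1
    fi
    # wc -c is portable across GNU and BSD userlands (unlike stat flags);
    # tr strips the leading padding that BSD wc prints
    size=$(wc -c < "$file" | tr -d ' ')
    if [ "$size" -lt "$min" ] || [ "$size" -gt "$max" ]; then
        echo "FAIL: $file is $size bytes, expected $min to $max" >&2
        return 1
    fi
    echo "PASS: $file is $size bytes"
}
```

After the assembly step, `check_cns_size ecoli.cns.fa` returns non-zero if the output is missing or outside the range, which is enough for a test runner to report a failure.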
Hello!
wtdbg2 is packaged for Debian (https://tracker.debian.org/pkg/wtdbg2). We'd love to run some tests; can you include some in the repo or point us to freely licensed test data?
Thanks!