zjshi / Maast

Microbial agile accurate SNP Typer
MIT License

Small example dataset for testing #3

Closed: snayfach closed this issue 2 years ago

snayfach commented 2 years ago

Could you provide a download link to a small dataset that could be used for testing? Maybe a tarball of 100 E. coli genomes, or something like that. That would make it really easy to test the code.

zjshi commented 2 years ago

Sure, hopefully this smaller set helps: https://fileshare.czbiohub.org/s/EFLBW86Ara7xZtn

snayfach commented 2 years ago

Thanks! Let me know when you're ready for the next round of testing.

On Fri, Apr 8, 2022 at 1:18 PM Zhou (Jason) Shi wrote:

Sure, hopefully this smaller set helps: https://fileshare.czbiohub.org/s/EFLBW86Ara7xZtn


zjshi commented 2 years ago

Yes, let's kick the next round off.

snayfach commented 2 years ago

$ Maast/maast end_to_end --in-dir ./101346 --out-dir ./101346_out

skip ./101346/OM08-11_1.fastq.gz: not fasta format
skip ./101346/AM31-23_1.fastq.gz: not fasta format
skip ./101346/AF25-38AC_1.fastq.gz: not fasta format
skip ./101346/TM07-1_1.fastq.gz: not fasta format
skip ./101346/AM23-11_1.fastq.gz: not fasta format
skip ./101346/TF09-22_1.fastq.gz: not fasta format
skip ./101346/AF36-11BH_1.fastq.gz: not fasta format
skip ./101346/AF21-53_1.fastq.gz: not fasta format
skip ./101346/OM08-11_1.fastq.gz: not fasta format
skip ./101346/AM31-23_1.fastq.gz: not fasta format
skip ./101346/AF25-38AC_1.fastq.gz: not fasta format
skip ./101346/TM07-1_1.fastq.gz: not fasta format
skip ./101346/AM23-11_1.fastq.gz: not fasta format
skip ./101346/TF09-22_1.fastq.gz: not fasta format
skip ./101346/AF36-11BH_1.fastq.gz: not fasta format
skip ./101346/AF21-53_1.fastq.gz: not fasta format

snayfach commented 2 years ago

Also, I think 300 genomes is a bit too many for a quick test. As a user, I'd like the program to run in under a minute, just so I can see how it works, what the output looks like, etc.
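For a quick local test, a large download can also be trimmed by hand. The sketch below is hypothetical and not part of Maast: it assumes the genomes are the files ending in one of the listed FASTA_SUFFIXES (an assumption), and it packs a random but seed-reproducible subset of them into a small .tar.gz. The names make_test_tarball, pick_subset, and their parameters are illustrative only.

```python
import random
import tarfile
from pathlib import Path

# Hypothetical: extensions treated as genome FASTA files in this sketch; adjust as needed.
FASTA_SUFFIXES = (".fa", ".fna", ".fasta", ".fa.gz", ".fna.gz", ".fasta.gz")


def pick_subset(in_dir, n, seed=0):
    """Return up to n randomly chosen FASTA paths from in_dir (same seed, same subset)."""
    genomes = sorted(
        p for p in Path(in_dir).iterdir()
        if p.is_file() and p.name.endswith(FASTA_SUFFIXES)
    )
    rng = random.Random(seed)
    # Never ask for more files than exist; sort so archive order is stable.
    return sorted(rng.sample(genomes, min(n, len(genomes))))


def make_test_tarball(in_dir, out_tar, n=20, seed=0):
    """Write a gzipped tarball holding n sampled genomes under one top-level folder."""
    subset = pick_subset(in_dir, n, seed)
    # e.g. "101346_tiny.tar.gz" -> top-level folder "101346_tiny"
    top = Path(out_tar).name.split(".")[0]
    with tarfile.open(out_tar, "w:gz") as tar:
        for path in subset:
            tar.add(path, arcname=f"{top}/{path.name}")
    return subset
```

For example, make_test_tarball("./101346", "101346_tiny.tar.gz", n=20) would write 20 genomes under a 101346_tiny/ folder. Fixing the seed keeps the subset identical across runs, so outputs stay comparable between testing rounds.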

snayfach commented 2 years ago

Maybe you could provide two examples, one for genome-based and one for read-based genotyping? It's a little confusing to include both kinds of data in the same folder.

zjshi commented 2 years ago

Hmm... the archive at https://fileshare.czbiohub.org/s/EFLBW86Ara7xZtn downloads as 101346_small.tar.gz and contains only 150 FASTA files and 3 FASTQ files. Could you confirm?

Yes, supplying both data types in the same folder is a feature: Maast does not care whether the input folder contains only genomes, only reads, or both. I edited the README to clarify this.
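Maast's own input-detection logic is not shown in this thread. As a hypothetical illustration only, a mixed folder could be split by content rather than by extension, assuming FASTA files start with ">" and FASTQ files start with "@", with gzip compression recognized by its two-byte magic number. The names sniff_format and partition_inputs are illustrative, not part of Maast.

```python
import gzip
from pathlib import Path


def sniff_format(path):
    """Guess 'fasta', 'fastq', or None from the first byte of a (possibly gzipped) file."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    # Gzip streams begin with the magic bytes 0x1f 0x8b.
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rb") as fh:
        first = fh.read(1)
    # Assumption: '>' marks a FASTA record, '@' a FASTQ record; empty or other files -> None.
    return {b">": "fasta", b"@": "fastq"}.get(first)


def partition_inputs(in_dir):
    """Split a mixed input folder into genome (FASTA), read (FASTQ), and unrecognized files."""
    groups = {"fasta": [], "fastq": [], None: []}
    for path in sorted(Path(in_dir).iterdir()):
        if path.is_file():
            groups[sniff_format(path)].append(path)
    return groups["fasta"], groups["fastq"], groups[None]
```

Checking the first byte instead of the file name is one possible design choice; it treats plain and gzipped files the same way and does not depend on naming conventions such as _1.fastq.gz.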

zjshi commented 2 years ago

Anyway, try this one instead: https://fileshare.czbiohub.org/s/tWdDA8BLrpGNEQY. Thanks!