will-rowe / hulk

Histosketching Using Little Kmers
MIT License

Support for Fasta input #2

Closed by standage 5 years ago

standage commented 6 years ago

Greetings!

I enjoyed reading the Hulk preprint, and I have successfully installed the conda package and am taking it for a test drive. Great work!

One of the first things I tried was to sketch a reference genome assembly in gzip-compressed FASTA format. Compression doesn't appear to be an issue, but hulk won't sketch sequences in FASTA format. Is there any plan to support FASTA data in the future?

will-rowe commented 6 years ago

Thanks for giving it a go! You're right, FASTA isn't supported at the moment. I can definitely add support for it though - I've just been using this initial release to get an idea of how people would like to use this method and I'd be happy to add functionality.

Hopefully will get to this in a week or so.

will-rowe commented 5 years ago

I've added this on the development branch (f055f903987b1d917445d20f906690dcfcecee59). It seems to be working okay, but I've not tested it on any real-world data yet.

You just need to add the --fasta flag to your sketch command and it will convert your .fna/.fasta input files into dummy fastq reads. A bit hacky but a quick implementation for you!

Will add this to the next HULK release, hopefully due later this week.
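
For readers curious what "converting FASTA input into dummy fastq reads" can look like, here is a minimal Python sketch of the general idea. It is not HULK's actual implementation (HULK's code is not shown in this thread), and the function names `read_fasta` and `fasta_to_dummy_fastq`, as well as the placeholder quality character `I`, are assumptions made for illustration. It emits one FASTQ record per FASTA sequence; whether HULK splits long sequences into shorter reads is not stated here.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_dummy_fastq(handle: TextIO, placeholder_quality: str = "I") -> Iterator[str]:
    """Yield four-line FASTQ records with a constant, made-up quality string.

    The quality values carry no information; they only satisfy the FASTQ
    layout so that a FASTQ-only reader can consume the sequences.
    """
    for header, seq in read_fasta(handle):
        yield f"@{header}\n{seq}\n+\n{placeholder_quality * len(seq)}\n"


if __name__ == "__main__":
    example = io.StringIO(">contig1 example\nACGT\nAC\n>contig2\nGG\n")
    for record in fasta_to_dummy_fastq(example):
        print(record, end="")
```

For gzip-compressed input, the same functions should work on a handle opened with `gzip.open(path, "rt")` from the Python standard library.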