guma44 / ushuffle

A useful tool for shuffling biological sequences while preserving the k-let counts

Fasta input and output options #8

Open ppgardne opened 3 years ago

ppgardne commented 3 years ago

Support for fasta input and output options would be greatly appreciated!

guma44 commented 2 years ago

Hi, could you elaborate on what exactly you mean?

ppgardne commented 2 years ago

Sure. As I understand it, ushuffle takes a biological sequence as a command-line argument and writes each permuted sequence as a single line. For large sequences this is quite cumbersome, and the output does not feed directly into many other sequence analysis tools. Similar tools (e.g. esl-shuffle, shuffleseq) support FASTA format for input and output.

That is, the input sequence would be read from a file given with a -f <filename> or similar option, and the output would be written as, e.g.:

>ushuffle1 <sequencename>
ACGTACGTACGCTATACG....
>ushuffle2 <sequencename>
ACGATCGATCGTACGTA...
etc.
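
A minimal Python sketch of the behaviour I have in mind, not a proposal for the actual implementation. All function names here (`read_fasta`, `mono_shuffle`, `write_shuffled_fasta`) are hypothetical, and `mono_shuffle` is only a placeholder that preserves single-letter counts (k = 1); a real version would call ushuffle's k-let preserving shuffle in its place.

```python
import random
from typing import Callable, Iterable, Iterator, TextIO, Tuple


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def mono_shuffle(seq: str, rng: random.Random) -> str:
    """Placeholder shuffle (assumption): keeps single-letter counts only."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def write_shuffled_fasta(
    records: Iterable[Tuple[str, str]],
    out: TextIO,
    n: int = 2,
    shuffle: Callable[[str, random.Random], str] = mono_shuffle,
    seed=None,
    width: int = 60,
) -> None:
    """Write n permutations of each record as '>ushuffle<i> <name>' entries."""
    rng = random.Random(seed)
    for name, seq in records:
        for i in range(1, n + 1):
            shuffled = shuffle(seq, rng)
            out.write(f">ushuffle{i} {name}\n")
            # Wrap the sequence so very long genomes do not end up on one line.
            for start in range(0, len(shuffled), width):
                out.write(shuffled[start:start + width] + "\n")
```

With something like this, `write_shuffled_fasta(read_fasta(open(path)), sys.stdout)` would read the sequence from a file rather than from the argument list, and the output could be passed straight to other FASTA-aware tools.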

ppgardne commented 2 years ago

Just to illustrate an issue with the command-line option (I realise "ushuffle.c" isn't maintained by you), I get an "Argument list too long" error when trying to shuffle this modestly sized genome: https://www.ebi.ac.uk/ena/browser/view/AE001825
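
For context, a rough sketch of why this happens, assuming Linux with 4 KiB pages, where the kernel rejects any single argument string longer than 32 pages (128 KiB, terminating NUL byte included). The constant and the helper name below are illustrative assumptions, not part of ushuffle.

```python
# Assumption: Linux limits one argv string to 32 pages; with 4 KiB pages
# that is 131072 bytes, counting the terminating NUL byte.
ASSUMED_MAX_ARG_STRLEN = 32 * 4096


def fits_in_one_argument(seq: str, limit: int = ASSUMED_MAX_ARG_STRLEN) -> bool:
    """Return True if seq could be passed as a single command-line argument."""
    # +1 accounts for the NUL terminator that the kernel includes in the length.
    return len(seq.encode()) + 1 <= limit


if __name__ == "__main__":
    short_seq = "ACGT" * 1000       # 4 kb: fine as an argument
    long_seq = "ACGT" * 100_000     # 400 kb: too long for one argument
    print(fits_in_one_argument(short_seq), fits_in_one_argument(long_seq))
```

Under that assumption, any sequence of a few hundred kilobases cannot be passed as an argument at all, which is why reading it from a file or from standard input would help.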