nlapier2 / Metalign

Metalign: efficient alignment-based metagenomic profiling via containment min hash
MIT License

KMC fasta vs multi-fasta error #24

Open dkoslicki opened 4 years ago

dkoslicki commented 4 years ago

It appears that KMC will complain if you feed it a multi-fasta file (sequences wrapped across several lines) when it expects a plain fasta file (each sequence on a single line). E.g., multi-fasta:

>seq1
CGATCATGCATG
ACGTACTGCTGA
>seq2
AGCTCAGTCAGT
ACGCGTACGATG

fasta:

>seq1
CGATCATGCATGACGTACTGCTGA
>seq2
AGCTCAGTCAGTACGCGTACGATG

Solution: try to detect this somehow, pass the right arguments to the main script here, and add -fm here.
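
A minimal sketch of what that detection could look like, assuming the check is done in Python before KMC is invoked. The helper names (`open_maybe_gzip`, `is_multiline_fasta`, `kmc_format_flag`) are hypothetical and not part of Metalign, and falling back to `-fa` is an assumption about which flag the script would otherwise pass. It only inspects the first records, so a file that starts unwrapped and wraps later would be missed.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed file as text, judged by its magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def is_multiline_fasta(path, max_records=1000):
    """Return True if one of the first max_records records spans several sequence lines."""
    seq_lines = 0   # sequence lines seen in the current record
    records = 0     # header lines seen so far
    with open_maybe_gzip(path) as fh:
        for line in fh:
            if not line.strip():
                continue  # ignore blank lines
            if line.startswith(">"):
                records += 1
                if records > max_records:
                    break
                seq_lines = 0
            else:
                seq_lines += 1
                if seq_lines > 1:
                    return True
    return False


def kmc_format_flag(path):
    """Pick the KMC input-format flag (assumption: -fa is the non-wrapped default)."""
    return "-fm" if is_multiline_fasta(path) else "-fa"
```

The returned flag could then be appended to whatever KMC command line the script already builds.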

dkoslicki commented 4 years ago

Alternatively, auto-convert with something like:

awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);}  END {printf("\n");}' < multi_fasta_in.fna | tail -n +2 > multi_fasta_out.fna

But then we'd lose KMC's auto-gunzip, since the awk step needs uncompressed input.
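
One possible way around the gunzip concern, sketched in Python under the assumption that the conversion may happen in a script rather than in awk: read the input through `gzip` when it is compressed and write an unwrapped copy. `unwrap_fasta` is a hypothetical helper, not an existing Metalign function, and the output here is always written uncompressed.

```python
import gzip


def unwrap_fasta(in_path, out_path):
    """Write a copy of in_path with every sequence joined onto one line.

    in_path may be plain text or gzip-compressed (detected by magic bytes).
    """
    with open(in_path, "rb") as fh:
        gzipped = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if gzipped else open

    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        in_sequence = False  # True while sequence text is pending a newline
        for line in src:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                if in_sequence:
                    dst.write("\n")  # close the previous record's sequence
                dst.write(line + "\n")
                in_sequence = False
            elif line:
                dst.write(line)  # append sequence text without a line break
                in_sequence = True
        if in_sequence:
            dst.write("\n")
```

Unlike the awk one-liner, this does not emit a leading blank line, so no `tail -n +2` step is needed.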