ZeweiSong closed this issue 8 years ago
You need to have a ">" character in the beginning of each header line in the FASTA files.
The size annotation must be at the end of the header line. If you use the usearch-style abundance format (header lines ending with ";size=123;") you need to specify the "-z" option to Swarm. If you use the native abundance format (header lines ending with "_123") you must not have a semicolon at the end of the header line.
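As a rough illustration of the header rules above, here is a minimal Python sketch that classifies a header line as usearch-style or native. It is not Swarm's own parser: the function name `classify_header` and both regular expressions are assumptions made for this example, and they encode only the rules stated in this reply, so they may be stricter than the tool itself.

```python
import re

# Hypothetical patterns written for this sketch; they mirror only the
# rules described in the reply above, not Swarm's actual parser.
USEARCH_SIZE = re.compile(r";size=(\d+);$")  # e.g. ">seqID1;size=1000;"
NATIVE_SIZE = re.compile(r"_(\d+)$")         # e.g. ">seqID1_1000"


def classify_header(line):
    """Return (style, abundance) for a FASTA header line.

    Raises ValueError when the line does not start with '>' or when no
    abundance annotation is found at the very end of the line.
    """
    header = line.rstrip("\n")  # a stray '\r' is deliberately kept
    if not header.startswith(">"):
        raise ValueError("header must start with '>'")
    match = USEARCH_SIZE.search(header)
    if match:
        return "usearch", int(match.group(1))
    match = NATIVE_SIZE.search(header)
    if match:
        return "native", int(match.group(1))
    raise ValueError("no abundance annotation at end of header")
```

Because only the newline is stripped, a header that still carries a trailing carriage return, or a native-style header followed by a semicolon, is rejected rather than silently accepted.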
I did have the ">". GitHub treats a leading ">" as quote markup, which is why you cannot see it. I've pasted it as code here.
>seqID1;size=1000;
ACTGTGACACGGGTGTGTGACACTGTGT
>seqID2;size=200;
ACGCTACTATCGATGCGATCGATGCTAG
Based on the error message you got, there is probably some kind of illegal character in your input file. Try copying and pasting again (use "paste as text only" or similar), perhaps from the text above. Based on the strange appearance of the error message, it might be a stray carriage return character (ASCII 13).
Also, you need to specify the "-o" option before the final file name on the command line if you want output to go there.
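To make the point about "-o" concrete, here is a small Python sketch that assembles the argument list with the output file passed through "-o". The helper name `build_swarm_command` and its parameters are hypothetical. The flags are only the ones that appear in this thread ("-t", "-f", "-w", "-o", "-z"), and the sketch assumes options may precede the input file, as in the command quoted in the original report.

```python
import shlex


def build_swarm_command(input_fasta, output_file, seeds_file=None,
                        threads=4, usearch_abundance=False):
    """Build an argument list for swarm (hypothetical helper).

    Uses only the flags mentioned in this thread; the output file is
    always introduced by "-o" instead of being a bare trailing name.
    """
    cmd = ["./swarm", "-t", str(threads), "-f"]
    if usearch_abundance:
        cmd.append("-z")  # headers end with ";size=123;"
    if seeds_file is not None:
        cmd += ["-w", seeds_file]
    cmd += ["-o", output_file, input_fasta]
    return cmd


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(shlex.join(build_swarm_command(
        "test.fasta", "myfile.swarm", seeds_file="myfile.fasta")))
```

The list can be passed to `subprocess.run` as is; printing it first makes it easy to check that no file name is left without its option.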
Thanks, I did fix it by pasting to a new file. Do you mean I have \r in my file?
Yes, it appears so based on the strange error message (' in sequence on line 2ror: Illegal character ').
It should probably be like this: "Error: Illegal character '\r' in sequence on line 2."
It seems like this problem is caused by files that have characters with ASCII code 13 (CR, ^M) at the end of the lines. This is typical of files from DOS/Windows. These characters should be stripped. I'll reopen the issue.
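For anyone on a Swarm version that still rejects such files, a minimal Python sketch for detecting and stripping carriage returns before clustering could look like this. The function names and the file-handling wrapper are assumptions made for the example; this is not how Swarm handles the characters internally.

```python
def count_carriage_returns(data: bytes) -> int:
    """Count CR bytes (ASCII 13) in the raw file contents."""
    return data.count(b"\r")


def strip_carriage_returns(data: bytes) -> bytes:
    """Turn CRLF line endings into LF and drop any remaining lone CR."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"")


def clean_file(src_path: str, dst_path: str) -> int:
    """Write a CR-free copy of src_path and return how many CRs were found.

    Files are opened in binary mode so that Python does not translate
    line endings before they can be inspected.
    """
    with open(src_path, "rb") as handle:
        data = handle.read()
    with open(dst_path, "wb") as handle:
        handle.write(strip_carriage_returns(data))
    return count_carriage_returns(data)
```

A non-zero return value from `clean_file` indicates the input had DOS/Windows line endings, or stray CR characters elsewhere.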
This problem has been fixed in the new version 2.1.7. Thanks for reporting the problem.
I got this message when trying the example FASTA file:
./swarm -t 4 -f -w myfile.fasta test.fasta myfile.swarm
Swarm 2.1.6 [Dec 14 2015 10:59:14]
Copyright (C) 2012-2015 Torbjorn Rognes and Frederic Mahe
https://github.com/torognes/swarm

Please cite: Mahe F, Rognes T, Quince C, de Vargas C, Dunthorn M (2014) Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2:e593 https://dx.doi.org/10.7717/peerj.593

CPU features: mmx sse sse2 sse3 ssse3 sse4.1 sse4.2 popcnt avx
Database file: test.fasta
Output file: (stdout)
Resolution (d): 1
Threads: 4
Scores: match: 5, mismatch: -4
Gap penalties: opening: 12, extension: 4
Converted costs: mismatch: 9, gap opening: 12, gap extension: 7
Break OTUs: Yes
Fastidious: Yes, with boundary 3
' in sequence on line 2ror: Illegal character '
I just copied and pasted what is in the example and saved it in a .fa file:
It also didn't work when I tried to use the USEARCH-style size label:
But it did work when I fed in the uchime_denovo output from vsearch, which actually has the size annotation on the second line:
Any idea?