ramadatta closed this issue 6 years ago
Did the file originate from a Mac or Windows computer?
It could be the wrong "newline" endings.
What OS are you on?
If you are on Mac or Linux, run dos2unix polymorphic_sites.fasta and mac2unix polymorphic_sites.fasta on the file and see if that fixes it.
I tried the other direction, unix2dos, and it didn't cause any problems.
cat foo.fa && snp-dists -b -c foo.fa
>S1
ATGC
ATGC
>S2
ATGC
ATGC
This is snp-dists 0.6
Read 2 sequences of length 8
,S1,S2
S1,0,0
S2,0,0
Email me the file if you like and I will try it out.
You can use od -a polymorphic_sites.fasta
to inspect it at a character level.
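As a rough alternative to reading od output by eye, the different kinds of line ending can be counted with a few lines of Python. This is only a sketch: the function name count_line_endings is made up for illustration and is not part of snp-dists, dos2unix, or any other tool mentioned here.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF (Windows), lone LF (Unix) and lone CR (old Mac) line endings."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        # Every CRLF contains one LF and one CR, so subtract them out.
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }


# Small in-memory FASTA record written with Windows line endings.
sample = b">S1\r\nATGC\r\nATGC\r\n"
print(count_line_endings(sample))
```

To check a real file, read it in binary mode, for example open("polymorphic_sites.fasta", "rb").read(), and pass the bytes to the function. A file that mixes more than one kind of ending is a likely candidate for dos2unix or mac2unix.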
Hi Seemann,
Thank you. I generated the file on Linux only, but could not trace the problem. The newlines seem to be correctly placed and may not be the cause of this error.
Please find the file for your reference. Thanks so much!
Your sequences are not the same (but they are the same length).
I put each sequence into its own file, called 1 and 2.
Here is the first difference:
$ cmp 1 2
1 2 differ: byte 8100, line 1
$ cut -c 8099-8101 1
GNN
$ cut -c 8099-8101 2
GGC
There are many more. They don't have the same distribution of letters:
Reference_CP028169.fasta.ref dna 100566 |N 118 0.1% |A 22048 21.9% |T 22381 22.3% |G 27497 27.3% |C 28361 28.2%
CP028169_Duplicate.fasta dna 100566 |N 72 0.1% |A 22063 21.9% |T 22383 22.3% |G 27518 27.4% |C 28369 28.2%
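A letter distribution like the one above can be approximated with a short Python sketch. The helper name composition and the toy sequences are made up for illustration. They are not the real data from the attached file, and this is not the tool that produced the table above.

```python
from collections import Counter


def composition(seq: str) -> Counter:
    """Return how often each character occurs in a sequence (case-insensitive)."""
    return Counter(seq.upper())


# Toy sequences of equal length that differ only where one has N.
first = "ATGNNCA"
second = "ATGGCCA"

for name, seq in (("first", first), ("second", second)):
    counts = composition(seq)
    summary = " ".join(f"|{base} {counts[base]}" for base in "NATGC")
    print(name, len(seq), summary)
```

Two sequences of the same length with different N counts cannot be identical, which is the quick check being made above.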
You only get a distance of 1 because -a (count all differences, not just A/G/T/C) wasn't used. If it is enabled, there are 47 differences.
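The gap between the two numbers can be illustrated with a small Python sketch. It assumes that the default mode counts a site only when both characters are A, C, G or T, and that -a counts every mismatch. That is an interpretation of the option's description, not a copy of the snp-dists source, and count_differences is a hypothetical helper.

```python
ACGT = set("ACGT")


def count_differences(a: str, b: str, count_all: bool = False) -> int:
    """Count mismatching sites between two equal-length sequences.

    With count_all=False, a mismatch is counted only when both characters
    are A, C, G or T (assumed default behaviour). With count_all=True,
    every mismatch is counted, including those involving N (assumed -a).
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == y:
            continue
        if count_all or (x in ACGT and y in ACGT):
            diffs += 1
    return diffs


# Toy data: two N mismatches and one true A/T mismatch.
s1 = "ATGNNCA"
s2 = "ATGGCCT"
print(count_differences(s1, s2))                  # A/C/G/T mismatches only
print(count_differences(s1, s2, count_all=True))  # every mismatch
```

On this toy pair the first call reports 1 and the second reports 3, mirroring how sites masked with N are skipped unless all differences are requested.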
Hi Seemann,
I ran Gubbins on two identical sequences and fed the resulting "polymorphic_sites.fasta" to snp-dists. I expected no SNP differences, since the two sequences are exactly the same, but I keep getting 1 SNP difference. Is it a bug, or am I missing something here?
Thanks.
My output looks like this: