Closed johnlees closed 2 years ago
By default fastbaps excludes SNPs that only appear in a single genome https://github.com/gtonkinhill/fastbaps/blob/master/R/import_fasta_sparse_nt.R#L80-L83.
I think this error is being thrown when there is <= 1 remaining SNP that is shared by more than one genome. The code works if you duplicate the SNP in your first example
>17059_6#10
AGCACC
>17059_6#22
AGCTGG
>15995_1#182
AGCTGG
>15995_1#150
TATTCC
>15995_1#114
AGCTCC
>17059_6#81
AGCTCC
It it probably best not to continue with the algorithm if only a single SNP remains. Is there a particular error message that would work better for your pipeline?
I think this error is being thrown when there is <= 1 remaining SNP that is shared by more than one genome. The code works if you duplicate the SNP in your first example
I thought that at first too, but the following also errors:
>17059_6#10
AGCACAAC
>17059_6#22
AGCTGAAC
>15995_1#182
AGCTGTTG
>15995_1#150
TATTCTTG
>15995_1#114
AGCTCTTG
>17059_6#81
AGCTCTTG
So I think it's something slightly different
It it probably best not to continue with the algorithm if only a single SNP remains. Is there a particular error message that would work better for your pipeline?
Obviously sensible. Just anything along the lines of 'not enough variants' would do!
It looks like there was a bug when loading from fasta files rather than DNAbin objects.
The initial sequence was not counted in the definition of the consensus variant. This led to problems in very small alignments when a variant was only present twice and one of these examples was in the first sequence.
I've pushed a fix for this and added a better error message. Once the automatic checks have passed I'll create a new release.
Thanks for pointing this out!
awesome, thanks for the quick fix!
Running this as part of a pipeline, we end up with a tiny alignment:
If I save this as
small_fb.aln
and import withsparse.data <- import_fasta_sparse_nt("~/Downloads/small_fb.aln")
I get the following error:backtrace is
which I think is from here: https://github.com/gtonkinhill/fastbaps/blob/master/R/import_fasta_sparse_nt.R#L87-L91
At first I thought this was because we only have one non-singleton column, but adding more sites didn't immediately help. I haven't read through the code properly, but I am guessing this is because there needs to be at least one observation of the four bases in passing sites. When I get to every base being represented in non-singleton sites it works ok:
For these cases, is it still possible to set the prior and keep running? If not, would it be possible to give a more informative error message?
(Sorry this is a bit of an edge case, it's just hard to work around from the error message we currently get)