Dear @mbarkdull,
Looking at the sample input you attached here, 4 of the 5 sequences in the FASTA file are identical, and a large percentage of each sequence is N. Those factors may be throwing off the script. Do you see similar patterns in the other input files that fail? If so, you may want to add a filter step to remove files where a sequence is composed of more than 50% Ns (or some appropriate cut-off).
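A filter along those lines could look something like the sketch below. It is only an illustration, not part of HyPhy: the function names (`read_fasta`, `n_fraction`, `exceeds_n_cutoff`) are made up for this example, the 0.5 cut-off is just the 50% figure mentioned above, and it assumes plain FASTA input where the N fraction is counted over every character in a sequence, gaps included.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a plain FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n_fraction(sequence):
    """Fraction of characters in the sequence that are N (case-insensitive)."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


def exceeds_n_cutoff(path, cutoff=0.5):
    """Return True if any sequence in the file is more than `cutoff` N.

    The 0.5 default is an assumption taken from the suggestion above;
    pick whatever cut-off suits your data.
    """
    return any(n_fraction(seq) > cutoff for _, seq in read_fasta(path))
```

You could then call `exceeds_n_cutoff(...)` on each input file and skip the ones that return `True` before handing the rest to `CleanStopCodons.bf`.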
@spond may have a more technical explanation, but that is where I would start.
Best,
Dear @mbarkdull,
HyPhy incorrectly auto-detects this file as containing amino acids (since N is a valid amino-acid character). Unfortunately, there are no "smooth" workarounds. See https://github.com/veg/hyphy/issues/1574
One option (ugly, but should work) is to just add the following text at the very top (line 1) of the offending file.
BASESET :"ACGT"
Another option is to strip out the Ns prior to calling HyPhy.
Best, Sergei
Dear @jzehr and @spond,
Thank you so much for your quick replies. It does appear that all of the failures are caused by a high proportion of Ns in the sequences, so I'll explore those workarounds.
Very best, Megan
Good afternoon,
I am trying to remove stop codons from a few thousand input files, using essentially the following code:
hyphy /programs/hyphy-2.5.49/share/hyphy/TemplateBatchFiles/CleanStopCodons.bf Universal ./removedStops/cleaned_OG0014256_cdsSequences.fasta No/Yes test.fasta
CleanStopCodons.bf works on the majority of inputs, but for a minority, it fails with the error:
I'm attaching a sample input so you can replicate the error.
If you could help me understand what the issue is, I would really appreciate it. If there is no way to run CleanStopCodons.bf on these inputs, that's fine, but I want to make sure I'm not making an obvious mistake.
Thank you so much!
cleaned_OG0014256_cdsSequences.txt