Open JennyHTLee opened 5 years ago
Hi,
It looks like you may have duplicate IDs in your genome file. We can check that with the commands below:
grep ">" genome.fas | sed 's/>//' | sort -u | wc -l
and
grep -c ">" genome.fas
If the outputs of those two commands differ, there may be duplicates or some other issue. Seeing the ID format would be helpful too:
grep ">" genome.fas | head
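If the two counts do differ, a small extension of the same pipeline can show which IDs repeat. This is only a sketch: `dup_ids` is a hypothetical helper name, and it assumes the ID is the first whitespace-delimited token of each header line, which may not match how every tool parses IDs. Headers that share an ID but differ in their description would look unique to `sort -u` on the full line, so comparing only the first token can catch cases the counts above miss.

```shell
# Hypothetical helper: list FASTA IDs that occur more than once.
# Assumption: the ID is the first whitespace-delimited token after ">".
dup_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort | uniq -d
}

# Usage: dup_ids genome.fas
```

No output means no ID is repeated under that assumption.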
Thanks.
Thanks for your reply,
There seem to be no duplicated IDs. I am not sure the IDs are the real issue, because shortening or editing them does not solve the problem. It is also probably not one particular sequence causing this, as the same error occurs with both the full set and a subset.
What other possibilities could there be? The run halted while the IDs were being listed to the GFF: there are 809 IDs in total and it stopped at 631.
Best regards, Jenny
Hi Jenny,
It appears there is something odd with the IDs or sequences, and based on the message I thought it might be caused by duplicate IDs. That is not the case, though, so it must be something else. It could also be something in the code, but it is hard to say.
Can you share the file with me? I'd like to test it myself because that may be faster than trying to propose solutions from a distance.
Thanks, Evan
Hi Evan,
Sure, I've shared the file "genome.fasta.gz" through fex via your email at evanstaton.com
Thanks for your help!
Best regards, Jenny
Just FYI, I can recreate the error. This is something in the GenomeTools library and not Tephra, so it's not immediately clear how to resolve it. I will likely have to reduce the error to the problematic sequence and raise the issue to that group, but I will keep this issue updated as I find out more.
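For anyone who wants to try the same kind of reduction, here is a rough sketch of pulling a single record out of the FASTA file so it can be tested on its own. `nth_id` and `get_record` are hypothetical helper names, the ID is assumed to be the first token of the header, and it is only an assumption that the position reported in the log (631) corresponds to the 631st record in file order.

```shell
# Hypothetical helpers for narrowing a FASTA file down to one record.

# Print the ID (first token of the header) of the Nth record in the file.
nth_id() {
    grep '^>' "$1" | sed -n "${2}p" | sed 's/^>//' | awk '{print $1}'
}

# Print the full record (header plus sequence lines) for a given ID.
get_record() {
    awk -v id="$2" '
        /^>/ { keep = (substr($1, 2) == id) }  # new header: does the ID match?
        keep                                   # print lines of a matching record
    ' "$1"
}

# Usage (file name and record number taken from this thread):
#   get_record genome.fas "$(nth_id genome.fas 631)" > record631.fas
```

Running the failing step on the extracted record, and then on its neighbours, is one way to see whether a single sequence triggers the error.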
Thanks, Evan
Hello,
I ran tephra all and got an error at the findtirs step:
When running findtirs alone:
The other steps look fine based on the output files. This is the log:
Tephra docker version was used:
Thanks for your help
Regards, Jenny