Closed eyalbenda closed 4 years ago
Strange indeed. To fix the underlying issue, it would be better to open an issue with PASA: it may be possible to add the -p option to the relevant fasta36 calls in PASA, in which case it presumably won't try to predict the alphabet (DNA vs. protein).
For an immediate workaround, you can use funannotate fix to drop the model causing issues, then re-run funannotate update. After that, you'll have to manually add that gene model back via the NCBI tbl format and run funannotate fix again with the updated tbl file.
echo "FUN_000647" > model_drop.txt
funannotate fix -d model_drop.txt -i outfolder/predict_results/genome.gbk \
-t outfolder/predict_results/genome.tbl
And then re-run update:
funannotate update -i outfolder
Then add that model back in from the original tbl file. funannotate fix will generate an "archive" folder housing the original results, so you can copy and paste that gene model into the new output from update, and then run the fix script on the update_results files.
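For the copy/paste step, here is a rough sketch of pulling a single gene model out of a tbl file by its locus_tag. The function name extract_model and the file paths in the usage comment are made up for illustration, and it assumes every gene model starts with a tab-separated "start, stop, gene" line followed by a locus_tag qualifier line, so check the output by eye before pasting it anywhere.

```shell
# Hypothetical helper (not part of funannotate): print the feature block
# (gene line plus its mRNA/CDS/qualifier lines) for one locus_tag from an
# NCBI feature table. Assumes each model begins with "start<TAB>stop<TAB>gene".
extract_model() {
  local locus="$1" tbl="$2"
  awk -F'\t' -v locus="$locus" '
    # Print the buffered block only if it carried the requested locus_tag.
    function emit() { if (keep) printf "%s", buf; buf = ""; keep = 0 }
    /^>Feature/  { emit(); next }   # contig header closes the previous block
    $3 == "gene" { emit() }         # a new gene line closes the previous block
    { buf = buf $0 "\n" }
    $4 == "locus_tag" && $5 == locus { keep = 1 }
    END { emit() }
  ' "$tbl"
}

# Example call (paths are placeholders, adjust to your own folders):
#   extract_model FUN_000647 path/to/archive/genome.tbl > FUN_000647.tbl
```

The ">Feature" contig header is not printed, so the block still has to be pasted under the matching contig in the new tbl file.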
Thank you for the help. It made me notice that funannotate was changing the species name. I think it could be because the species, "Dunaliella bardawil", isn't in NCBI, while a sister species, "Dunaliella salina", is. The gbk file has Dunaliella salina everywhere, and after running funannotate fix the files get renamed to that species. I guess I can use sed to change it back, but it appears to be a bug. I specified the full species name, with the quote marks, to both the train and predict commands using the -s flag. I'm rerunning the update command for now and will update further.
Update: the fix allowed funannotate update to run to completion. Please let me know if I should close the bug report or keep it open due to the naming issue.
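A minimal sketch of the sed approach mentioned above, in case it is useful to someone else. The function name rename_species and the file names are placeholders, and it assumes the names contain no characters that are special to sed (slashes, dots, brackets). It writes a new file rather than editing in place, so the original can be kept for comparison.

```shell
# Hypothetical helper: copy a file, replacing every occurrence of one
# species name with another. Assumes both names are plain text with no
# sed metacharacters.
rename_species() {
  local old="$1" new="$2" infile="$3" outfile="$4"
  sed "s/${old}/${new}/g" "$infile" > "$outfile"
}

# Example call (file names are placeholders):
#   rename_species "Dunaliella salina" "Dunaliella bardawil" genome.gbk genome.renamed.gbk
```

This only touches the literal full name; abbreviated forms and lineage lines are left as they are.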
Hmm, not sure exactly about the taxonomy issue. The pipeline is running tbl2asn using taxonomy lookup, so I thought this would only grab the existing lineage and not necessarily change the genus and species name. I'll have to look into the tbl2asn docs to see whether this is the intended behavior.
As for the error above with the GT-rich gene, this should be addressed in the PASA code, as it really isn't a funannotate bug.
D. bardawil is listed as a synonym of D. salina in NCBI. That's why the names are switched.
Jason Stajich, PhD
I see. Thanks for the reply, I understand now. Happy holidays and thank you as always for the support!
Are you using the latest release? yes
Describe the bug I have what has to be a very exotic error, caused by a protein that has a long stretch of GT (Gly-Thr) repeats, and little else. This is the predicted protein:
The PASA step of funannotate update fails, and the PASA log points to a failed call to Pearson's fasta program (see below). I believe this error is very exotic. For reference, NCBI BLAST refuses to search this protein, since it classifies it as a DNA sequence rather than a protein. I believe the best solution in this case would be to somehow remove the protein from predict_results and reinsert it into update_results manually. Is this possible?
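To illustrate why a sequence like this can be mistaken for DNA, here is a small sketch that reports what percentage of a sequence consists of letters that are also nucleotide codes. This is only a guess at the kind of heuristic involved, not the actual logic used by fasta36 or BLAST, and the function name nucleotide_like_percent is made up.

```shell
# Hypothetical helper: percentage (integer, rounded down) of characters in a
# sequence that are also valid nucleotide codes (A, C, G, T, U, N).
# Illustration only; not the detection rule of any real aligner.
nucleotide_like_percent() {
  local seq total nt
  # Strip whitespace and upper-case the sequence.
  seq=$(printf '%s' "$1" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]')
  total=${#seq}
  [ "$total" -gt 0 ] || { echo 0; return; }
  # Keep only nucleotide-like letters and count them.
  nt=$(printf '%s' "$seq" | tr -cd 'ACGTUN' | wc -c | tr -d ' ')
  echo $(( 100 * nt / total ))
}

# Example call:
#   nucleotide_like_percent "GTGTGTGTGT"
```

A protein made almost entirely of Gly-Thr repeats is written with the letters G and T, so it scores close to 100 on a check like this even though it is an amino acid sequence.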
Logfiles Log of the failed call to Pearson's fasta program
seq1:
seq2:
OS/Install Information Funannotate 1.7.1, installed with conda