apredeus closed this issue 4 years ago.
Hi,
did you find a solution to this? Thank you so much!
Hi. Not really, but I experimented with different annotation formats and found a way to reduce the warnings to a minimum. Most of them were caused by pseudogenes (a stop codon in the middle of a feature) or by ncRNAs that were not annotated as such. Once you fix those, there are virtually no warnings.
Hi,
thanks a lot for your prompt reply! Sorry if this question is too naive. Is there some way to pull the genes that trigger warnings out of the snpEff build, or do we have to search the GFF file for pseudogenes/ncRNAs, remove them, and build the database again? Would you happen to have any scripts to share? Thank you so much!
Best, M
Sorry, I don't have anything specific; these are usually makeshift commands. You can use bedtools to extract the gene sequences as nucleotides and then translate them into predicted proteins with EMBOSS transeq (both bedtools and EMBOSS are easy to install with bioconda).
After this, just look for genes with stop codons in the wrong place, genes without a stop codon at the end, etc.
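The check described above can be sketched in Python. This is a minimal sketch and not part of snpEff or EMBOSS: it assumes the nucleotide FASTA already holds each CDS on its coding strand (for example extracted with `bedtools getfasta -s`), that each CDS includes its stop codon, and that bacterial genetic code 11 applies, whose stop codons (TAA, TAG, TGA) are the same as in the standard code. It translates in-house rather than calling transeq, and the script name `check_cds.py` and all function names are hypothetical.

```python
"""Flag suspicious coding sequences in a nucleotide FASTA file (sketch).

Assumptions: sequences are CDSs on the coding strand, each including its
stop codon; stop codons are TAA/TAG/TGA (true for standard code and code 11).
"""
import sys
from itertools import product

# Standard codon table built in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def read_fasta(lines):
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def translate(seq):
    """Translate full codons only; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def check_cds(seq):
    """Return a list of problems found in one coding sequence."""
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    protein = translate(seq)
    if "*" in protein[:-1]:
        problems.append("internal stop codon")
    if not protein.endswith("*"):
        problems.append("no terminal stop codon")
    return problems


def main(path):
    with open(path) as handle:
        for name, seq in read_fasta(handle):
            problems = check_cds(seq)
            if problems:
                # One tab-separated line per flagged sequence.
                print(f"{name}\t{'; '.join(problems)}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Running `python check_cds.py genes.fa` prints the ID and the problems for each flagged sequence. If you already have transeq protein output, the same two `*` checks (internal stop, missing terminal stop) apply to it directly.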
Hope this helps!
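If you would rather remove the flagged features from the GFF and rebuild the database, as asked above, a hypothetical helper could look like the following. It is a sketch under stated assumptions: standard GFF3 with `ID=` and `Parent=` attributes in column 9, and a plain-text file of IDs to drop, one per line. IDs taken from bedtools FASTA headers may need adjusting to match your GFF IDs. Dropping a feature also drops every feature whose `Parent` chain leads to it, and an embedded `##FASTA` section is copied through unchanged. The names `filter_gff.py`, `filter_gff`, and `parse_attributes` are illustrative.

```python
"""Drop listed features (and their descendants) from a GFF3 file (sketch).

Assumes GFF3 column 9 uses key=value pairs with ID= and Parent=;
percent-encoded characters in attribute values are not decoded.
"""
import sys


def parse_attributes(field):
    """Parse 'key=value;key=value' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def filter_gff(lines, drop_ids):
    """Yield GFF lines, skipping dropped features and their descendants."""
    lines = list(lines)
    # Pass 1: map each parent ID to the IDs of its children.
    children = {}
    for line in lines:
        if line.startswith("##FASTA"):
            break
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue
        attrs = parse_attributes(cols[8])
        fid = attrs.get("ID")
        if fid is None:
            continue
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children.setdefault(parent, []).append(fid)
    # Expand the drop set to all descendants.
    to_drop = set(drop_ids)
    stack = list(to_drop)
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in to_drop:
                to_drop.add(child)
                stack.append(child)
    # Pass 2: emit everything that is not being dropped.
    in_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
        if in_fasta or line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            yield line
            continue
        attrs = parse_attributes(cols[8])
        parents = {p for p in attrs.get("Parent", "").split(",") if p}
        if attrs.get("ID") in to_drop or parents & to_drop:
            continue
        yield line


def main(gff_path, ids_path):
    with open(ids_path) as fh:
        drop = {line.strip() for line in fh if line.strip()}
    with open(gff_path) as fh:
        sys.stdout.writelines(filter_gff(fh, drop))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage: `python filter_gff.py annotation.gff3 flagged_ids.txt > filtered.gff3`, then rebuild the snpEff database from the filtered file. Listing a gene ID removes the whole gene; listing only a CDS ID leaves its parent gene in place.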
got it! thanks a lot! have a good day! 😄
Closing old issues.
Hello,
I am using SnpEff for custom comparisons of our in-house bacterial strains, and it's working very well so far. Making DBs is easy and smooth, and virtually all the info I need is generated in minutes. However, I get some warnings when I build the database, which I imagine indicate problems with the assembly or annotation. I would like to look at them in more detail. Is it possible to print more than a summary in the log? Also, what are "length errors", "stop codon warnings", etc.? I'm talking about this type of warning summary:
Thank you in advance!