usadellab / prot-scriber

Assigns short human readable descriptions to biological sequences or gene families using references. For this, prot-scriber consumes sequence similarity search results in tabular format (Blast or Diamond).
GNU General Public License v3.0

Annotation typo #47

Closed Hannah-Doerpholz closed 1 month ago

Hannah-Doerpholz commented 8 months ago

I believe I found another typo in one of your annotations. I used the CDS for the barley gene horvu.morex.r3.2hg0178420. For this I receive the annotation "dnaj subfamily b memer". I think it should say "member" at the end. I have attached the sequence file that yielded this result.

HORVU.MOREX.r3.4HG0178420.txt

derRiesenOtter commented 8 months ago

I tried to replicate this error. Sadly, I wasn't able to. I only got blast hits using the trembl databank (no hits with sprot), which resulted in a different annotation: HORVU.MOREX.r3.2HG0178420_1 genome assembly chromosome ii

Hannah-Doerpholz commented 8 months ago

That is strange. I have double-checked that the file I uploaded here is the one that gave me the issue. To get that annotation I used the plabipd website, not blast: https://www.plabipd.de/mercator_main.html . From that run I get the following output file:

test_for_github.results.txt test_for_github.fa.txt

I'm not sure exactly how mercator and prot-scriber interact with each other; maybe the entry point has something to do with this issue. I also tried blast with default settings through UniProt and NCBI. On UniProt I do actually get a hit with a similar annotation, the protein name "DnaJ-like subfamily B member 13" in Glycine soja. Here, though, the "b" is not missing from "member".

Screenshot from 2024-01-17 13-54-16

I hope this additional information helps. At least, I think the error should be reproducible through plabipd/mercator.
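To narrow down whether "memer" is already present in a reference description or only appears in the final annotation, one could scan the tabular search results for the suspect word. The following is only a minimal sketch: the function name `hits_containing` is hypothetical, the two example rows are made up for illustration, and it assumes a tab-separated file whose last column holds the hit description (which depends on how Blast or Diamond was configured).

```python
import re


def hits_containing(lines, word, desc_col=-1):
    """Return (query_id, description) pairs whose description column
    contains `word` as a whole word (case-insensitive).

    Assumption: rows are tab-separated and the description sits in
    column `desc_col` (last column by default).
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    matches = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip empty lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        description = fields[desc_col]
        if pattern.search(description):
            matches.append((fields[0], description))
    return matches


# Made-up rows for illustration only; real rows would come from the
# Blast or Diamond output file.
example = [
    "query1\tsubjA\tDnaJ-like subfamily B member 13",
    "query1\tsubjB\tdnaj subfamily b memer",
]
print(hits_containing(example, "memer"))
```

If no reference description contains "memer", the truncation is more likely introduced after the similarity search, for example when the descriptions are filtered or merged.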