jotech / gapseq

Informed prediction and analysis of bacterial metabolic pathways and genome-scale networks
GNU General Public License v3.0

HTML entities for special characters in reaction names cause incorrect UniProt queries #196

Closed: Waschina closed this issue 1 year ago

Waschina commented 1 year ago

Example: PWY-7981

Minimal example to reproduce issue:

gapseq find -p PWY-7981 genome.faa.gz

For this pathway, the column "reaName" in the file meta_pwy.tbl contains the following semicolon-separated reaction names:

alpha-dystroglycan xyloside beta-1,4-glucuronosyltransferase;alpha-dystroglycan beta1,4-xylosyltransferase;Rbo5P-3-betaGalNAc-(1→3)-betaGlucNAc-(1→4)-P-6-O-alphaMan-[protein] ribitol 5-phosphate transferase;betaGalNAc-(1→3)-betaGlucNAc-(1→4)-P-6-O-alphaMan-[protein] ribitol 5-phosphate transferase;alpha-dystroglycan alpha1,3-xylosyltransferase;alpha-dystroglycan beta1,3-glucuronosyltransferase;alpha-dystroglycan alpha1,3-xylosyltransferase/beta1,3-glucuronosyltransferase;D-ribitol-5-phosphate cytidylyltransferase

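Here, the HTML entity that breaks the reaction name extraction is the one for the right arrow (→), which occurs in the names containing "(1→3)" and "(1→4)". Presumably the entity's encoded form (e.g. `&rarr;`) ends in a semicolon, the same character that separates the reaction names, so a name is cut off in the middle and the truncated fragments end up in the UniProt queries.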
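
To illustrate the suspected mechanism, here is a minimal Python sketch. It is not gapseq code and not the actual fix. It assumes the arrow is stored as `&rarr;` in the table (the exact encoding is not shown in this issue) and uses a shortened, hypothetical excerpt of the "reaName" value.

```python
import html

# Hypothetical, shortened excerpt of the "reaName" value for PWY-7981.
# Assumption: the arrow is stored as the HTML entity "&rarr;".
rea_name = (
    "alpha-dystroglycan beta1,4-xylosyltransferase;"
    "betaGalNAc-(1&rarr;3)-betaGlucNAc-(1&rarr;4)-P-6-O-alphaMan-[protein] "
    "ribitol 5-phosphate transferase;"
    "D-ribitol-5-phosphate cytidylyltransferase"
)

# Naive split: the ';' that terminates each entity is treated as a separator,
# so the second reaction name is broken into fragments.
naive = rea_name.split(";")

# Decoding the entities first leaves only the real separators.
decoded = html.unescape(rea_name).split(";")

print(len(naive), naive[1])
print(len(decoded), decoded[1])
```

With the naive split, three reaction names turn into five fragments, and a fragment such as `betaGalNAc-(1&rarr` would be used as a query term. Decoding the entities before splitting is one possible remedy; replacing them in the source table would be another.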

Waschina commented 1 year ago

The bug was related to #173.

The issue should now be fixed with commit 8536a73.