sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.
http://indra.bio
BSD 2-Clause "Simplified" License

Uniprot entries error with rdflib #159

Open bgyori opened 7 years ago

bgyori commented 7 years ago

Currently, when UniProt is queried via the web service, the entry's RDF document is parsed using rdflib. Example: http://www.uniprot.org/uniprot/O61608.rdf

It seems that some UniProt entries cannot be parsed by rdflib, because the following error is raised:

O61608.rdf:363:0: rdf:ID value is not a value NCName: _F987AB8B208C4838_rdfs.comment_Binds%201%20FAD.

This probably means that the ID contains invalid characters: the value includes percent-encoded spaces (%20), and the % character is not allowed in an NCName. We either need to use an endpoint other than RDF (though I remember at some point being convinced that RDF is the only one that has all the information we need), or we need to fetch the RDF as a string and replace the offending characters before passing it to rdflib.
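A minimal sketch of the second option (cleaning the RDF string before parsing) could look like the following. The function name sanitize_rdf_ids and the choice to replace disallowed characters with an underscore are assumptions for illustration, not existing INDRA code. The allowed character set is a conservative ASCII subset of what NCName permits. The sketch also assumes that nothing else in the document refers to the rewritten IDs and that replacement does not make two IDs collide.

```python
import re

# Matches rdf:ID="..." attributes (double- or single-quoted values).
_RDF_ID_RE = re.compile(r'''(rdf:ID=)(["'])(.*?)\2''')

# Anything outside this conservative ASCII subset of NCName is replaced.
_INVALID_CHARS_RE = re.compile(r'[^A-Za-z0-9_.\-]')


def sanitize_rdf_ids(rdf_text):
    """Replace characters that are invalid in an NCName inside rdf:ID values.

    Hypothetical helper: only rdf:ID attribute values are touched; the
    rest of the RDF/XML string is returned unchanged.
    """
    def _fix(match):
        attr, quote, value = match.groups()
        cleaned = _INVALID_CHARS_RE.sub('_', value)
        # An NCName must start with a letter or underscore.
        if not re.match(r'[A-Za-z_]', cleaned):
            cleaned = '_' + cleaned
        return attr + quote + cleaned + quote

    return _RDF_ID_RE.sub(_fix, rdf_text)


# The ID from the error message above, embedded in a made-up element.
sample = ('<rdf:Description '
          'rdf:ID="_F987AB8B208C4838_rdfs.comment_Binds%201%20FAD."/>')
cleaned_sample = sanitize_rdf_ids(sample)
```

The cleaned string could then be handed to rdflib, for example with rdflib.Graph().parse(data=cleaned, format='xml'). Whether this fixes every problematic entry is untested here.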

bgyori commented 7 years ago

Two comments: for most proteins we don't query the web service at all anymore. Also, for many proteins the RDF file is fine; only some raise an error like the example above.