monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Plater startup is failing on bad predicates #258

Closed kevinschaper closed 2 years ago

kevinschaper commented 2 years ago

This is coming up downstream when running Plater against Neo4j loaded from monarch-kg.tar.gz. The problem might actually be with the merge, but it looks like we have a bad predicate value showing up:


INFO:     35.191.3.147:51414 - "GET /1.2/meta_knowledge_graph HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 369, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 59, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 199, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/cors.py", line 78, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 390, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/fastapi/applications.py", line 199, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.8/site-packages/starlette/routing.py", line 52, in app
    response = await func(request)
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 224, in app
    response_data = await serialize_response(
  File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 127, in serialize_response
    raise ValidationError(errors, field.type_)
pydantic.error_wrappers.ValidationError: 7 validation errors for MetaKnowledgeGraph
response -> edges -> 3 -> predicate -> __root__
  string does not match regex "^biolink:[a-z][a-z_]*$" (type=value_error.str.regex; pattern=^biolink:[a-z][a-z_]*$)
response -> edges -> 4 -> predicate -> __root__
  string does not match regex "^biolink:[a-z][a-z_]*$" (type=value_error.str.regex; pattern=^biolink:[a-z][a-z_]*$)
response -> edges -> 73 -> predicate -> __root__
  string does not match regex "^biolink:[a-z][a-z_]*$" (type=value_error.str.regex; pattern=^biolink:[a-z][a-z_]*$)
response -> edges -> 77 -> predicate -> __root__
  string does not match regex "^biolink:[a-z][a-z_]*$" (type=value_error.str.regex; pattern=^biolink:[a-z][a-z_]*$)
response -> edges -> 269 -> predicate -> __root__
  string does not match regex "^biolink:[a-z][a-z_]*$" (type=value_error.str.regex; pattern=^biolink:[a-z][a-z_]*$)
response -> edges -> 607 -> predicate -> __root__
  string does not match regex "^biolink:[a-z][a-z_]*$" (type=value_error.str.regex; pattern=^biolink:[a-z][a-z_]*$)
response -> edges -> 608 -> predicate -> __root__
  string does not match regex "^biolink:[a-z][a-z_]*$" (type=value_error.str.regex; pattern=^biolink:[a-z][a-z_]*$)

kevinschaper commented 2 years ago

I don't see anything problematic in the predicate column of the TSV:

$ cut -f 3 monarch-kg_edges.tsv | sort | uniq -c | sort -rn
1376757 biolink:category
 994966 biolink:expressed_in
 933102 biolink:subclass_of
 484133 biolink:actively_involved_in
 428297 biolink:has_phenotype
 420569 biolink:enables
 365084 biolink:orthologous_to
 321122 biolink:located_in
 168019 biolink:acts_upstream_of_or_within
  97949 biolink:active_in
  85398 biolink:participates_in
  80481 biolink:part_of
  18696 biolink:related_to
   9902 biolink:has_attribute
   4591 biolink:contributes_to
   4221 biolink:develops_from
   3652 biolink:regulates
   3196 biolink:has_output
   3089 biolink:in_taxon
   3088 biolink:negatively_regulates
   3073 biolink:positively_regulates
   2958 biolink:colocalizes_with
   2645 biolink:has_input
   2541 biolink:has_part
   2534 biolink:overlaps
   2160 biolink:temporally_related_to
   1783 biolink:has_participant
   1342 biolink:affects_transport_of
   1178 biolink:acts_upstream_of
   1125 biolink:associated_with
    998 biolink:coexists_with
    867 biolink:subPropertyOf
    812 biolink:capable_of
    701 biolink:interacts_with
    469 biolink:occurs_in
    406 biolink:causes
    391 biolink:acts_upstream_of_or_within_positive_effect
    349 biolink:affects
    305 biolink:preceded_by
    174 biolink:caused_by
    169 biolink:acts_upstream_of_or_within_negative_effect
    155 biolink:inverseOf
    152 biolink:acts_upstream_of_positive_effect
    121 biolink:acts_upstream_of_negative_effect
    104 biolink:disrupts
    100 biolink:expresses
     52 biolink:produced_by
     35 biolink:type
     33 biolink:produces
     31 biolink:derives_from
     30 biolink:precedes
     12 biolink:homologous_to
     10 biolink:location_of
      9 biolink:increases_degradation_of
      6 biolink:has_variant_part
      6 biolink:affects_localization_of
      5 biolink:has_unit
      1 predicate
      1 biolink:object
      1 biolink:model_of
      1 biolink:correlated_with
kevinschaper commented 2 years ago

oh! I see it!

biolink:subPropertyOf

kevinschaper commented 2 years ago

In the very short term, I think I might just patch this with sed, rewriting it to biolink:sub_property_of, which isn't a real biolink predicate but would pass the validation.

@sierra-moxon it looks like this is coming from the kgx conversion of the monarch ontology. Could it just be that we're not up to date with kgx, or do you think there's something else to dig into in the ontology->kgx conversion?
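
For illustration, the short-term patch described above could be sketched in Python instead of sed. This is only a rough sketch: the function names `snake_case_predicate` and `patch_edge_line` are hypothetical, and it assumes the predicate is the third tab-separated column (as in the `cut -f 3` call earlier). As noted above, the rewritten values are not real biolink predicates; they merely satisfy the validator's pattern.

```python
import re

# Matches an uppercase letter that directly follows a lowercase letter or digit,
# i.e. a camelCase word boundary such as the "P" in "subPropertyOf".
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])([A-Z])")


def snake_case_predicate(predicate: str) -> str:
    """Rewrite the local part of a CURIE from camelCase to snake_case.

    Values without a prefix separator (e.g. the TSV header "predicate")
    are returned unchanged.
    """
    prefix, sep, local = predicate.partition(":")
    if not sep:
        return predicate
    local = _CAMEL_BOUNDARY.sub(lambda m: "_" + m.group(1).lower(), local)
    return f"{prefix}:{local}"


def patch_edge_line(line: str, predicate_column: int = 2) -> str:
    """Patch one tab-separated edge row (assumed layout: predicate in column 3)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > predicate_column:
        fields[predicate_column] = snake_case_predicate(fields[predicate_column])
    return "\t".join(fields)


print(snake_case_predicate("biolink:subPropertyOf"))  # biolink:sub_property_of
```

Predicates that are already snake_case, such as biolink:part_of, pass through untouched, so the patch is safe to apply to every row.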

kevinschaper commented 2 years ago

doh, also biolink:inverseOf
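
Offenders like these are easy to miss by eye in a long list, so one option is to apply the pattern from the validation error directly to the predicate values. The following is a minimal sketch, not Plater's own code: the name `invalid_predicates` is hypothetical, and the sample list is a small hand-picked subset of the predicates shown above. When reading the real TSV, the header value `predicate` would also fail the pattern and should be skipped.

```python
import re
from collections import Counter

# Pattern copied from the pydantic validation error in the traceback.
PREDICATE_PATTERN = re.compile(r"^biolink:[a-z][a-z_]*$")


def invalid_predicates(predicates):
    """Return {predicate: count} for every value that fails the pattern."""
    counts = Counter(predicates)
    return {
        predicate: count
        for predicate, count in counts.items()
        if not PREDICATE_PATTERN.fullmatch(predicate)
    }


sample = [
    "biolink:subclass_of",
    "biolink:subPropertyOf",
    "biolink:inverseOf",
    "biolink:part_of",
    "biolink:subPropertyOf",
]
print(invalid_predicates(sample))
# {'biolink:subPropertyOf': 2, 'biolink:inverseOf': 1}
```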