monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Review StringDB ingest now that they are cc-by #581

Open kshefchek opened 6 years ago

kshefchek commented 6 years ago

Thanks in part to @mellybelly @kltm @lwinfree and all the reusable data team, StringDB is now all cc-by, so we can pull a lot more data. Currently we are only ingesting the protein.links.detailed file.

mbrush commented 6 years ago

As we do this, we should also refactor the ingest to make finer-grained distinctions between the subtypes of genetic interactions they capture. At present we map all genetic interaction relations to RO:0002435 (genetically interacts with), but BioGRID captures more detail here that would be useful (e.g. pos/neg genetic interactions, synthetic lethality interactions).

kshefchek commented 6 years ago

> As we do this, we should also refactor the ingest to make finer-grained distinctions between the subtypes of genetic interactions they capture.

See http://version10.string-db.org/help/database/#table-networkactions

| field | description |
| --- | --- |
| item_id_a | internal protein identifier |
| item_id_b | internal protein identifier |
| mode | type of interaction ("reaction", "expression", "activation", "ptmod" (post-translational modifications), "binding", "catalysis") |
| action | the effect of the action ("inhibition", "activation") |
| is_directional | no documentation |
| a_is_acting | the directionality of the action, if applicable (1 means that item_id_a is acting upon item_id_b) |
| score | the best combined score of all interactions in STRING |
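
For discussion, here is a rough sketch of how rows with these columns could be read so that `mode` and `action` survive into the ingest instead of being collapsed into one relation. It is not dipper code. It assumes a tab-separated file with the seven columns above in that order, that `is_directional` and `a_is_acting` are flags encoded as `1`/`0` (or `t`/`f`), and that swapping the pair when `a_is_acting` is false puts the acting protein first. The function name `parse_actions` and the sample identifiers are made up for illustration.

```python
import csv
import io
from collections import Counter

# Column order taken from the field table above (an assumption about the file layout).
FIELDS = ["item_id_a", "item_id_b", "mode", "action",
          "is_directional", "a_is_acting", "score"]

TRUE_VALUES = {"1", "t"}  # assumed flag encodings


def parse_actions(handle):
    """Yield one dict per interaction row, with the acting protein as subject."""
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        if row["item_id_a"] == "item_id_a":
            continue  # skip a header line if the file has one
        directional = row["is_directional"] in TRUE_VALUES
        a_is_acting = row["a_is_acting"] in TRUE_VALUES
        subject, obj = row["item_id_a"], row["item_id_b"]
        if directional and not a_is_acting:
            # Assumption: item_id_b is the actor, so put it first.
            subject, obj = obj, subject
        yield {
            "subject": subject,
            "object": obj,
            "mode": row["mode"],
            "action": row["action"] or None,  # empty when no effect is given
            "directional": directional,
            "score": int(row["score"]),
        }


# Made-up sample rows, only to exercise the parser.
SAMPLE = (
    "item_id_a\titem_id_b\tmode\taction\tis_directional\ta_is_acting\tscore\n"
    "9606.P_A\t9606.P_B\tactivation\tactivation\t1\t1\t900\n"
    "9606.P_B\t9606.P_C\tbinding\t\t0\t0\t700\n"
    "9606.P_D\t9606.P_E\tptmod\tinhibition\t1\t0\t450\n"
)

rows = list(parse_actions(io.StringIO(SAMPLE)))
mode_counts = Counter(r["mode"] for r in rows)
print(mode_counts)
```

Keeping `mode` and `action` as separate keys would let a later step choose a relation per (mode, action) pair. Which relations those should be is still open.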