sorgerlab / famplex

Resources for grounding protein families and complexes from text and describing their hierarchical relationships.
https://sorgerlab.github.io/famplex
Creative Commons Zero v1.0 Universal

Complexes vs. Families #15

Closed: hickst closed this issue 7 years ago

hickst commented 7 years ago

Hi guys,

We have the following additional entity name strings that we need to ground:

Activin A
Activin AB
Activin B
Inhibin A
Inhibin B
AMPK alpha1beta1gamma1
AMPK alpha1beta1gamma2
AMPK alpha1beta1gamma3
AMPK alpha1beta2gamma1
AMPK alpha1beta2gamma2
AMPK alpha2beta1gamma1
AMPK alpha2beta2gamma1
AMPK alpha2beta2gamma2
AMPK alpha2beta2gamma3
AMPK a1b1g1
AMPK a1b1g2
AMPK a1b1g3
AMPK a1b2g1
AMPK a1b2g2
AMPK a2b1g1
AMPK a2b2g1
AMPK a2b2g2
AMPK a2b2g3
alpha1beta1gamma1
alpha1beta1gamma2
alpha1beta1gamma3
alpha1beta2gamma1
alpha1beta2gamma2
alpha2beta1gamma1
alpha2beta2gamma1
alpha2beta2gamma2
alpha2beta2gamma3

I have some questions about this set and the use of the Relations table:

1) It is my belief (however misinformed) that these are all protein complexes.
1a) Yet the Relations file lists some of them as families. What is the correct mapping for each of these strings?
1b) Aren't the Inhibin dimers also part of the Activin family? (I looked for entries like BE,Inhibin_B,isa,BE,Activin but didn't find any.)

2) Some (all?) of these are synonym strings for existing entries in the Relations file, so:
2a) Can your Relations file handle the synonyms that entities often have? How?
2b) Should we map the synonyms ourselves (i.e., ground these strings to Bioentities entries in a separate table of our own), or do you have a way to add these strings directly to the BE Relations table?

Thanks for your help. -t

bgyori commented 7 years ago

1) I think the confusion comes from the fact that Activin is a family of complexes. Here is the relevant portion of relations:

HGNC  INHBA       partof  BE  Activin_A
HGNC  INHBA       partof  BE  Activin_AB
HGNC  INHBB       partof  BE  Activin_AB
HGNC  INHBB       partof  BE  Activin_B
BE    Activin_A   isa     BE  Activin
BE    Activin_AB  isa     BE  Activin
BE    Activin_B   isa     BE  Activin

So INHBA and INHBB are parts of specific complexes (Activin_A, Activin_B and Activin_AB), which are then referred to together as the Activin family.
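
A minimal Python sketch of how the two relation types combine, using only the seven rows above written in the comma-separated five-column layout (namespace, ID, relation, namespace, ID) that this issue itself quotes (BE,Inhibin_B,isa,BE,Activin). The helper names load_relations, members and parts are hypothetical and not part of FamPlex; the sketch follows only direct isa/partof edges, not longer chains.

```python
import csv
import io

# The Activin rows from the excerpt above, in comma-separated form:
# namespace1, id1, relation, namespace2, id2
RELATIONS_CSV = """\
HGNC,INHBA,partof,BE,Activin_A
HGNC,INHBA,partof,BE,Activin_AB
HGNC,INHBB,partof,BE,Activin_AB
HGNC,INHBB,partof,BE,Activin_B
BE,Activin_A,isa,BE,Activin
BE,Activin_AB,isa,BE,Activin
BE,Activin_B,isa,BE,Activin
"""


def load_relations(text):
    """Parse relation rows into ((ns, id), relation, (ns, id)) triples."""
    triples = []
    for ns1, id1, rel, ns2, id2 in csv.reader(io.StringIO(text)):
        triples.append(((ns1, id1), rel, (ns2, id2)))
    return triples


def members(triples, family):
    """Return the direct 'isa' members of a family entry, sorted."""
    return sorted(src for src, rel, tgt in triples if rel == "isa" and tgt == family)


def parts(triples, complex_entry):
    """Return the direct 'partof' components of a complex entry, sorted."""
    return sorted(src for src, rel, tgt in triples if rel == "partof" and tgt == complex_entry)


triples = load_relations(RELATIONS_CSV)
for member in members(triples, ("BE", "Activin")):
    # Each family member is a complex; list the genes that make it up.
    print(member[1], [p[1] for p in parts(triples, member)])
```

Run as-is, this prints Activin_A with ['INHBA'], Activin_AB with ['INHBA', 'INHBB'] and Activin_B with ['INHBB']: the family node itself has no partof children, only its member complexes do.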

Inhibin is also a family of complexes, but it is not currently defined in the relations file, so it needs to be added; I will do this today.

2) The relations file contains symbols in a name space; the entries have no text names or synonyms associated with them. Synonyms need to be handled when mapping text to entries in the BE name space. For instance, "Erk", "ERK" and "MAPK1/3" would all be mapped to the BE:ERK entry. Our own mapping table, grounding_map.csv, does this for the ungrounded or misgrounded strings that appear most often in our use cases.
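
To illustrate the split described above (synonyms live in a text-to-ID map, while relations.csv only relates IDs), here is a minimal Python sketch. GROUNDING_MAP and ground are hypothetical names, and the dict is an in-memory stand-in seeded with the ERK example; it is not the actual layout of grounding_map.csv.

```python
# Hypothetical in-memory grounding map: exact text string -> (namespace, ID).
# Entries mirror the ERK example in the comment above; the real
# grounding_map.csv file may be laid out differently.
GROUNDING_MAP = {
    "Erk": ("BE", "ERK"),
    "ERK": ("BE", "ERK"),
    "MAPK1/3": ("BE", "ERK"),
}


def ground(text, grounding_map=GROUNDING_MAP):
    """Return the (namespace, ID) grounding for a text string, or None."""
    # Only surrounding whitespace is stripped; matching is otherwise exact,
    # so each spelling variant needs its own entry.
    return grounding_map.get(text.strip())


print(ground("MAPK1/3"))
print(ground("Activin C"))
```

The first call prints ('BE', 'ERK') and the second prints None, since no entry exists for that string. Once a string is grounded to a BE ID, its family or complex structure is looked up in relations.csv by that ID.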

bgyori commented 7 years ago

To be able to map

AMPK alpha1beta1gamma1
AMPK alpha1beta1gamma2
AMPK alpha1beta1gamma3
AMPK alpha1beta2gamma1
AMPK alpha1beta2gamma2
AMPK alpha2beta1gamma1
AMPK alpha2beta2gamma1
AMPK alpha2beta2gamma2
AMPK alpha2beta2gamma3

all we need to do is define these specific complexes in relations.csv. I will do this later today.

hickst commented 7 years ago

Thanks Ben, I think I understand now.

As for the a[12]b[12]g[123] complexes: we appreciate their addition. From answer (2) above, I infer that we will then handle the mapping of all these synonym strings to whatever symbolic family name that you will create (or maybe you will map these to an existing AMPK complex or family symbol?)
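
On the user side, the three spellings in the original list ("AMPK alpha1beta1gamma1", "AMPK a1b1g1" and "alpha1beta1gamma1") differ only in prefix and subunit abbreviation, so one pattern could cover them. This is a minimal Python sketch, assuming BE IDs of the form AMPK_A1B1G1 (the naming floated in the reply to this comment); AMPK_PATTERN and ground_ampk are hypothetical names.

```python
import re

# Accepts an optional "AMPK " prefix followed by alpha/a, beta/b and gamma/g
# subunit numbers, e.g. "AMPK alpha1beta1gamma1", "AMPK a1b1g1",
# "alpha1beta1gamma1". Case-insensitive.
AMPK_PATTERN = re.compile(
    r"^(?:AMPK\s+)?(?:alpha|a)([12])(?:beta|b)([12])(?:gamma|g)([123])$",
    re.IGNORECASE,
)


def ground_ampk(text):
    """Map an AMPK heterotrimer string to an assumed BE ID, or return None."""
    match = AMPK_PATTERN.match(text.strip())
    if match is None:
        return None
    a, b, g = match.groups()
    # Assumed ID scheme: AMPK_A<alpha>B<beta>G<gamma>.
    return ("BE", f"AMPK_A{a}B{b}G{g}")
```

For example, ground_ampk("AMPK a2b2g3") would return ("BE", "AMPK_A2B2G3"). The pattern accepts all twelve alpha/beta/gamma combinations, so a caller may want to check the result against the entries actually defined in relations.csv.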

bgyori commented 7 years ago

Great! My plan is to create a BE entry that the string "AMPK alpha1beta1gamma1" can be mapped to. The name would be, for instance, AMPK_A1B1G1. That complex will then be defined in the relations as:

BE AMPK_A1B1G1 isa BE AMPK
HGNC PRKAA1 partof BE AMPK_A1B1G1
HGNC PRKAB1 partof BE AMPK_A1B1G1
HGNC PRKAG1 partof BE AMPK_A1B1G1

I would then do the same for all the other specific complexes.
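
Since each complex follows the same four-row pattern, the rows for all nine requested complexes could be generated rather than typed. This is a minimal Python sketch, assuming the other eight complexes use the same AMPK_A?B?G? naming as AMPK_A1B1G1 and the comma-separated relations layout quoted earlier in this issue; SUBUNIT_GENES, COMBINATIONS and ampk_rows are hypothetical names. The genes are the HGNC symbols of the AMPK isoforms (PRKAA1/2, PRKAB1/2, PRKAG1/2/3).

```python
import csv
import sys

# HGNC symbols for each AMPK subunit isoform.
SUBUNIT_GENES = {
    "A": {"1": "PRKAA1", "2": "PRKAA2"},
    "B": {"1": "PRKAB1", "2": "PRKAB2"},
    "G": {"1": "PRKAG1", "2": "PRKAG2", "3": "PRKAG3"},
}

# The nine alpha/beta/gamma combinations requested in this issue.
COMBINATIONS = ["111", "112", "113", "121", "122", "211", "221", "222", "223"]


def ampk_rows(combinations=COMBINATIONS):
    """Yield (ns1, id1, relation, ns2, id2) rows for each AMPK complex."""
    for a, b, g in combinations:
        complex_id = f"AMPK_A{a}B{b}G{g}"  # assumed naming scheme
        # The specific complex is a member of the AMPK family...
        yield ("BE", complex_id, "isa", "BE", "AMPK")
        # ...and each of its three subunit genes is part of it.
        for subunit, isoform in (("A", a), ("B", b), ("G", g)):
            yield ("HGNC", SUBUNIT_GENES[subunit][isoform], "partof", "BE", complex_id)


if __name__ == "__main__":
    csv.writer(sys.stdout, lineterminator="\n").writerows(ampk_rows())
```

The first four generated rows match the AMPK_A1B1G1 example above, and the full output has 36 rows (four per complex).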

bgyori commented 7 years ago

@hickst take a look at #16, it adds all the entities and relations that you need to map your inhibin and AMPK strings. If this looks good, I'll merge it.

hickst commented 7 years ago

It looks good to me. Thanks.