jenwilson521 / PathFX

Most up to date PathFX code

Ontology? #3

Open kyleweise-pi opened 4 years ago

kyleweise-pi commented 4 years ago

Tried contacting the support email but it bounced back...

Maybe I missed something in the documentation of the tool, but I was curious as to what ontology PathFX uses for the disease names? Specifically, the "Phenotype" column of the results file that ends with "merged_neighborhood__assoc_table.txt".

Thanks!

-Kyle

jenwilson521 commented 4 years ago

Hi Kyle,

The disease names come from the original input data sources - OMIM, ClinVar, PheGeni, and DisGeNet. When I collect all of the disease-gene associations, I map terms without a CUI identifier to UMLS, but store a dictionary of all original phenotype descriptions. Does that make sense?

I'm not sure why the google group isn't working, sorry about that!
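The consolidation described above (assign a CUI to terms that lack one, while keeping every original phenotype description) could be sketched roughly as follows. This is only an illustration and not the PathFX code: the records, gene names, CUIs, and the `text_to_cui` lookup are all made up, and `consolidate` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical records: (source, original phenotype text, CUI or None, gene).
# The CUIs and gene names are made-up placeholders, not real identifiers.
records = [
    ("OMIM", "Example syndrome", "C9990001", "GENE1"),
    ("ClinVar", "example syndrome, type 1", None, "GENE2"),
    ("DisGeNet", "Another disorder", "C9990002", "GENE3"),
    ("PheGeni", "Unmapped phenotype", None, "GENE4"),
]

# Assumed stand-in for the output of a term-to-UMLS mapping step.
text_to_cui = {"example syndrome, type 1": "C9990001"}


def consolidate(records, text_to_cui):
    """Group genes by CUI and remember every original description per CUI."""
    cui_to_genes = defaultdict(set)
    cui_to_originals = defaultdict(set)
    unmapped = []
    for source, text, cui, gene in records:
        # Use the source's own CUI when present, otherwise try the lookup.
        cui = cui or text_to_cui.get(text)
        if cui is None:
            unmapped.append((source, text, gene))
            continue
        cui_to_genes[cui].add(gene)
        cui_to_originals[cui].add((source, text))
    return cui_to_genes, cui_to_originals, unmapped


cui_to_genes, cui_to_originals, unmapped = consolidate(records, text_to_cui)
```

Keeping `cui_to_originals` alongside the consolidated table means any CUI can be traced back to the wording used by each source.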

-- Jennifer L. Wilson Department of Chemical & Systems Biology Stanford University, Stanford, CA jen.wilson521@gmail.com / 703.969.3318 www.ascientistengineer.me/

kyleweise-pi commented 4 years ago

@jenwilson521 Yes, that makes sense! Thank you. I was curious if you have any recommendation for a way to go about mapping CUI/UMLS to EFO?

jenwilson521 commented 4 years ago

Hi Kyle,

Sorry, what is EFO? When mapping to UMLS, I used the MetaMap software (specifically their batch submission), and then I was able to consolidate multiple databases into UMLS terminology. https://ii.nlm.nih.gov/Batch/index.shtml

Let me know how else I can be helpful,

kyleweise-pi commented 4 years ago

@jenwilson521, EFO is the Experimental Factor Ontology (https://www.ebi.ac.uk/efo/) from the EBI. Basically, I am pulling data from different sources (one of them being PathFX) and need to join by an ID column. Two of the four sources use EFO, so I'm thinking about ways to map CUI to EFO so that the Phenotype description columns match. Hope this makes sense. I'll look into the MetaMap tool; I've only heard of it and never used it.

jenwilson521 commented 4 years ago

Hi Kyle,

I would start from the original data sources rather than pulling from PathFX. For one thing, this version of PathFX uses data from 2017 and hasn't yet been updated. Also, it's always good to have the original data in hand for whatever analysis you're doing. This PathFX implementation uses data from ClinVar, OMIM, Phenotype-Genotype Integrator (PheGeni), and DisGeNet.

Also, mapping terms can be a tedious task. I've found that the UMLS resources are pretty good at mapping to CUI identifiers, so it might be easier to go from all sources -> EFO and EFO -> UMLS, and then look at the intersection. I haven't used EFO, but my experience is that UMLS is generally bigger than any of these area-specific ontologies.

kyleweise-pi commented 4 years ago

Is there a way for me to run PathFX with updated data? Because the PathFX output is (in part) what I want as an end result. If I'm just pulling from the original data sources, I'm losing the PathFX output, no?

jenwilson521 commented 4 years ago

Hi Kyle,

The sad answer is not yet! We're working on pathfx-data-update, and I'm trying to have that released soon. The plan is to release updated versions of PathFX annually.

I think I originally misunderstood how you were using the algorithm and the phenotypes. PathFX at least outputs UMLS CUI terms, so if you could map EFO terms to CUI, you could still use the intersection of those terms as a means for mapping.

Good luck and let me know how else I can be helpful,
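As a rough illustration of the intersection idea above, the join could look something like the sketch below. It assumes an EFO-to-CUI mapping has already been built by some other means. The column names `cui`, `efo_id`, and `score`, the identifiers, and the `join_on_cui` helper are all hypothetical and are not taken from PathFX or EFO.

```python
# Hypothetical PathFX-style rows keyed by CUI (identifiers are placeholders).
pathfx_rows = [
    {"cui": "C9990001", "Phenotype": "Example syndrome"},
    {"cui": "C9990003", "Phenotype": "Unshared phenotype"},
]

# Hypothetical rows from a source that is keyed by EFO ID.
efo_rows = [{"efo_id": "EFO_X1", "score": 0.9}]

# Assumed mapping from EFO IDs to one or more CUIs, built elsewhere.
efo_to_cui = {"EFO_X1": {"C9990001"}}


def join_on_cui(pathfx_rows, efo_rows, efo_to_cui):
    """Join the two row sets on CUI, keeping only phenotypes present in both."""
    by_cui = {}
    for row in efo_rows:
        # An EFO term may map to several CUIs; index the row under each one.
        for cui in efo_to_cui.get(row["efo_id"], ()):
            by_cui.setdefault(cui, []).append(row)
    joined = []
    for row in pathfx_rows:
        for match in by_cui.get(row["cui"], []):
            joined.append({**row, **match})
    return joined


joined = join_on_cui(pathfx_rows, efo_rows, efo_to_cui)
```

Rows whose CUI has no EFO counterpart are simply dropped here; whether to keep them (an outer join) depends on the analysis.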
