jehoons / sbie_aging

Can we be immortal?

Description of PERA as used in pertbio #59

Open sugyun opened 7 years ago

sugyun commented 7 years ago

PERA

Automated extraction of prior information from signaling databases. Pathway Extraction and Reduction Algorithm (PERA) was developed to automatically extract prior information from multiple signaling databases and generate a prior information network.

The input to PERA is a list of (phospho) proteins identified by their HGNC symbols (e.g. AKT1), phosphorylation sites (e.g. pS473) and their molecular status (i.e., activating or inhibitory phosphorylation, total concentration).

The output of PERA is a set of directed interactions between signaling molecules represented in a Simple Interaction Format (SIF).
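To make the output format concrete, here is a minimal Python sketch that reads SIF-style lines into directed edges. It assumes the common layout `source relation target [target ...]` separated by tabs or spaces; the relation name and the example rows are made up for illustration and are not actual PERA output.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)


def parse_sif(lines: Iterable[str]) -> List[Edge]:
    """Parse SIF-style lines into (source, relation, target) triples."""
    edges: List[Edge] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Prefer tab-separated fields; fall back to any whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 3:
            continue  # a lone node name carries no edge
        source, relation, *targets = fields
        for target in targets:
            edges.append((source, relation, target))
    return edges


# Hypothetical rows, for illustration only.
example = [
    "AKT1\tphosphorylates\tGSK3B",
    "MTOR\tphosphorylates\tRPS6KB1\tEIF4EBP1",
    "TSC2",
]
edges = parse_sif(example)
```

A line with several targets expands into one edge per target, which keeps downstream graph code simple.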

The PERA software is available at http://bit.ly/bp_prior as free software under LGPL 3.0 (see the supplementary methods for details of PERA).

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4539601/pdf/elife04640.pdf

sugyun commented 7 years ago

Algorithm

  1. Using the paths-between graph query algorithm (Dogrusoz et al., 2009), PERA generates a sub-graph (i.e., graph-of-interest) of Pathway Commons, which contains all the input proteins and all known connections within their first neighborhoods. => Builds a subgraph whose nodes are all the RPPA input proteins and their first neighborhoods.

  2. Using the phosphorylation and activity state information, input entities are mapped to the corresponding protein states in the graph-of-interest. During this mapping step, protein states that do not match with either the corresponding annotation for phosphorylation or activity state are filtered out. (If either one does not match, the node is removed. Does this mean the database lists the phosphorylation sites and the resulting activity??) Phosphorylation site mismatches up to 6 residues are tolerated during the filtering step to account for phosphorylation site ambiguities due to either database curation errors or cross-organism annotations. => Maps protein states onto the subgraph and adds or removes phosphoprotein nodes according to status and phosphorylation site.

  3. Paths that result in the addition or removal of a phospho-group at a phosphorylation site are extracted and mapped to phosphoprotein nodes. For total protein nodes, all non-phosphoprotein specific, directed signaling paths are included. For this application, the maximum allowable graph-query distance limit was set to 1 and only the Reactome (Matthews et al., 2009) and NCI-Nature PID pathway (Schaefer et al., 2009) data resources were used. Although we limited ourselves to short path distances and two pathway databases that we were most familiar with, PERA can be applied to extract information from any pathway database that exports to BioPAX and can be configured for searching paths of arbitrary length. => Combines the phosphoprotein nodes and the non-phosphoprotein nodes, then keeps only those at distance 1.
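As a rough illustration of steps 1 and 2, the Python sketch below collects the first neighborhood of a set of input proteins and checks whether two phosphorylation sites agree within the 6-residue tolerance. This is not the paths-between algorithm or the PERA code: the adjacency-dict graph, the function names, and the choice to compare residue positions only (ignoring the residue letter) are assumptions made for the example.

```python
import re
from typing import Dict, Set

SITE_TOLERANCE = 6  # residues, as stated in the quoted methods text


def site_position(site: str) -> int:
    """Extract the residue number from a label such as 'pS473' or 'S473'."""
    match = re.fullmatch(r"p?([A-Za-z])(\d+)", site)
    if match is None:
        raise ValueError(f"unrecognized site label: {site!r}")
    return int(match.group(2))


def sites_match(observed: str, annotated: str,
                tolerance: int = SITE_TOLERANCE) -> bool:
    """True if the two sites differ by at most `tolerance` residues."""
    return abs(site_position(observed) - site_position(annotated)) <= tolerance


def first_neighborhood(graph: Dict[str, Set[str]], seeds: Set[str]) -> Set[str]:
    """Seeds plus every node one directed edge away, in either direction."""
    nodes = set(seeds)
    for source, targets in graph.items():
        if source in seeds:
            nodes |= targets   # downstream neighbors of a seed
        if targets & seeds:
            nodes.add(source)  # upstream neighbors of a seed
    return nodes


# Toy directed graph (hypothetical edges, for illustration only).
toy_graph = {"PDPK1": {"AKT1"}, "AKT1": {"GSK3B"}, "GSK3B": {"CTNNB1"}}
neighborhood = first_neighborhood(toy_graph, {"AKT1"})
```

With the tolerance at 6, a database annotation at S474 would still map to an observed pS473, while T308 would be filtered out.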

Conclusion

sugyun commented 7 years ago

Two caveats

The first semantic issue stems from the ambiguities in mapping proteins with multiple phosphorylation sites. When a protein with multiple phosphorylation sites is experimentally profiled with an antibody that recognizes only a single phosphorylation site, the antibody will actually bind to a heterogeneous mixture of phospho-states provided that the epitope is phosphorylated (e.g., anti-AKTpS473 Ab may bind to both AKTpS473 and AKTpS473_pT308 but not to AKTpT308). For proteins with multiple observed phosphorylation sites, this might lead to semantic conflicts since a double phosphorylated node should be mapped to both observations (i.e., single and double-phosphorylated states). We included an optional ‘strict’ mapping scheme to map only the phosphoproteins that exactly match the observed node, in our case always single phosphoproteins (e.g., the epitope of anti-AKTpS473 Ab is mapped to AKTpS473 but not to AKTpS473_pT308). Since our extraction algorithm is much more tolerant of missing interactions compared to false interactions, we opted to use this flag for this application. => Only a single phosphorylation is considered.
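The difference between the default and the 'strict' mapping can be sketched in a few lines of Python. Representing each protein state as a set of phosphorylated sites, and the function name `matching_states`, are assumptions for illustration; this is not how the PERA code itself represents states.

```python
from typing import FrozenSet, List


def matching_states(epitope: str, states: List[FrozenSet[str]],
                    strict: bool) -> List[FrozenSet[str]]:
    """Return the protein states an antibody observation is mapped to.

    Non-strict: any state in which the epitope site is phosphorylated.
    Strict: only the state whose sites are exactly the epitope.
    """
    if strict:
        return [s for s in states if s == frozenset({epitope})]
    return [s for s in states if epitope in s]


# The three AKT states from the example in the text.
akt_states = [
    frozenset({"pS473"}),
    frozenset({"pS473", "pT308"}),
    frozenset({"pT308"}),
]
loose = matching_states("pS473", akt_states, strict=False)
strict = matching_states("pS473", akt_states, strict=True)
```

Here the non-strict mapping returns both AKTpS473 and AKTpS473_pT308, while the strict mapping returns AKTpS473 only, trading missed interactions for fewer false ones.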

The second semantic issue stems from the fact that pathway databases are often curated from multiple independent studies, spanning multiple cellular states, cell and tissue types, and even multiple model organisms. As a result, they are a superimposition of possible interactions over a wide range of spatiotemporal and genetic contexts. On the other hand, databases cover only a subset of all possible contexts. In our case, we expect only a subset of the interactions in the pathway databases to be active in our cell lines and cover only a subset of the interactions that we observe. This observation necessitates incorporating prior information as ‘soft’ restraints for network inference. For this purpose, we devised a modified cost function, which includes a term for prior information (See below. Also see [Molinelli, Korkut, Wang et al., 2013] and [Miller et al., 2013]). => Picking out only the active links is done through the cost function.

sugyun commented 7 years ago

image

sugyun commented 7 years ago

Execution results

Software file: bp_prior (2).zip

Command: C:\Users\User\Desktop\bp_prior>java -jar bp_prior-2.13.1-single.jar -o prior_network.tsv src/test/resources/example_gene_list.tsv src/test/resources/reactome_mtor_pathway.owl

After unzipping the file, running the command above in cmd from the bp_prior directory runs the software on example_gene_list (see the figure below) and reactome_mtor_pathway.owl, both located in src/test/resources. image

The following result is produced. image

(+) Changing every status to c gives the same result. Changing the status to a or i, however, gives a different result.

sugyun commented 7 years ago

Changing every status to a or i produces the result below. image

Changing every status to c produces the same result as the original.

jehoons commented 7 years ago

How should we select the set of important nodes?

  1. Subjectively select some number (how many?) of important nodes
  2. Search for the nodes at a distance of N from the important nodes
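One possible way to implement the distance search is a breadth-first search from the chosen nodes, sketched below in Python. The adjacency-dict graph, the function name `nodes_within`, and the choice to follow directed edges downstream only are assumptions; the toy edges are illustrative and not taken from a PERA run.

```python
from collections import deque
from typing import Dict, Iterable, Set


def nodes_within(graph: Dict[str, Set[str]], seeds: Iterable[str],
                 max_distance: int) -> Dict[str, int]:
    """Breadth-first search from the seeds along directed edges.

    Returns each node reachable in at most `max_distance` steps,
    mapped to its shortest distance from any seed.
    """
    distances = {seed: 0 for seed in seeds}
    queue = deque(distances)
    while queue:
        node = queue.popleft()
        if distances[node] == max_distance:
            continue  # do not expand beyond the distance limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Toy directed graph (hypothetical edges, for illustration only).
toy = {
    "AKT1": {"MTOR", "GSK3B"},
    "MTOR": {"RPS6KB1"},
    "RPS6KB1": {"RPS6"},
}
reached = nodes_within(toy, ["AKT1"], max_distance=2)
exactly_two = {node for node, d in reached.items() if d == 2}
```

Because the distances are recorded, the same result can be filtered either for nodes within N steps or for nodes exactly N steps away.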