wikipathways / pathway-figure-ocr

Extracting gene sets from published pathway figures
Apache License 2.0

Utilize the hierarchical structure of pathway content #22

Open AlexanderPico opened 3 years ago

AlexanderPico commented 3 years ago

Like the Gene Ontology, our collection of annotated pathway figures can be organized hierarchically: the contents of some figures partially overlap with, or are subsets of, others. We should be able to use this information to define sub-pathway modules that are commonly represented, as well as super-pathway sets that share a common core. We will analyze these modules and sets using hierarchical clustering and distance measures such as the Jaccard distance. The resulting gene sets would be valuable to capture and use as inferred pathway-based sets for query paths in BioThings Explorer. The hierarchical structure itself can be represented as an ontology. Likewise, existing ontologies can be used to structure the content, including the Pathway Ontology (for figure annotations), the Gene Ontology (for extracted genes), and MeSH (for extracted chemicals).
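To make the overlap and subset idea concrete, here is a minimal sketch in plain Python. The figure IDs and gene sets are made up for illustration and are not PFOCR data; the names `figures`, `jaccard_distance`, `subset_pairs`, and `shared_core` are hypothetical. It computes pairwise Jaccard distances, lists strict subset relations (candidate parent/child edges in a hierarchy), and extracts the core shared by all sets.

```python
from itertools import combinations

# Hypothetical figure IDs and gene sets, for illustration only.
figures = {
    "fig_A": {"TP53", "MDM2", "CDKN1A", "ATM"},
    "fig_B": {"TP53", "MDM2"},
    "fig_C": {"TP53", "MDM2", "BAX", "BBC3"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Pairwise distances, keyed by alphabetically ordered figure-ID pairs.
distances = {
    (x, y): jaccard_distance(figures[x], figures[y])
    for x, y in combinations(sorted(figures), 2)
}

# Strict subset relations: (child, parent) means child's genes are all in parent.
subset_pairs = [
    (child, parent)
    for child in sorted(figures)
    for parent in sorted(figures)
    if figures[child] < figures[parent]
]

# Genes common to every figure: a candidate "common core" for a super-pathway set.
shared_core = set.intersection(*figures.values())

for pair, dist in distances.items():
    print(pair, round(dist, 3))
print("subset relations:", subset_pairs)
print("shared core:", sorted(shared_core))
```

For the real collection, the same distances could be passed to a hierarchical clustering routine (for example `scipy.cluster.hierarchy.linkage` on a condensed distance matrix); that step is omitted here to keep the sketch dependency-free.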

AlexanderPico commented 2 years ago

During the past year we:

To do:

AlexanderPico commented 2 years ago

Work with Eric (Zhang lab; CPTAC) to try NetSAM on PFOCR with filtered Jaccard distance weights.
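
A rough sketch of how a filtered, Jaccard-weighted edge list could be prepared as network input. The thread does not say what "filtered" means or which input format will be used, so two assumptions are made here: filtering is a simple distance cutoff (`max_distance`, a hypothetical parameter), and the output is a generic tab-separated edge list. The NetSAM documentation should be checked for its actual expected input. The gene sets are toy data.

```python
from itertools import combinations


def filtered_edges(gene_sets, max_distance=0.8):
    """Return (id_a, id_b, distance) for figure pairs at or below max_distance.

    Assumption: "filtered" means dropping pairs whose Jaccard distance
    exceeds a cutoff; the cutoff value here is arbitrary.
    """
    edges = []
    for a, b in combinations(sorted(gene_sets), 2):
        union = gene_sets[a] | gene_sets[b]
        if not union:
            continue  # skip pairs of empty sets
        dist = 1.0 - len(gene_sets[a] & gene_sets[b]) / len(union)
        if dist <= max_distance:
            edges.append((a, b, round(dist, 3)))
    return edges


def to_tsv(edges):
    """Serialize edges as tab-separated lines: node, node, weight."""
    return "\n".join(f"{a}\t{b}\t{d}" for a, b, d in edges)


# Toy gene sets, for illustration only.
toy_sets = {
    "fig_1": {"A", "B", "C"},
    "fig_2": {"B", "C", "D"},
    "fig_3": {"X", "Y"},
}

edges = filtered_edges(toy_sets, max_distance=0.8)
print(to_tsv(edges))
```

With the toy data, only the `fig_1`/`fig_2` pair survives the cutoff, since `fig_3` shares no genes with the others.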