nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources

Develop a Reactome Curation Support Tool that Maps Text to the Reactome Pathway Hierarchy #194

Closed: cannin closed this issue 1 year ago

cannin commented 2 years ago

Background

Reactome (https://reactome.org/) is a free, open-source, curated, and peer-reviewed pathway database. The project has a number of curators who continually add new interaction information from the scientific literature. Curators read many papers, and it is not always obvious which pathway the information in a given paper belongs to.

Goal

During GSoC 2020 (https://github.com/cannin/enhance_nlp_interaction_network_gsoc2020), work was done to map publications to the Reactome hierarchy of pathways. This was done by building a vector embedding of MeSH terms (https://www.ncbi.nlm.nih.gov/mesh/) for each pathway. Given a query text, its MeSH terms (provided by the NLM Medical Text Indexer, https://ii.nlm.nih.gov/MTI/) are embedded the same way, and a cosine similarity calculation against the pathway embeddings identifies the most related pathway. The goal here is to build on this work so that the vector embedding for a publication supplied by a curator (from either the full text or the PubMed abstract) can be generated very quickly, and to provide this as a callable service (potentially a Flask-based API).
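As a rough illustration of the matching step, the sketch below ranks pathways by cosine similarity between MeSH-term vectors. It makes several assumptions not stated in the issue: each pathway is represented by a hypothetical list of MeSH terms (`PATHWAY_MESH`, with made-up pathway names), a simple bag-of-MeSH-terms count vector stands in for whatever embedding the 2020 project actually used, and the query's MeSH terms are assumed to have already been obtained (for example from MTI), so no MTI call is made. `handle_request` is a hypothetical, framework-agnostic JSON handler that a Flask route could wrap; Flask itself is not used here.

```python
import json
import math
from collections import Counter

# Hypothetical pathway profiles (illustrative names and terms only): each pathway
# is represented by MeSH terms associated with it.
PATHWAY_MESH = {
    "Pathway A": ["Apoptosis", "Caspases", "Mitochondria"],
    "Pathway B": ["Cell Cycle", "Cyclins", "Mitosis"],
    "Pathway C": ["Insulin", "Glucose", "Signal Transduction"],
}


def embed(mesh_terms, vocabulary):
    """Bag-of-MeSH-terms count vector over a fixed vocabulary.

    An assumed stand-in for the embedding used in the GSoC 2020 project;
    terms outside the vocabulary are ignored.
    """
    counts = Counter(mesh_terms)
    return [float(counts[term]) for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_pathways(query_terms, pathway_mesh, top_k=3):
    """Return the top_k (pathway, score) pairs, most similar first."""
    vocabulary = sorted({t for terms in pathway_mesh.values() for t in terms})
    query_vec = embed(query_terms, vocabulary)
    scores = [
        (name, cosine(query_vec, embed(terms, vocabulary)))
        for name, terms in pathway_mesh.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


def handle_request(body):
    """Hypothetical service handler: JSON {"mesh_terms": [...], "top_k": n} in,
    JSON list of {"pathway", "score"} out. A Flask view could call this."""
    payload = json.loads(body)
    ranked = rank_pathways(payload["mesh_terms"], PATHWAY_MESH, payload.get("top_k", 3))
    return json.dumps([{"pathway": n, "score": round(s, 4)} for n, s in ranked])
```

For example, `rank_pathways(["Apoptosis", "Caspases"], PATHWAY_MESH)` puts "Pathway A" first. Precomputing the pathway vectors once, rather than per request as in this sketch, would be one obvious step toward the "very quickly" requirement.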

Getting Started

Eventually, you'll need to write a proposal (see details: https://nrnb.org/gsoc).

Difficulty Level: Medium

Conceptually easy, but requires building on a previous student's work and producing a robust prototype.

Size and Length of Project

Size: 175 hours; Length: 12 weeks

Skills

Public Repository

Potential Mentors

Augustin Luna, Guanming Wu

ianxul commented 2 years ago

Hello! My name is Ian Xul Belaustegui. I'm a Biology and Math major from Mexico with a strong interest in software development and computer science. This project sounds really interesting, and I would be very excited to participate in GSoC with NRNB. I will start by researching the project further and working out the steps needed to complete it. However, I wanted to reach out first to introduce myself and to ask whether there is anything else I should consider when applying for this project. Any information would be greatly appreciated :)

cannin commented 2 years ago

@ianxul Feel free to reach out to me by email (see above) early next week with any initial ideas for a proposal. If anything is unclear, I will try to answer your questions.

khanspers commented 2 years ago

A reminder that the application period opens on Monday, April 4. Proposals to NRNB must be submitted on the official GSoC site (https://summerofcode.withgoogle.com/) before April 19, 18:00 UTC to be considered. Contributors are encouraged to submit proposals in draft form early so that mentors can give feedback directly on the GSoC site.

khanspers commented 1 year ago

Closing in preparation for GSoC 2023.