cannin closed 1 year ago
Hello! My name is Ian Xul Belaustegui. I'm a Biology and Math major from Mexico, but I have a strong interest in software development and computer science. I think this project sounds really interesting and I would be very excited to participate in GSoC with NRNB. I will start by researching this project further and developing a better idea of the steps that would be necessary to complete it. However, I wanted to reach out first and introduce myself, as well as ask if there is anything more that I should consider to apply for this project. Any information would be greatly appreciated :)
@ianxul Feel free to reach out to me by email (see above) early next week with any initial ideas for a proposal. If there is any confusion, I will try to resolve your questions.
A reminder that the application period opens on Monday April 4. Proposals to NRNB must be submitted on the official GSoC Site (https://summerofcode.withgoogle.com/) before April 19, 18:00 UTC to be considered, and contributors are encouraged to submit proposals in draft format early, so that mentors can give feedback directly at the GSoC site.
Closing in preparation for GSoC 2023.
Background
Reactome (https://reactome.org/) is a free, open-source, curated, and peer-reviewed pathway database. The project features a number of curators who continually add new interaction information from the scientific literature. Curators go through many papers, and the suitability of the information in a given paper for a particular pathway may not be obvious.
Goal
During GSoC 2020 (https://github.com/cannin/enhance_nlp_interaction_network_gsoc2020), work was done to map publications to the Reactome hierarchy of pathways. This was done by building a vector embedding of MeSH terms (https://www.ncbi.nlm.nih.gov/mesh/) for each pathway. Given a vector embedding of the MeSH terms for a query text (MeSH terms for a text are provided by https://ii.nlm.nih.gov/MTI/), a cosine similarity calculation can then identify the most related pathway. The goal here is to build on this work so that the vector embedding for a publication supplied by a curator can be generated very quickly (from either the full text or the PubMed abstract), and to provide this as a callable service (potentially a Flask-based API).
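To make the matching step concrete, here is a minimal sketch of ranking pathways by cosine similarity over MeSH terms. It is not the GSoC 2020 implementation: it assumes the simplest possible embedding (a term-count vector per pathway and per query), and the pathway names, MeSH terms, and function names (`mesh_vector`, `cosine_similarity`, `rank_pathways`) are hypothetical and used only for illustration.

```python
import math
from collections import Counter


def mesh_vector(terms):
    """Turn a list of MeSH terms into a sparse term-count vector.

    Assumption: a plain bag-of-terms vector stands in for whatever
    embedding the real project uses.
    """
    return Counter(terms)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors (Counter objects)."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty vector is treated as unrelated to everything.
    return dot / (norm_a * norm_b)


def rank_pathways(query_terms, pathway_terms):
    """Return (pathway, score) pairs sorted from most to least similar."""
    query = mesh_vector(query_terms)
    scores = [
        (name, cosine_similarity(query, mesh_vector(terms)))
        for name, terms in pathway_terms.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical example data, not taken from Reactome or MTI output.
PATHWAYS = {
    "Apoptosis": ["Apoptosis", "Caspases", "Mitochondria"],
    "Cell Cycle": ["Cell Cycle", "Cyclins", "Mitosis"],
}
QUERY = ["Caspases", "Apoptosis", "Neoplasms"]

ranking = rank_pathways(QUERY, PATHWAYS)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a service setting, a function like `rank_pathways` could sit behind a single HTTP endpoint that accepts the MeSH terms for a publication and returns the ranked pathways; precomputing the pathway vectors once at startup would be one way to keep responses fast.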
Getting Started
Eventually, you'll need to write a proposal (see details: https://nrnb.org/gsoc).
Difficulty Level: Medium
Conceptually easy, but depends on building on a previous student's work and producing a robust prototype.
Size and Length of Project
Size: 175 hours
Length: 12 weeks
Skills
Public Repository
Potential Mentors
Augustin Luna, Guanming Wu