nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources

Making biological network knowledge discoverable and accessible on search engines #218

Open: cannin opened this issue 1 year ago

cannin commented 1 year ago

Background

A biological pathway is a network consisting of interactions between biological molecules (e.g. proteins and chemicals) in a cell that lead to a certain product or a change. Researchers commonly use such networks to summarize research results about how biological molecules interact in healthy individuals and how changes in these relationships can lead to diseases, such as arthritis and cancer.

Pathway Commons (PC) is a popular web resource that aggregates machine-readable data about biological pathways (>5,700) and interactions (>2.4 million) from 22 popular curated public databases. Users can interactively explore this resource through the PC Search page to find out how a query (e.g. gene or disease) is connected to millions of pathways and interactions in the collection.

Goal

Create a sitemap for PC Search pathway and interaction network pages.
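A minimal sketch of the XML side of this goal, in Python, assuming the list of page URLs is already known. The function names and the example.org URLs are hypothetical; the `urlset`/`url`/`loc` elements, the namespace, and the 50,000-URL-per-file limit come from the sitemaps.org protocol. Larger collections would be split into several sitemap files referenced from a sitemap index.

```python
# Sketch only: builds sitemaps.org-style XML from an already-known list
# of page URLs. Function names and example URLs are hypothetical.
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemap(urls):
    """Return a UTF-8 encoded <urlset> document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url  # '&' etc. escaped automatically
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document pointing at several sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        sm = ET.SubElement(index, "sitemap")
        ET.SubElement(sm, "loc").text = url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into protocol-sized pieces, one per sitemap file."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]
```

For example, `build_sitemap(["https://example.org/pathways/1"])` returns bytes that can be written directly to a `sitemap.xml` file. ElementTree escapes characters such as `&` in query strings, which the protocol requires.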

Sub-Tasks

This work will involve modifications to the PC Search source code:

Stretch goals

Significance

This will immediately improve the indexing of PC Search pages by Google. More broadly, this project will expand the audience of researchers able to access and reuse biological research knowledge curated from publications. This will accelerate research discovery and increase the value of each knowledge item in PC Search.

How to start

Difficulty Level: Easy

Size and Length of Project

Skills

Public Repository

Potential Mentors


Previous post

Background

Pathway Commons (http://pathwaycommons.org/) is an aggregated database of millions of molecular interactions. Its data is stored in the XML-based BioPAX format (http://biopax.org/) and is aggregated from a collection of approximately 20 databases. Data from Pathway Commons is accessible here.

A previous version of the Pathway Commons site included a details page for each pathway, but these pages are missing from the current site. Previous examples:

Goal

The goal is to generate a new static site from Pathway Commons content, especially visualizations of pathways using the Systems Biology Graphical Notation (SBGN).

Sub-Tasks

  1. Generate static images for each pathway using SyBLaRS webservice: https://gist.github.com/cannin/7e35f3fae274370bd0a70c7b1840c743
  2. Make the site easy to maintain and modify.
    • Use a classless CSS design: https://classless.de/
    • Explore client-side search: lunr.js
    • Use a well-supported static site generator (pick one of these: Jekyll, Hugo, Gatsby, Pelican)
    • Minimize the use of JavaScript
    • Build a responsive, mobile-first site that is also functional on larger monitors
  3. Time permitting, generate pathway images for other large collections of SBGN content, including BioModels.
  4. Time permitting, generate text descriptions for pathways lacking a description.
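For the client-side search item, one low-JavaScript approach is to have the static site build emit a small JSON file and let lunr.js build its index in the browser. The Python sketch below assumes each pathway record has `id`, `name`, and `description` keys; those field names, the function names, and the output file name are hypothetical.

```python
# Sketch only: writes the document list a client-side lunr.js index
# would be built from. Record keys and file names are hypothetical.
import json


def search_documents(pathways):
    """Keep only the fields the client-side index needs."""
    return [
        {"id": p["id"], "title": p.get("name", ""), "body": p.get("description", "")}
        for p in pathways
    ]


def write_search_json(pathways, path):
    """Write the search documents as a JSON array (UTF-8)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(search_documents(pathways), fh, ensure_ascii=False)
```

In the browser, lunr.js would load this file and build the index with its builder API (`this.ref("id")`, `this.field("title")`, `this.field("body")`, then `this.add(doc)` for each record). Keeping only the indexed fields keeps the download small.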

How to Start

Interested applicants should:

  1. Run example code for static image generation: https://gist.github.com/cannin/7e35f3fae274370bd0a70c7b1840c743
  2. Explore the libraries/sites mentioned in the Sub-Tasks section
  3. Explore Pathway Commons API: https://www.pathwaycommons.org/pc2/home
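A small Python helper for step 3, assuming the web service exposes a `search` command under `https://www.pathwaycommons.org/pc2/` that takes `q` (query) and `type` parameters. The parameter names and the response format are assumptions to check against the API page before relying on this; `search_url` and `fetch` are hypothetical helper names.

```python
# Sketch only: the endpoint path and the `q`/`type` parameters are
# assumptions; verify them against https://www.pathwaycommons.org/pc2/home
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.pathwaycommons.org/pc2/"


def search_url(query, biopax_type="pathway"):
    """Build a search URL for a gene/keyword query (hypothetical helper)."""
    return BASE_URL + "search?" + urlencode({"q": query, "type": biopax_type})


def fetch(url, timeout=30):
    """Download the response body as text (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch(search_url("BRCA2"))` would return the raw search response for pathways matching BRCA2, which can then be inspected to see what the service returns.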

Difficulty Level: Easy

Size and Length of Project

175 hours (12 weeks)

Skills

HTML (essential), CSS, Python, Jekyll, JavaScript

Public Repository

Potential Mentors

Augustin Luna ({firstname}{last_name} AT hms.harvard.edu)

Simer13 commented 1 year ago

Hello. I am a first-year BTech student and I have been trying to contribute to open-source projects. I am interested in helping with this project for GSoC 2023. I am proficient in the skills you mentioned and have built some projects with them as well. I really hope that you assign me this project. I will do my best to fix the issue.

deep-poharkar commented 1 year ago

Hey @cannin, I believe I have the required skills and significant experience, and I would love to contribute to this if you allow me to.

cannin commented 1 year ago

@Simer13 @deep-poharkar Thanks for your interest. We are still in the process of applying as a mentoring organization; we should have an answer by Feb 23. Check for an update on that date here.

yagyesh-bobde commented 1 year ago

Hello, I am Yagyesh Bobde. I have experience in web dev. I am also interested in working on this project.

khanspers commented 1 year ago

NRNB has been accepted as a mentoring organization for GSoC 2023! Contributor applications open on March 20. Here are some useful links:

  • GSoC contributor guide
  • NRNB project proposal template
  • Eligibility requirements
  • Full program timeline

cd-vishal commented 1 year ago

@khanspers is there any Slack channel or community forum where I can ask some questions related to this project?

khanspers commented 1 year ago

Maybe try pathway-commons-help@googlegroups.com or email the potential mentor @cannin directly. See email in project description.

drstone-genius04 commented 1 year ago

Hi, Allan here. I am a third-year IT engineering student working in the web development and ML domains, and I am hoping to contribute to this project. I wanted to ask about the documentation required for this project.

drstone-genius04 commented 1 year ago

Hey @cannin @khanspers, I am getting the following error when trying to set up the project on my PC. Please let me know if I have missed any step.

drstone-genius04 commented 1 year ago

[Screenshot of the setup error]

drstone-genius04 commented 1 year ago

Hi @cannin @khanspers, I wanted to let you know that I was able to solve the error and get the required data.

drstone-genius04 commented 1 year ago

[Screenshot of the retrieved data]

drstone-genius04 commented 1 year ago

I have a small question: instead of Jekyll, is it OK if I use Pelican?

cannin commented 1 year ago

@drstone-genius04 yes, okay to use Pelican.

drstone-genius04 commented 1 year ago

OK, thanks for your reply.

drstone-genius04 commented 1 year ago

@cannin I am getting the following error while installing Pelican plugins. Should I update my Python version and try again? Note that my Python version is 3.10.10.

awantikamallick commented 1 year ago

Hello, this is Awantika. I went through the project and would like to work on it. Could you please tell me where I can connect with the organization (e.g. a Slack channel or similar), and where I should submit my idea?

cannin commented 1 year ago

@awantikamallick you can post questions here or email me. If you have a good draft proposal, you can make a Google Doc; I will try to comment on it. Final proposals need to be submitted to Google.

  • GSoC contributor guide
  • NRNB project proposal template
  • Eligibility requirements
  • Full program timeline

duckcommit commented 1 year ago

Good day to you @cannin, I am Vyshnav Ajith. The project looks interesting to me and I have a few questions regarding Pathway Commons. It would be kind of you to share your email address so that I can get in touch with you. Thank you.

awantikamallick commented 1 year ago

@cannin I think there is an error with this pathway: http://identifiers.org/kegg.pathway/hsa01100; all the others worked well!

awantikamallick commented 1 year ago

[Screenshot of the error]

cannin commented 1 year ago

@vyshaj See the project description.

cannin commented 1 year ago

@awantikamallick Make a note of this in your proposal. Ignore it for now as you work on the rest of your proposal.

duckcommit commented 1 year ago

Good day @cannin. I tried to contact you via email. I would like to be part of this project and learn under you. I want to be part of this project not because I am well-versed in the necessary technologies, but because I want to learn them in more depth. I have looked into the libraries and I am ready for this. How can I share my proposal with you so that you can suggest changes to it?

Thank You.

Murdock9803 commented 6 months ago

Hello @cannin @khanspers, I wish you a very prosperous new year ahead. I am Ayush Sahu, an undergraduate developer from India. I was going through various projects under the NRNB organisation, and Mr. Alexander Pico suggested this repository to me. I believe my technical skills and stack make me a good fit for this project. I have been reading about (and am still studying) Pathway Commons and found this project interesting. Is this project still open for work?

maxkfranz commented 6 months ago

@Murdock9803, yes. The first steps would be to do background research and plan how to carry out the project. For instance, you could start by researching tools like SyBLaRS, Puppeteer, and Playwright.

FYI -- @jvwong

Murdock9803 commented 5 months ago

@maxkfranz As you suggested, I'm currently learning about the above technologies, and I am also planning to look at Lunr.js and classless CSS. I will update here once I am done reading about these or if I get stuck somewhere. Although I think I can find resources and documentation on my own, are there any specific resources you would suggest for learning these?

maxkfranz commented 5 months ago

Those are good starting points.

Murdock9803 commented 5 months ago

Hello @maxkfranz and @cannin, I hope you are doing fine. Sorry for the late update; I had my college examinations coming up. From now on, I will let you know about these in advance.

First, here is an update on the tools you suggested I learn about:

Next plans

Some questions I have

Thank you very much for your time and attention. I look forward to hearing from you soon and am very excited to work on this project and on further projects. I am not targeting only GSoC; I also want to keep working on projects like this, as I never thought we could merge biology and programming. This looks very exciting. Thank you.

AswalGaurang commented 5 months ago

Hi @cannin @khanspers, I'm a dual degree BE Computer Science and MSc Biological Sciences student from BITS Pilani, India. I'm excited about the projects mentioned above and have started learning the tools recommended in the comments. Looking forward to staying connected and actively participating in GSoC.

Murdock9803 commented 5 months ago

@maxkfranz @cannin @khanspers This is just a follow-up to my previous message. Please share your valuable insights. I'm really looking forward to completing this project in GSoC this year.

Hello @maxkfranz and @cannin, I hope you are doing fine. Sorry for the late update; I had my college examinations coming up. From now on, I will let you know about these in advance.

First, here is an update on the tools you suggested I learn about:

  • Puppeteer - I researched this thoroughly and tried some simple tasks, like taking screenshots, generating PDFs, and setting page dimensions. Web automation and web scraping were new to me, but the documentation on the Puppeteer website was really helpful. My knowledge of async JavaScript also helped.
  • Playwright - Playwright supports more browsers than Puppeteer (which was mainly developed for Chromium), so this was a bigger task. I learned about writing tests, testing a single file as well as a whole application, debugging, etc.
  • Lunr.js - I researched this and found a good amount of documentation. I am looking forward to applying it in one of my projects soon so I can get experience working with it.

    • Solr (written in Java) is also good but targets larger datasets, as does Elasticsearch. I briefly explored these too, but did not spend much time on them.
  • classless CSS (classless.de) - This was easy to learn since it is a CSS framework like others, but it offers a really minimalistic and simple design for web pages.
  • SyBLaRS and SBGN - I come from an engineering background, so I found it difficult (or overwhelming) to learn about systems biology at first. I am also trying to connect with some professors at my college, as they can help me get acquainted with it.

Next plans

  • To learn more about the biology part that is going to be used in the project.
  • To strengthen my knowledge of backend development, as I will have to handle databases from Pathway Commons.
  • Since Google has released the GSoC timeline, I plan to prepare a timeline-based goal list so that everything goes smoothly.
  • To give you detailed weekly reports, apart from small updates.

Some questions I have

  • This project was opened last year but did not make it onto the GSoC projects list. What is its present state? Do we have to work on it from the start, or has some work already been done?
  • To what extent should I learn about the systems biology part, and are there any areas where I can easily get started with it?
  • Should I start working on the project before the official GSoC coding period begins, or should I work on some other issue until then?

Thank you very much for your time and attention. I look forward to hearing from you soon and am very excited to work on this project and on further projects. I am not targeting only GSoC; I also want to keep working on projects like this, as I never thought we could merge biology and programming. This looks very exciting. Thank you.

jvwong commented 5 months ago

I (@jvwong) have updated this project description to:

Murdock9803 commented 5 months ago

Thanks @jvwong, I was confused about the progress made so far; this clears up some doubts. I'll read the project description thoroughly again and learn the required technologies.

Murdock9803 commented 4 months ago

@jvwong @maxkfranz @cannin Update regarding the project:

khanspers commented 4 months ago

NRNB has been accepted as a mentoring organization for GSoC 2024. The contributor application period is March 18 – April 2. Here are some useful links:

  • GSoC contributor guide
  • NRNB project proposal template
  • Eligibility requirements
  • Full program timeline

Murdock9803 commented 4 months ago

@cannin I have a small question about planning the project. Apart from the main goal, are the stretch goals supposed to be completed within the 12 weeks, or will the period be extended?

Murdock9803 commented 4 months ago

@jvwong @cannin please have a look. Also, do we have to make both HTML and XML sitemaps?

jvwong commented 4 months ago

@cannin I have a small question about planning the project. Apart from the main goal, are the stretch goals supposed to be completed within the 12 weeks, or will the period be extended?

There are prototypes for this already, so it should be achievable within the GSoC period.

@jvwong @cannin please have a look. Also, do we have to make both HTML and XML sitemaps?

XML sitemap is a requirement.

semsoum-712 commented 3 months ago

@cannin Given that we already have the pathway IDs, using the file PathwayCommons12.All.hgnc.gmt.gz to generate an XML sitemap in a widely used, user-friendly language like Python sounds like a solid plan. It will streamline our development process and save us valuable time.
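A sketch of the first half of that plan, assuming the file follows the usual GMT layout (one gene set per line, tab-separated: set name, description, then genes) with the pathway URI in the first column. `PAGE_URL_TEMPLATE` and the function names are hypothetical; the real PC Search URL pattern would need to be taken from the PC Search source code.

```python
# Sketch only: assumes standard GMT lines (name<TAB>description<TAB>genes...)
# with the pathway URI as the name. PAGE_URL_TEMPLATE is hypothetical.
import gzip
from urllib.parse import quote

PAGE_URL_TEMPLATE = "https://example.org/pathways?uri={uri}"  # hypothetical


def read_pathway_ids(path):
    """Return unique pathway IDs (first GMT column) in file order."""
    opener = gzip.open if path.endswith(".gz") else open
    seen = {}  # dict keeps insertion order and drops duplicates
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            pid = line.split("\t", 1)[0].strip()
            if pid:
                seen.setdefault(pid, None)
    return list(seen)


def page_urls(pathway_ids):
    """Turn pathway IDs into page URLs, percent-encoding each ID."""
    return [PAGE_URL_TEMPLATE.format(uri=quote(pid, safe="")) for pid in pathway_ids]
```

The resulting URL list could then be written out with any XML sitemap writer, split into files of at most 50,000 URLs each as the sitemaps.org protocol requires.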

cannin commented 3 months ago

@semsoum-712 yes; application deadline is April 2 (https://developers.google.com/open-source/gsoc/timeline)

semsoum-712 commented 3 months ago

@cannin Hi Luna,

I hope this comment finds you well. I've just completed a draft of my proposal and would greatly appreciate your expertise in reviewing it. Specifically, I'm seeking your insights to identify any potential mistakes and gather informative suggestions for improvement.

I'm eager to incorporate your feedback to refine my proposal further.

Thank you in advance for your time and support.

Best regards, Asma

GSOC proposal 2024.pdf