nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources

Generate Automated Layouts for Pathways with Large-Language Model (e.g., ChatGPT) Planning #232

Open cannin opened 10 months ago

cannin commented 10 months ago

Background

Pathway diagrams help researchers understand complex biological processes (i.e., pathways). The Systems Biology Graphical Notation (SBGN, https://sbgn.github.io/) is a formalism with a set of interconnected tools and file formats (SBGNML) for generating diagrams of these processes. Much pathway content exists only in textual databases, and laying it out automatically can be challenging. Manually laid out pathways tend to convey a specific narrative that is lost when using automated layout algorithms that lack an understanding of biology (https://academic.oup.com/bib/article/22/5/bbab103/6217719).

Large language models (LLMs, e.g., ChatGPT, LLaMA) and multimodal models (e.g., GPT-4V, LLaVA) have been used for a variety of tasks, such as responding to questions and writing content, thanks to the abundance of text on which they have been trained. Using text-based formats, LLMs can also generate diagrams (https://www.mermaidchart.com/blog/posts/mermaid-chart-chatgpt-plugin-combines-generative-ai-and-smart-diagramming). Separately, the training data of ChatGPT and related models includes SBGN content, thanks to diagrams encoded in the SBGNML format.

Recent research has shown that LLMs can be leveraged to aid in diagram generation and layout (https://github.com/aszala/DiagrammerGPT) through a two-stage process (planning then generation).
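To make the two-stage idea concrete, here is a minimal Python sketch of a planning step followed by a generation step. It is an illustration under stated assumptions, not the DiagrammerGPT method or an agreed design for this project: the function names, the JSON plan format, and the `fake_llm` stand-in are all hypothetical, and the XML produced is a simplified SBGN-ML-style fragment (glyphs with labels and bounding boxes only, no arcs). A real pipeline would replace `fake_llm` with a call to an actual model and would need to validate the plan more thoroughly.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

SBGN_NS = "http://sbgn.org/libsbgn/0.2"  # namespace used by SBGN-ML 0.2 files

# Hypothetical toy pathway: two macromolecules linked by one process node.
EXAMPLE_GLYPHS = [
    {"id": "g1", "class": "macromolecule", "label": "RAF"},
    {"id": "p1", "class": "process", "label": ""},
    {"id": "g2", "class": "macromolecule", "label": "MEK"},
]


def build_planning_prompt(glyphs: List[Dict[str, str]]) -> str:
    """Stage 1 input: describe the glyphs as text and ask for coordinates as JSON."""
    lines = [f'- id={g["id"]} class={g["class"]} label={g.get("label", "")}' for g in glyphs]
    return (
        "Propose a left-to-right layout for these SBGN glyphs.\n"
        + "\n".join(lines)
        + '\nAnswer with JSON only: {"<glyph id>": {"x": <number>, "y": <number>}}'
    )


def plan_layout(glyphs: List[Dict[str, str]], llm: Callable[[str], str]) -> Dict[str, Dict[str, float]]:
    """Stage 1 (planning): ask the model for positions and check every glyph is covered."""
    plan = json.loads(llm(build_planning_prompt(glyphs)))
    missing = [g["id"] for g in glyphs if g["id"] not in plan]
    if missing:
        raise ValueError(f"plan is missing glyphs: {missing}")
    return plan


def generate_sbgnml(glyphs, plan, width: float = 80.0, height: float = 40.0) -> str:
    """Stage 2 (generation): write the planned positions into SBGN-ML-style XML."""
    ET.register_namespace("", SBGN_NS)
    root = ET.Element(f"{{{SBGN_NS}}}sbgn")
    sbgn_map = ET.SubElement(root, f"{{{SBGN_NS}}}map", {"language": "process description"})
    for g in glyphs:
        glyph = ET.SubElement(sbgn_map, f"{{{SBGN_NS}}}glyph", {"id": g["id"], "class": g["class"]})
        if g.get("label"):
            ET.SubElement(glyph, f"{{{SBGN_NS}}}label", {"text": g["label"]})
        pos = plan[g["id"]]
        ET.SubElement(
            glyph,
            f"{{{SBGN_NS}}}bbox",
            {"x": str(float(pos["x"])), "y": str(float(pos["y"])), "w": str(width), "h": str(height)},
        )
    return ET.tostring(root, encoding="unicode")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption): always returns a fixed plan."""
    return json.dumps({"g1": {"x": 20, "y": 50}, "p1": {"x": 150, "y": 60}, "g2": {"x": 240, "y": 50}})


if __name__ == "__main__":
    layout_plan = plan_layout(EXAMPLE_GLYPHS, fake_llm)
    print(generate_sbgnml(EXAMPLE_GLYPHS, layout_plan))
```

Keeping the model's output to a small JSON plan, and generating the XML deterministically from it, means a malformed or incomplete answer can be rejected before any diagram file is written.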

Goal

The goal is to build a pipeline that uses LLMs (e.g., ChatGPT) to aid in the automatic layout of SBGN diagrams.

Difficulty

Easy-Medium; Easy to start, difficult to produce well

Size and Length of Project

Medium: 175 hours; 12 weeks preferred

Skills

Python

Public Repository

Potential Mentors

Augustin Luna, Adrien Rougny

Raya679 commented 10 months ago

Hello @cannin, I am Raya Chakravarty, currently pursuing my BTech in Computer Science. I am particularly interested in this issue and would like to contribute to this project during the GSoC program.

I have prior experience with Large Language Models (LLMs) and have developed a Healthcare Chatbot by fine-tuning LLMs, specifically Llama.

I am going through the resources and links you have provided above. Currently, I am exploring the SBGN Documentation. Are there any additional tasks you would like me to undertake apart from these?

7070Shreyash commented 10 months ago

Hey @cannin, my name is Shreyash, and I'm a B.Tech CSE student with proficiency in Python and Machine Learning/Deep Learning. I also have experience with Large Language Models (LLMs).

Having reviewed the project goal and provided resources, I'm keenly interested in contributing to this issue through the GSoC program. I'm currently immersed in the documentation and links, and I'm eager to put my skills to use.

Thanks

khanspers commented 9 months ago

NRNB has been accepted as a mentoring organization for GSoC 2024. The contributor application period is March 18 – April 2. Here are some useful links:

GSoC contributor guide; NRNB project proposal template; Eligibility requirements; Full program timeline

sumana-2705 commented 4 hours ago

Hello @cannin @adrienrougny

My name is Sumana Sree. I am currently doing my Masters at the Indian Institute of Technology (BHU) in the field of Machine Learning. I would love to contribute to this project and gain a deeper understanding of LLMs. Can you let me know whether this project is open for GSoC 2025?