Open cannin opened 10 months ago
Hello @cannin, I am Raya Chakravarty, currently pursuing my BTech in Computer Science. I am particularly interested in this issue and would like to contribute to this project during the GSoC program.
I have prior experience with Large Language Models (LLMs) and have developed a Healthcare Chatbot by fine-tuning LLMs, specifically Llama.
I am going through the resources and links you have provided above. Currently, I am exploring the SBGN Documentation. Are there any additional tasks you would like me to undertake apart from these?
Hey @cannin, my name is Shreyash, and I'm a B.Tech CSE student with proficiency in Python and Machine Learning/Deep Learning. I also have experience with Large Language Models (LLMs).
Having reviewed the project goal and provided resources, I'm keenly interested in contributing to this issue through the GSoC program. I'm currently immersed in the documentation and links, and I'm eager to put my skills to use.
Thanks
NRNB has been accepted as a mentoring organization for GSoC 2024. The contributor application period is March 18 – April 2. Here are some useful links:
GSoC contributor guide
NRNB project proposal template
Eligibility requirements
Full program timeline
Hello @cannin @adrienrougny
My name is Sumana Sree. I am currently doing my Masters at the Indian Institute of Technology (BHU) in the field of Machine Learning. I would love to contribute to this project and gain a deeper understanding of LLMs. Can you let me know whether this project is open for GSoC 2025?
Background
Pathway diagrams help researchers understand complex biological processes (i.e., pathways). The Systems Biology Graphical Notation (SBGN, https://sbgn.github.io/) is a formalism with a set of interconnected tools and file formats (SBGNML) for generating diagrams of these processes. Much pathway content exists in textual databases, and automated layout of this content can be challenging. Manually laid out pathways tend to convey a specific narrative that is lost when using automated layout algorithms that lack understanding of biology (https://academic.oup.com/bib/article/22/5/bbab103/6217719).
Large language models (LLMs, e.g., ChatGPT, LLaMA) and multimodal models (e.g., GPT-4V, LLaVA) have been used for a variety of tasks, such as responding to questions and writing content, thanks to the abundance of text on which they have been trained. Using text-based formats, LLMs can also generate diagrams (https://www.mermaidchart.com/blog/posts/mermaid-chart-chatgpt-plugin-combines-generative-ai-and-smart-diagramming). Separately, the training data of ChatGPT and related models includes SBGN content, thanks to diagrams available in the SBGNML format.
Recent research has shown that LLMs can be leveraged to aid in diagram generation and layout (https://github.com/aszala/DiagrammerGPT) through a two-stage process (planning then generation).
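To make the two-stage idea concrete, here is a minimal Python sketch, not a reference implementation. It assumes the planning stage returns a JSON layout plan (glyph classes, labels, bounding boxes, and arcs) and that the generation stage turns that plan into simplified SBGNML. The plan schema and the names fake_planner and plan_to_sbgnml are hypothetical; fake_planner is a hard-coded stand-in for an actual LLM call, and the namespace is assumed to be the libSBGN 0.2 one. Arcs are drawn between glyph centers for brevity, whereas real SBGNML files usually attach arcs to ports on process glyphs.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: namespace used by SBGNML 0.2 files.
SBGN_NS = "http://sbgn.org/libsbgn/0.2"


def fake_planner(pathway_text: str) -> str:
    """Stage 1 (planning), hypothetical stand-in for an LLM call.

    A real pipeline would prompt a model with pathway_text and ask for
    a layout plan; here a fixed plan is returned as JSON.
    """
    plan = {
        "glyphs": [
            {"id": "g1", "class": "macromolecule", "label": "A",
             "x": 40, "y": 40, "w": 80, "h": 40},
            {"id": "p1", "class": "process", "label": "",
             "x": 180, "y": 50, "w": 20, "h": 20},
            {"id": "g2", "class": "macromolecule", "label": "B",
             "x": 260, "y": 40, "w": 80, "h": 40},
        ],
        "arcs": [
            {"id": "a1", "class": "consumption", "source": "g1", "target": "p1"},
            {"id": "a2", "class": "production", "source": "p1", "target": "g2"},
        ],
    }
    return json.dumps(plan)


def plan_to_sbgnml(plan_json: str) -> str:
    """Stage 2 (generation): convert the layout plan to simplified SBGNML."""
    plan = json.loads(plan_json)
    root = ET.Element("sbgn", {"xmlns": SBGN_NS})
    sbgn_map = ET.SubElement(root, "map", {"language": "process description"})

    centers = {}
    for g in plan["glyphs"]:
        glyph = ET.SubElement(sbgn_map, "glyph",
                              {"class": g["class"], "id": g["id"]})
        if g.get("label"):
            ET.SubElement(glyph, "label", {"text": g["label"]})
        ET.SubElement(glyph, "bbox",
                      {k: str(g[k]) for k in ("x", "y", "w", "h")})
        # Remember each glyph's center so arcs can connect to it.
        centers[g["id"]] = (g["x"] + g["w"] / 2, g["y"] + g["h"] / 2)

    for a in plan["arcs"]:
        arc = ET.SubElement(sbgn_map, "arc",
                            {"class": a["class"], "id": a["id"],
                             "source": a["source"], "target": a["target"]})
        sx, sy = centers[a["source"]]
        tx, ty = centers[a["target"]]
        ET.SubElement(arc, "start", {"x": str(sx), "y": str(sy)})
        ET.SubElement(arc, "end", {"x": str(tx), "y": str(ty)})

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(plan_to_sbgnml(fake_planner("A is converted to B")))
```

Keeping the plan as plain JSON means the LLM output can be validated (for example, checking that every arc references an existing glyph) before any SBGNML is written, which is one reason to separate planning from generation.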
Goal
The goal is to build a pipeline that uses LLMs (e.g., ChatGPT) to aid in the automatic layout of SBGN diagrams.
Difficulty
Easy-Medium; Easy to start, difficult to produce well
Size and Length of Project
Medium: 175 hours; 12 weeks preferred
Skills
Python
Public Repository
Potential Mentors
Augustin Luna, Adrien Rougny