nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources

Develop a Configurable Converter of JSON Data to the BioPAX Format #175

Closed cannin closed 2 years ago

cannin commented 3 years ago

Background

Biological Pathway Exchange (BioPAX, http://www.biopax.org/) is a standard language, defined in OWL and typically serialized as RDF/XML, that aims to enable integration, exchange, visualization, and analysis of biological pathway data. Specifically, BioPAX supports data exchange between pathway data groups and thus reduces the complexity of interchange between data formats by providing an accepted standard format for pathway data. Currently, there is no simple converter from basic edgelists to the BioPAX format. The main existing library for BioPAX is Paxtools (https://github.com/BioPAX/Paxtools).

Goal

The goal would be to produce a converter from a JSON edgelist to the BioPAX format that is configurable and extensible. The converter would need to be configurable so that users have a simple way to map their concepts to those in BioPAX, as well as to specify the database identifiers (e.g., HGNC, https://www.genenames.org/, or UniProt) they are inputting. The output would need to pass the validator (http://biopax.baderlab.org/). Ideally, the project would be in Java, but examples showing calls from Python to Java with Pyjnius (https://pyjnius.readthedocs.io/en/stable/) would be useful.
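
One way to picture the configurable mapping described above is a small lookup that turns a user's own type labels and identifier columns into BioPAX class names and database names. The following is only a minimal sketch in plain Java, not the Paxtools or factoid-converters API: `MappingConfig`, its methods, and the sample labels (`gene_product`, `drug`, `hgnc_symbol`, `uniprot_id`) are hypothetical names chosen for illustration. `Protein` and `SmallMolecule` are BioPAX Level 3 class names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical configuration holder; not part of Paxtools or factoid-converters. */
class MappingConfig {
    // User-defined concept label -> BioPAX Level 3 class name.
    private final Map<String, String> typeToBiopaxClass = new LinkedHashMap<>();
    // User-defined identifier column -> database name to record for that identifier.
    private final Map<String, String> idColumnToDatabase = new LinkedHashMap<>();

    public MappingConfig mapType(String userType, String biopaxClass) {
        typeToBiopaxClass.put(userType, biopaxClass);
        return this;
    }

    public MappingConfig mapIdColumn(String column, String database) {
        idColumnToDatabase.put(column, database);
        return this;
    }

    /** Returns the configured BioPAX class name; fails loudly on unmapped types. */
    public String biopaxClassFor(String userType) {
        String biopaxClass = typeToBiopaxClass.get(userType);
        if (biopaxClass == null) {
            throw new IllegalArgumentException("No BioPAX mapping for type: " + userType);
        }
        return biopaxClass;
    }

    /** Returns the configured database name; fails loudly on unmapped columns. */
    public String databaseFor(String column) {
        String database = idColumnToDatabase.get(column);
        if (database == null) {
            throw new IllegalArgumentException("No database mapping for column: " + column);
        }
        return database;
    }

    public static void main(String[] args) {
        // Illustrative configuration; the labels on the left are assumptions.
        MappingConfig config = new MappingConfig()
                .mapType("gene_product", "Protein")
                .mapType("drug", "SmallMolecule")
                .mapIdColumn("hgnc_symbol", "HGNC")
                .mapIdColumn("uniprot_id", "UniProt");

        System.out.println(config.biopaxClassFor("drug"));
        System.out.println(config.databaseFor("uniprot_id"));
    }
}
```

Failing on unmapped labels, rather than silently guessing a class, is a deliberate choice here: output that has to pass the validator is easier to debug when configuration gaps surface early.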

Extension

We will use an existing converter (https://github.com/PathwayCommons/factoid-converters); this may be sufficient for many users, but the goal is to provide an extension mechanism.

Simplicity

Part of the goal is to make extending the converter as simple as possible. To that end, configuration should be possible with Groovy (https://groovy-lang.org/), a Java-syntax-compatible scripting language for the Java platform with dynamic typing similar to Python.

Getting Started

Get familiar with the code and come up with a plan for the proposal.

Main Entry Point (jsonToBiopax)

https://github.com/PathwayCommons/factoid-converters/blob/334583395db45f00c3a29dcde890765b9dd78e76/src/main/java/factoid/web/Controller.java#L39

Main Class

https://github.com/PathwayCommons/factoid-converters/blob/master/src/main/java/factoid/converter/FactoidToBiopax.java

add Method: Users need to be able to write custom Groovy-based add methods and have them injected at runtime

https://github.com/PathwayCommons/factoid-converters/blob/334583395db45f00c3a29dcde890765b9dd78e76/src/main/java/factoid/converter/FactoidToBiopax.java#L93
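
The idea of custom add methods injected at runtime can be pictured as a registry of handlers keyed by interaction type. The sketch below uses plain Java lambdas as a stand-in for Groovy-supplied code and returns strings instead of building a BioPAX model; `AddHandlerRegistry`, `AddHandler`, and the `binding` type label are hypothetical and do not reflect the actual `FactoidToBiopax` API. `MolecularInteraction` is a BioPAX Level 3 class name, used here only as a label.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry of pluggable "add" handlers; not the FactoidToBiopax API. */
class AddHandlerRegistry {

    /** A handler turns one edgelist entry into a description of what it added. */
    interface AddHandler {
        String add(String source, String target);
    }

    private final Map<String, AddHandler> handlers = new HashMap<>();
    private final List<String> added = new ArrayList<>();

    /** Registers (or replaces) the handler used for one interaction type. */
    public void register(String interactionType, AddHandler handler) {
        handlers.put(interactionType, handler);
    }

    /** Dispatches one edge to the handler registered for its interaction type. */
    public void add(String interactionType, String source, String target) {
        AddHandler handler = handlers.get(interactionType);
        if (handler == null) {
            throw new IllegalArgumentException("No handler for type: " + interactionType);
        }
        added.add(handler.add(source, target));
    }

    public List<String> added() {
        return added;
    }

    public static void main(String[] args) {
        AddHandlerRegistry registry = new AddHandlerRegistry();
        // In the envisioned design this handler would come from a user's Groovy script.
        registry.register("binding",
                (source, target) -> "MolecularInteraction(" + source + ", " + target + ")");
        registry.add("binding", "TP53", "MDM2");
        System.out.println(registry.added());
    }
}
```

Because `AddHandler` has a single abstract method, a Groovy closure could in principle be coerced to it, which is one reason to keep the handler interface this small.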

Sample Data

https://github.com/PathwayCommons/factoid-converters/blob/master/src/test/resources/test2.json

Calling Groovy from Java

https://stackoverflow.com/questions/3989592/calling-a-groovy-function-from-java

Difficulty Level 2

Students would need to get familiar with the basics of BioPAX and Paxtools.

Skills

Public Repository

Potential Mentors

Augustin Luna @metincansiper

yash170106065 commented 3 years ago

@cannin I would like to work on this. I am familiar with Java and Python, have worked with tabular data and Excel files in the Spring framework, and have been contributing to NRNB for the past year.

SGanguly1999 commented 3 years ago

@cannin I would also like to work on this issue. Is it available for me to work on?

shaliniiit commented 3 years ago

@cannin I have gone through the code and the issue and feel that it's a good fit for me, having worked on similar projects at other universities such as Penn State and CMU. I'd be happy to start working. Please see my resume: https://drive.google.com/file/d/1pcqymg8_DTuCQTTV8cUyEfISpvD63Jlq/view?usp=sharing

cannin commented 3 years ago

@yash170106065 @SGanguly1999 @shaliniiit the project description has been heavily modified, but includes additional details; this may make you more or less interested.

avinal commented 3 years ago

Hello @cannin, I am interested in contributing to this project. I have previously worked on parsers, Spring Boot, and core Java. I have never used Groovy, but I hope it won't take long to learn. If I understand correctly, the project is to create a JSON to BioPAX format converter with validation and configurable output.

cannin commented 3 years ago

@avinal there is a large overlap between Java and Groovy

avinal commented 3 years ago

Yep, I got that after seeing their description and syntax. So is there a programming competency test for the project, or can we start discussing it?

cannin commented 3 years ago

There is no test. The central application component is the proposal (https://nrnb.org/gsoc.html). Demonstration of your understanding of the code in the proposal is valuable. This can happen with small code snippets in your intended plan and/or pointing to previous code you've worked on.

avinal commented 3 years ago

Thanks @cannin, that seems like a nice starting point. I will update you on my proposal when it is ready for review.

Just to confirm, is this the preferred way of communicating for the application period and discussions, or should I switch to email/Slack/Gitter/IRC?

cannin commented 3 years ago

Comments on GitHub make it easier for me to stay organized, if I have responded. Draft proposals on Google Docs can be sent via email. Due to time constraints and many projects, I may not be able to give comments more than 1-2 times (choose your timing wisely).

khanspers commented 2 years ago

Cleanup in preparation for GSoC 2022.