nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources

Improve the TumorComparer R Package (Add Network Visualization and Available Data Types) #171

Closed: cannin closed this issue 3 years ago

cannin commented 3 years ago

Background

TumorComparer (https://www.biorxiv.org/content/10.1101/028159v1) is an algorithm for mapping experimental model systems (e.g., cell lines) to patient samples by comparing various -omic profiles.

Goal

The goal is to 1) expand the utility of the algorithm to additional data types (currently hard-coded to 3 data types), 2) provide examples of patient-to-patient similarity, and 3) provide a network visualization (using igraph: https://igraph.org/) of the resulting mapping.
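To illustrate goal 1, the sketch below treats each input as a generic data set tagged as either discrete or continuous, so the number of data types is no longer fixed at three. Later in the thread, exp, cna and mut are described as falling into these two categories. The sketch is written in Python, not R, purely for illustration. The names `DataSet`, `discrete_similarity`, `continuous_similarity` and `combined_similarity` are hypothetical. The scoring is also an assumed stand-in, not TumorComparer's actual similarity measure: the fraction of matching features for discrete data, Pearson correlation for continuous data, and a weighted average across data sets.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Dict, List, Sequence

# Hypothetical per-kind similarity functions (assumptions, not TumorComparer's scoring).

def discrete_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Fraction of features whose discrete values are identical."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def continuous_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)

# Dispatch table: a new kind of data only needs a new entry here.
SIMILARITY: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "discrete": discrete_similarity,
    "continuous": continuous_similarity,
}

@dataclass
class DataSet:
    name: str                         # e.g. "mut", "cna", "exp"
    kind: str                         # "discrete" or "continuous"
    profiles: Dict[str, List[float]]  # sample ID -> feature vector
    weight: float = 1.0               # hypothetical per-data-set weight

def combined_similarity(datasets: List[DataSet], s1: str, s2: str) -> float:
    """Weighted average of per-data-set similarities between two samples."""
    if not datasets:
        raise ValueError("at least one data set is required")
    total = weight_sum = 0.0
    for ds in datasets:
        sim = SIMILARITY[ds.kind](ds.profiles[s1], ds.profiles[s2])
        total += ds.weight * sim
        weight_sum += ds.weight
    return total / weight_sum
```

With this layout, supporting five data sets (e.g., 3 discrete and 2 continuous) means passing five `DataSet` entries rather than changing hard-coded logic.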

Difficulty Level 1

Much of the work will involve refactoring existing code, which can serve as a guide for the student.

Skills

Public Repository

Potential Mentors

Augustin Luna

chilampoon commented 3 years ago

Interesting. It seems the preprint hasn't been published yet? Would it be possible for a student to contribute to an updated preprint as a co-author after completing the goals above and doing some additional analyses? @cannin

cannin commented 3 years ago

@chilampoon You are correct; the preprint is not published. If the summer code work is included as part of the final publication, it would be natural for the student to be included. I have written papers with GSoC students in the past.

chilampoon commented 3 years ago

@cannin Got it. What are the expectations for this project to be integrated into the final publication? I do have some thoughts on it; I'll contact you later, since it's still the organization application period.

cannin commented 3 years ago

@chilampoon There are others involved in the project, so I can only speak for myself. The code has to 1) add new features, 2) be tested, and 3) be documented.

patelaryan7751 commented 3 years ago

@cannin I have gone through the algorithm. Great algorithm! I have a couple of questions. 1) The algorithm currently takes 3 input data types (exp, cna, mut). What additional data types need to be added? A brief description would help. 2) What does the "mapping" in the project goals refer to?

cannin commented 3 years ago

@patelaryan7751 Apologies, I never saw this message. 1) It is not so much about which additional types to add. exp, cna and mut are actually handled in 2 ways: as either discrete or continuous data. It would be nice to abstract the code so that, for example, 5 data sets could be used (e.g., 3 discrete and 2 continuous). The biological details of the data types are unnecessary for a GSoC student. 2) The code generates similarity values between objects (i.e., cell lines and tumor samples). You could think of these objects as nodes in a graph, with an edge between any two objects whose similarity is above a threshold (e.g., 0.5).
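The graph view described above can be sketched as follows. Each cell line or tumor sample is a node, and an edge joins two samples whose similarity exceeds a threshold (0.5 by default here, following the example). This is a minimal standard-library Python sketch. The function names `similarity_edges` and `adjacency` and the pair-keyed dictionary input are hypothetical, not part of TumorComparer. The resulting weighted edge list is the kind of input igraph can turn into a graph for plotting, for example via `graph_from_data_frame()` in the R package.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]

def similarity_edges(
    samples: List[str],
    sim: Dict[Tuple[str, str], float],
    threshold: float = 0.5,
) -> List[Edge]:
    """Return (a, b, similarity) for each unordered pair above the threshold."""
    edges: List[Edge] = []
    for a, b in combinations(samples, 2):
        # Similarity is symmetric, so accept the pair stored in either order.
        s = sim.get((a, b), sim.get((b, a)))
        if s is not None and s > threshold:
            edges.append((a, b, s))
    return edges

def adjacency(samples: List[str], edges: List[Edge]) -> Dict[str, Set[str]]:
    """Undirected neighbor sets, e.g. the tumor samples a cell line maps to."""
    adj: Dict[str, Set[str]] = {s: set() for s in samples}
    for a, b, _ in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

if __name__ == "__main__":
    # Toy similarity values, made up purely for illustration.
    samples = ["cellA", "cellB", "tumor1", "tumor2"]
    sim = {
        ("cellA", "cellB"): 0.45,
        ("cellA", "tumor1"): 0.82,
        ("tumor2", "cellA"): 0.31,
        ("cellB", "tumor2"): 0.67,
        ("tumor1", "tumor2"): 0.55,
    }
    for a, b, s in similarity_edges(samples, sim):
        print(a, b, s)
```

With the toy values, the edges are cellA–tumor1, cellB–tumor2 and tumor1–tumor2. The last of these is a patient-to-patient link of the kind mentioned in the project goals.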

khanspers commented 3 years ago

This is an active GSoC 2021 project. The issue will be closed for the duration of GSoC since it is no longer available to other students.