Closed by jakebeal 5 months ago
Hi @jakebeal @tcmitchell @VishweshGitHub, I am Yash Gupta, a third-year undergraduate. I am looking forward to contributing to this project for GSoC '22. I have a year's worth of experience in Python. As I am new to this project, any guidance on where and how to start would be very helpful.
Hi @jakebeal, I am Kartik Kumar Pawar, a CSE sophomore at BITS Pilani. I have about 6 years of experience with Python. I am also adept in Java, with knowledge of OOP and basic design patterns, and I have worked with both SQL and NoSQL database systems. I am familiar with JavaScript and have worked with React and Node.js as well. I am really excited to learn more about this project and contribute to it, with the aim of becoming a GSoC '22 contributor. Could you please guide me so that I can start as soon as possible?
@Yash-g17 @Kartikkp07 If you'd like to learn more about the project and start familiarizing yourself with material, a good starting point is the SBOL tutorial material on the data model and Python library that was presented at IWBDA 2021.
Hi @jakebeal, I am Aakash, currently pursuing a dual degree in biological sciences at the Indian Institute of Technology Madras (IITM). I have been coding in Python for about a year now. I know the basics of SnapGene as well. I'd like to work on this project. Kindly guide me on how to get started.
@Aakash-02 Application for support on the project goes through the standard Google Summer of Code process. If you'd like to learn more about the project and start familiarizing yourself with material, please see the comment above yours.
Hi @jakebeal @tcmitchell @VishweshGitHub, I am Ahmed Tarek, a third-year undergraduate student in medical informatics. I have two years of experience with Python. I am interested in machine learning and deep learning, so I joined Neuromatch Academy as an interactive student, where we used PyTorch. I am working as a research assistant on an NLP research paper, and we are about to publish our work.
I took a genetics course at college and did a project using some ML libraries, Biopython, Py3Dmol, and nglview, which you can find here. In this project I used Biopython to read FASTA files, transcribe and translate the sequences, and then analyze the protein sequences and compare the genes. I used the PDB ID of each gene to visualize it with Py3Dmol and nglview.
I'll start studying the resources you linked above about SBOL (the SBOL tutorial material on the data model and Python library that was presented at IWBDA 2021) so that I can start working on this project for GSoC '22.
Thanks for your time
NRNB has officially been accepted as a mentoring organization for GSoC 2022! Here are some useful links:
Here are some links from the GSoC Mentors mailing list that might be generally helpful to all who are interested in this project:
A reminder that the application period opens on Monday April 4. Proposals to NRNB must be submitted on the official GSoC Site (https://summerofcode.withgoogle.com/) before April 19, 18:00 UTC to be considered, and contributors are encouraged to submit proposals in draft format early, so that mentors can give feedback directly at the GSoC site.
IMPORTANT REMINDER: GSoC 2022 is for new “beginners” to open source.
Applicants are expected to review eligibility requirements prior to applying. We cannot accept applications from contributors with prior open source development experience. From the GSoC FAQ https://developers.google.com/open-source/gsoc/faq:
Can someone already participating in open source be a GSoC Contributor?
The goal of GSoC is to bring new contributors into open source organizations. GSoC can also help beginner contributors learn the ins and outs of open source while being mentored by experienced community members. GSoC is for new and beginner contributors to open source, it is not for experienced contributors to open source.
Closing in preparation for GSoC 2023.
Project is still valid and needed: reopening for 2023
Hello @jakebeal, I'm Harsh Rathi, a CSE sophomore, writing to express my interest in contributing to this SBOL project as part of the Google Summer of Code program. Apart from the good first issues in SBOL-utilities, please guide me on how to familiarize myself with the requirements of this project.
@HarshRathi2511 I would suggest starting by testing out the .dna to GenBank converters in the open source tools linked above. Once you've been able to run them, the next step would be to find the code in those tools that converts .dna to GenBank and get to know it, as a key part of this project will be to make a converter that goes in the other direction.
@jakebeal Sure, I'll test the .dna to GenBank converters and have a look at their code.
Hello, I am trying to work on this, but I am unable to find any .dna files. Where can I find them? Having sample files would make it easier to parse one and then figure out how to convert it into a different format.
@Foxtrot-14 thanks for your interest. Please start by reading the description and testing out the open source .dna to GenBank converters linked therein. At least one contains sample .dna files. Once you've been able to run them, the next step would be to find the code in those tools that converts .dna to GenBank and get to know it, as a key part of this project will be to make a converter that goes in the other direction.
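As an orientation aid for reading the converter code: the open readers appear to treat a .dna file as a flat sequence of packets, each with a 1-byte type, a 4-byte big-endian length, and a payload, with the first packet carrying the ASCII cookie "SnapGene". The sketch below iterates over such packets and also shows the inverse operation that a writer would need. The layout is an assumption taken from how those readers behave, not from an official specification, and the function names are hypothetical.

```python
import struct
from typing import Iterator, Tuple


def iter_packets(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (packet_type, payload) pairs from raw .dna bytes.

    Assumed layout per packet: 1-byte type, 4-byte big-endian length, payload.
    """
    offset = 0
    while offset < len(data):
        if offset + 5 > len(data):
            raise ValueError("truncated packet header")
        packet_type, length = struct.unpack_from(">BI", data, offset)
        offset += 5
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated packet payload")
        offset += length
        yield packet_type, payload


def make_packet(packet_type: int, payload: bytes) -> bytes:
    """Serialize one packet; the inverse of what iter_packets reads."""
    return struct.pack(">BI", packet_type, len(payload)) + payload
```

Listing the packet types found in one of the sample files, and comparing them with the branches in the reader code, is one way to see which parts of the format the existing tools understand and which they skip.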
OK, I have cloned the SnapGeneReader project and tested it with the sample .dna files. It provides three functions:
snapgene_file_to_dict() returns an object of type <class 'dict'>
snapgene_file_to_seqrecord() returns an object of type <class 'Bio.SeqRecord.SeqRecord'>
snapgene_file_to_gbk() converts the objects into the GenBank format.
Should the next step be to figure out a way to convert this file into SBOL?
Hi @Foxtrot-14, sorry for the delay in responding. The goal is two-way conversion: from SnapGene to SBOL, and from several formats to SnapGene. See the first paragraph under the heading "Goal" in the description of this issue, where the goal is spelled out in more detail.
You'll have to figure out which of the 3 functions you list provides the necessary details for the conversion to SBOL. I am not familiar with SnapGene, so I cannot comment on which provide the necessary data and which would be the best to work from. That's part of this project.
Thanks for your continued interest!
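One way to compare what the reader functions expose is to print the shape of their output before deciding which to build on. The helper below is a minimal sketch that summarizes any nested dict or list without assuming particular keys; `describe` and its parameters are hypothetical names, not part of SnapGeneReader.

```python
from typing import Any, List


def describe(value: Any, name: str = "record", depth: int = 0,
             max_depth: int = 2) -> List[str]:
    """Return indented lines summarizing the structure of a nested value."""
    indent = "  " * depth
    if isinstance(value, dict):
        lines = [f"{indent}{name}: dict with {len(value)} keys"]
        if depth < max_depth:
            for key in sorted(value, key=str):
                lines.extend(describe(value[key], str(key), depth + 1, max_depth))
        return lines
    if isinstance(value, (list, tuple)):
        lines = [f"{indent}{name}: {type(value).__name__} of {len(value)} items"]
        if value and depth < max_depth:
            # Only the first item is shown, as a representative sample.
            lines.extend(describe(value[0], "[0]", depth + 1, max_depth))
        return lines
    if isinstance(value, str) and len(value) > 20:
        # Long strings (such as sequences) are summarized by length only.
        return [f"{indent}{name}: str of length {len(value)}"]
    return [f"{indent}{name}: {type(value).__name__} = {value!r}"]
```

With SnapGeneReader installed, something like `print("\n".join(describe(snapgene_file_to_dict("example.dna"))))` would show which fields the dictionary carries (the file name here is a placeholder). Anything SBOL needs that is absent from that listing would have to come from another function or from the raw file.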
Hi @tcmitchell @jakebeal @VishweshGitHub
I am interested in this problem. I have utilized the SnapGeneReader repo to convert .dna files to a dictionary and GenBank file after making the required changes to the repo. My contributions, including the necessary modifications, are available for review in the pull requests I submitted to the SnapGeneReader repository:
For converting .dna files into SBOL format, I've come up with two strategies:
i) Convert the .dna file to a Python dictionary (using SnapGeneReader) -> then convert to SBOL
ii) Convert the .dna file to a GenBank file (using SnapGeneReader) -> then convert to SBOL (using https://github.com/nrnb/GoogleSummerOfCode/issues/183 by @mohitdmak)
I am currently working on the second approach.
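The first half of the second strategy (.dna to GenBank) might look roughly like the sketch below. It assumes the `snapgene_reader` and `biopython` packages are installed and that `snapgene_file_to_seqrecord` accepts a file path; `genbank_path_for` and `dna_to_genbank` are hypothetical names. The GenBank-to-SBOL step is left out because its API is not described in this thread.

```python
from pathlib import Path


def genbank_path_for(dna_path: str) -> Path:
    """Hypothetical helper: place the GenBank file next to the .dna file."""
    return Path(dna_path).with_suffix(".gb")


def dna_to_genbank(dna_path: str) -> Path:
    """Read a .dna file with SnapGeneReader and write it out as GenBank."""
    # Imported lazily so this module loads without the third-party packages.
    from Bio import SeqIO
    from snapgene_reader import snapgene_file_to_seqrecord

    record = snapgene_file_to_seqrecord(dna_path)
    # Biopython's GenBank writer needs a molecule type; set one if missing.
    record.annotations.setdefault("molecule_type", "DNA")
    out_path = genbank_path_for(dna_path)
    SeqIO.write(record, str(out_path), "genbank")
    return out_path
```

Going through a SeqRecord rather than straight to text keeps the intermediate object available for inspection, which helps when checking what is lost at each hop.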
I am keen on taking up this project for GSoC 2024. I am currently a master's student in Computer Science at San Jose State University. Additionally, my prior experience at MeDAL (the Medical Deep Learning and Artificial Intelligence Lab) at IIT Bombay, India's leading research institute, has equipped me with relevant skills. At MeDAL I worked on developing a module to perform instance segmentation and classification of nuclei in multi-tissue histology WSIs by scaling a Python-based codebase.
I eagerly anticipate your feedback and am looking forward to the opportunity of working under your mentorship. Thank you for considering my application.
While this project still needs to be done, we have decided that we are not in a good position to supervise a GSoC student on it this summer.
Background
SnapGene is a popular DNA design tool, but it uses a custom .dna file format. There are two open software tools for reading .dna files into GenBank format, BioPython and SnapGeneReader. The connection onward from GenBank format to SBOL has not been tested for lossiness, however, and there is no open tool for writing .dna files.
Goal
This project will add the ability to convert from SnapGene .dna files to SBOL3 files and from any of GenBank, FASTA, or SBOL to SnapGene .dna format. This will be implemented as a writing extension for SnapGeneReader or BioPython and as an extension to the sbol-converter utility in SBOL utilities.
Correctness will be validated by round-tripping (import, then export) at least the Component and Sequence objects in SBOL3 files from the SBOL test suite and by checking that imported materials can be sensibly viewed in the free SnapGene viewer.
Difficulty Level: Medium
While the overall goals of the project are relatively straightforward, it will require figuring out how to work with SnapGene's poorly documented .dna format.
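The round-trip check described under Goal can be sketched in a format-agnostic way: write an object out, read it back, and compare. In the sketch below a toy FASTA writer and reader stand in for the .dna writer, which does not exist yet, and `Design` is a hypothetical stand-in for an SBOL3 Component with its Sequence. None of these names come from SBOL utilities.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Design:
    """Hypothetical stand-in for an SBOL3 Component plus its Sequence."""
    name: str
    elements: str  # nucleotide string


def to_fasta(design: Design) -> str:
    """Toy writer: one header line followed by the sequence."""
    return f">{design.name}\n{design.elements}\n"


def from_fasta(text: str) -> Design:
    """Toy reader for single-record FASTA text."""
    header, *body = text.strip().splitlines()
    if not header.startswith(">"):
        raise ValueError("missing FASTA header")
    return Design(name=header[1:], elements="".join(body))


def round_trips(design: Design,
                write: Callable[[Design], str],
                read: Callable[[str], Design]) -> bool:
    """True if writing and then reading reproduces the original design."""
    return read(write(design)) == design
```

A real test would swap in the actual converters and compare the fields of the SBOL objects that are expected to survive, since some formats (FASTA in particular) cannot carry annotations at all.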
Size and Length of Project
Skills
Essential skills: Python
Will be learned if not known: SBOL
Public Repository
https://github.com/Edinburgh-Genome-Foundry/SnapGeneReader or https://github.com/biopython/biopython
https://github.com/SynBioDex/SBOL-utilities
Potential Mentors
Vishwesh V Kulkarni vvk215@gmail.com, @tcmitchell