nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources

Implement pySBOL3 validation rules #182

Closed: jakebeal closed this issue 8 months ago

jakebeal commented 2 years ago

Background

pySBOL3 is one of the main libraries implementing the SBOL3 standard. The SBOL3 standard has a set of rules for checking the validity of documents, but these are currently only partially implemented. The rule implementation is spread between pySBOL3 itself and the SHACL rules contained in sbol-shacl.

Goal

The goal of this project is to implement most or all of the validation rules that have not yet been implemented.

Difficulty Level: Easy

The rules are very well defined and can be implemented incrementally.

Size and Length of Project

Skills

Essential skills: Python
Will be learned if not known: RDF, SBOL, SHACL

Public Repository

pySBOL3

Potential Mentors

jakebeal@ieee.org, @tcmitchell, Bryan.A.Bartley@raytheon.com

Omarelsherif010 commented 2 years ago

Hi @jakebeal, @tcmitchell, I am Omar Elsherif, a third-year undergraduate student in medical informatics. I have two years of experience with Python. I learned data analysis from Udacity and built some projects, which you can find here. I am interested in machine learning and deep learning, so I studied them over the last two years through DataCamp and joined Neuromatch Academy as an interactive student, where we used Python extensively. I am working as a research assistant on a computational neuroscience paper that we expect to publish soon.

I am comfortable with Python and with picking up different Python libraries for specific tasks, but I have no experience with RDF, SBOL, or SHACL. If you can, please recommend some resources so I can start working on this project for GSoC 22. Thanks for your time.

jakebeal commented 2 years ago

@Omarelsherif010 A good starting point for familiarization with this material is the SBOL tutorial material on the data model and Python library that was presented at IWBDA 2021.

Omarelsherif010 commented 2 years ago

Thanks for sharing this, @jakebeal. I am working on the tutorials.

Omarelsherif010 commented 2 years ago

Hi @jakebeal, I have read the SBOL tutorials you recommended and successfully built components and added them to a document as the tutorial asked. I have also watched these lectures on YouTube: Programming biology – controlling the flow of molecular machines empowering life (IWBDA 2021), SBOL Visual: Diagrams for Synthetic Biology (COMBINE 2020), and Webinar for SBOL Developers. I now have a good understanding of SBOL and of how its data model developed from SBOL to SBOL2 and finally SBOL3. I have also read this paper: The Synthetic Biology Open Language (SBOL) Version 3: Simplified Data Exchange for Bioengineering.

I have found this workshop playlist: IWBDA 2021. Do you recommend that I start watching it, or do you have other suggestions that will help me improve faster and be ready for our project?

Thanks for your time and help. I am looking forward to hearing from you.

jakebeal commented 2 years ago

@Omarelsherif010 If you would like to become fluent with the code base, I suggest jumping in and starting to use the code. One great way to do that would be to take a look at some of the issues on pySBOL3 that are marked as "good first issue" and see if you can make contributions to resolve them.

Omarelsherif010 commented 2 years ago

Thanks a lot. I will definitely do that.

khanspers commented 2 years ago

NRNB has officially been accepted as a mentoring organization for GSoC 2022! Here are some useful links:

Omarelsherif010 commented 2 years ago

That is great news. I will start working on my proposal for this project as soon as possible. I am really excited to start the process.

tcmitchell commented 2 years ago

@Omarelsherif010 please let us know if you need additional info for your proposal.

Omarelsherif010 commented 2 years ago

Hi @tcmitchell, I have been working on the proposal for a while, but some urgent matters at my college caused a small delay. I will continue working on the proposal and hope to meet with you next week if you are available, or to share my draft via email. Thanks for your great help; I am really grateful.

tcmitchell commented 2 years ago

Here are some links from the GSoC Mentors mailing list that might be generally helpful to all who are interested in this project:

khanspers commented 2 years ago

A reminder that the application period opens on Monday April 4. Proposals to NRNB must be submitted on the official GSoC Site (https://summerofcode.withgoogle.com/) before April 19, 18:00 UTC to be considered, and contributors are encouraged to submit proposals in draft format early, so that mentors can give feedback directly at the GSoC site.

AlexanderPico commented 2 years ago

IMPORTANT REMINDER: GSoC 2022 is for new “beginners” to open source.

Applicants are expected to review eligibility requirements prior to applying. We cannot accept applications from contributors with prior open source development experience. From the GSoC FAQ https://developers.google.com/open-source/gsoc/faq:

Can someone already participating in open source be a GSoC Contributor?

The goal of GSoC is to bring new contributors into open source organizations. GSoC can also help beginner contributors learn the ins and outs of open source while being mentored by experienced community members. GSoC is for new and beginner contributors to open source, it is not for experienced contributors to open source.

khanspers commented 1 year ago

Closing in preparation for GSoC 2023.

jakebeal commented 1 year ago

Project is still valid and needed: reopening for 2023

Omarelsherif010 commented 1 year ago

Hi @jakebeal

I hope you are doing well. I am still interested in participating in this project and implementing validation rules for SBOL documents. I have good experience with Python and unit testing. I also used pySBOL3 during the last few months to create SBOL3 files in order to test the DNAplotlib version 2 data structure and visualize its components.

Kindly let me know the current state of the project so I can go ahead and create a good draft plan that we can edit together. I am excited that this project is still available, and I want to start contributing as soon as possible.

jakebeal commented 1 year ago

Glad to hear that you are interested, @Omarelsherif010. The project is still in much the same state it was a year ago: all of the rules exist in the specification, the framework for adding them to the library is in place, and we just need somebody to do the translation from specification to Python.

Omarelsherif010 commented 1 year ago

Great, so I need to read the 'Validation Rules' section of the SBOL3 specification as a first step.

Where can I find the rules that are already implemented in pySBOL3? And what do you mean by the framework for adding them to the library? Is it a tool I should use?

If it is a tool I should use, kindly share a link with me.

I am ready to start implementing the rules in early Feb, so for now I need to understand more about where to look and what is expected as a result so I can show an example of my implementation and edit based on your recommendation.

I can't wait for GSoC to start officially. Let's implement as much as possible now.

jakebeal commented 1 year ago

@Omarelsherif010 Yes, that section of the specification is what needs to be fully implemented. We want to be able to handle all of the "check-box", "circle", and "star" rules.

The first set (sbol3-101*) are mostly implemented via special case code of various sorts: they are not the primary focus of this project, and might all be done already.

The others are to be implemented via the "validate" function on the appropriate class. Here are two examples:

We will also want to create tests that make sure that all of the rules will correctly catch failures.

A good starting point would be to start working through some of the current tests to make sure that they are actually exercising all of the validation rules that we think they should be. For example, we test detection of invalid sequences, but don't check if it gives an error for both of the cases.
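
A minimal sketch of the pattern described in this comment, using stand-in classes rather than the real pySBOL3 API. `Report`, `ToySequence`, `rule_ids`, and the rule IDs `demo-1` and `demo-2` are hypothetical names invented for illustration and are not taken from the specification. It shows a `validate` method that records one error per violated rule, so that a test can confirm each failure case is reported on its own.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Report:
    """Hypothetical stand-in for a validation report: (rule_id, message) pairs."""
    errors: List[Tuple[str, str]] = field(default_factory=list)

    def add_error(self, rule_id: str, message: str) -> None:
        self.errors.append((rule_id, message))


@dataclass
class ToySequence:
    """Hypothetical stand-in for a sequence object with two illustrative rules."""
    elements: Optional[str] = None
    encoding: Optional[str] = None

    def validate(self, report: Optional[Report] = None) -> Report:
        # Reuse a caller-supplied report so errors from many objects can accumulate.
        report = report if report is not None else Report()
        # Illustrative rule demo-1: elements require an encoding.
        if self.elements is not None and self.encoding is None:
            report.add_error("demo-1", "elements given without an encoding")
        # Illustrative rule demo-2: elements may only use the letters a, c, g, t.
        if self.elements is not None and set(self.elements.lower()) - set("acgt"):
            report.add_error("demo-2", "elements contain unexpected characters")
        return report


def rule_ids(obj: ToySequence) -> List[str]:
    """Return the IDs of every rule the object violates, in check order."""
    return [rule_id for rule_id, _ in obj.validate().errors]
```

Checking `rule_ids(ToySequence(elements="acgx"))` against both IDs, rather than only checking that some error occurred, is the kind of test tightening suggested above.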

khanspers commented 1 year ago

@Omarelsherif010 : Please note that we (NRNB) are still in the process of applying as a mentoring organization; mentoring organizations will be announced on Feb 22. Check back after that date for more information.

Omarelsherif010 commented 1 year ago

Thanks @khanspers. I know the GSoC process and timeline, and I am very excited to work on this project, so I will start contributing in early Feb.

Omarelsherif010 commented 1 year ago

@jakebeal I have checked the links you provided, and it is clear to me now. I will start by playing with the existing tests, then create a test case for one rule, and finally implement the validation rule to make the test pass. I will let you know once I have created the test case.

tcmitchell commented 1 year ago

@Omarelsherif010 if you have any questions, please open an issue on the pySBOL3 repository.

Vikash-8090-Yadav commented 1 year ago

@tcmitchell @jakebeal Vikash here, a junior at Chandigarh University. I found this org interesting because it matches my skills, and I look forward to working with it. Please guide me through the initial steps for contributing, and let me know how I can contact you with questions.

tcmitchell commented 1 year ago

@Vikash-8090-Yadav : Please note that we (NRNB) are still in the process of applying as a GSoC mentoring organization; mentoring organizations will be announced on Feb 22. Check back after that date for more information.

Please read the comments above for initial steps to be followed while contributing. This GitHub issue is the medium to contact us with any questions about this GSoC project.

Omarelsherif010 commented 1 year ago

Hi @tcmitchell and @jakebeal

I have gone through some of the implemented tests and noticed the feature class validation function doesn't have a unit test, so I opened this issue and am willing to create that one.

If you recommend something else to do, I will be happy to follow your suggestion.

khanspers commented 1 year ago

NRNB has been accepted as a mentoring organization for GSoC 2023! Contributor applications open on March 20. Here are some useful links:

GSoC contributor guide
NRNB project proposal template
Eligibility requirements
Full program timeline

Omarelsherif010 commented 1 year ago

What awesome news! I have finished the code for the first unit test of the rules and will open a PR soon. I think I will implement a couple more, see Tom's and Jake's reviews, and then start the proposal.

sumit-158 commented 1 year ago

Hi, I have submitted the GSoC proposal for this issue. Please provide some feedback. Thank you!

Proposal Google Doc link. I also added the potential mentors as commenters in the doc so you can provide feedback directly there.

tcmitchell commented 1 year ago

For proposers:

We generally prefer an iterative, test-first development model. For this project that means that we would want to merge in new validation rules and their accompanying tests as they are ready. This probably means at least one pull request per week of validation rules and unit tests. We would not want a single pull request at the end of the GSoC period containing all validation rules. A good way of organizing is class by class, similar to the way the validation rules are organized in the SBOL 3.1.0 specification, Appendix B.

It would be nice to have unit tests that verify each SBOL3 validation rule, even rules that are already implemented. These unit tests should verify that a valid object passes and that an invalid object fails.
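
As a rough illustration of the "valid object passes, invalid object fails" pairing, here is a minimal `unittest` sketch. `ToyComponent`, its rule ID `demo-types-required`, and `run_suite` are hypothetical stand-ins written for this example; they are not the pySBOL3 classes or real SBOL3 rule identifiers.

```python
import unittest
from typing import List, Optional


class ToyComponent:
    """Hypothetical stand-in for a library class; not the pySBOL3 API."""

    def __init__(self, types: Optional[List[str]] = None) -> None:
        self.types = types or []

    def validate(self) -> List[str]:
        """Return the IDs of violated rules (illustrative rule: at least one type)."""
        return [] if self.types else ["demo-types-required"]


class TestTypesRequired(unittest.TestCase):
    """One rule, two tests: the valid case and the invalid case."""

    def test_valid_object_passes(self) -> None:
        self.assertEqual(ToyComponent(["DNA"]).validate(), [])

    def test_invalid_object_fails(self) -> None:
        self.assertIn("demo-types-required", ToyComponent().validate())


def run_suite() -> unittest.TestResult:
    """Run both tests and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestTypesRequired)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping one small test class per rule makes it straightforward to merge a rule and its tests together in the same pull request.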

Caution: some of the validation rules are challenging because they require navigating the structure around a given object. It may be wiser to add issues to complete these later instead of getting stuck on these for a long time.
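
To give a sense of why such rules are harder, here is a sketch of a check that cannot be decided from one object alone, because it needs access to the containing document. The classes `ToyObject` and `ToyDocument`, the function `dangling_references`, and the rule itself (every reference must resolve within the document) are assumptions made for illustration, not the pySBOL3 data model or a rule quoted from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyObject:
    """Hypothetical object that points at other objects by identity."""
    identity: str
    references: List[str] = field(default_factory=list)


@dataclass
class ToyDocument:
    """Hypothetical container mapping identities to objects."""
    objects: Dict[str, ToyObject] = field(default_factory=dict)

    def add(self, obj: ToyObject) -> None:
        self.objects[obj.identity] = obj


def dangling_references(doc: ToyDocument, obj: ToyObject) -> List[str]:
    """Return the references of obj that do not resolve inside doc."""
    # The check needs the document as well as the object, which is what makes
    # this kind of rule awkward to express in a per-object validate method.
    return [ref for ref in obj.references if ref not in doc.objects]
```

A rule of this shape needs a decision about where the document lookup lives, which is one reason it may be reasonable to defer it to a follow-up issue.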

Alerting everyone who has expressed interest in 2023: @sumit-158 @Omarelsherif010 @Vikash-8090-Yadav

Omarelsherif010 commented 1 year ago

Hi @tcmitchell @jakebeal @bbartley

I have finished my proposal and added my PRs and a detailed timeline. Could you please give it a final review before I submit? I also need your review on this PR and this issue so I can get further into the details of the pySBOL3 code.

Thanks in advance

I am enjoying getting deeper into the SBOL world step by step. You are really great mentors ❤️

Omarelsherif010 commented 1 year ago

What do you recommend I do until GSoC 23 results? Should I try to collect validation rules that have not been implemented yet or try to solve more issues inside the library?

Or do you have better suggestions?

jakebeal commented 1 year ago

@Omarelsherif010 It is up to you what you choose to do, but the GSoC process is already in motion at this point.

princyym commented 8 months ago

Hello, I am Princy Mangla and I want to contribute to this project. I have a very good command of Python and am eager to learn the SBOL library and any other necessary libraries. Could you please assign this issue to me and explain its technical details?

jakebeal commented 8 months ago

While this project still needs to be done, we have decided that we are not in a good position to supervise a GSoC student on it in summer 2024.