nrnb / GoogleSummerOfCode

Main documentation site for NRNB GSoC project ideas and resources
Advanced search on WikiPathways #32

Closed mkutmon closed 4 years ago

mkutmon commented 8 years ago

Background

WikiPathways is a collaborative pathway database that is built on MediaWiki. Currently, the site offers a basic search function that relies on an index of keywords from the title and description of a pathway, plus a list of pathway elements, such as genes, proteins and metabolites. Perform test searches for topic words like "cancer" or genes like "p53" to see the result space. Pathways, however, contain other information, including structured ontology terms (pathway, cell type and disease ontologies from NCBO BioPortal) that could be used for better, more advanced searches. Our Lucene indexer already indexes a lot of this structured information, but it is not visible on the website at the moment. Pathways also have information about interactions, cellular compartments, etc. This additional information should be searchable in a structured way where possible.

Goal

The goal is to implement a new, advanced search function. It should provide an interface to search for pathways that are annotated with specific ontology terms, or pathways that contain specific interactions or compartments. The advanced search should also allow users to filter results based on species, curation tags, authors, creation date, number of edits, etc. We would like student applicants to familiarise themselves with WikiPathways and the atomized units of pathway information, and then propose how to extract and index this information, as well as how best to provide an interactive view of search results.
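As a rough illustration of the kind of structured filtering described above, here is a minimal Python sketch. The `Pathway` record, its field names, the `advanced_search` function and the sample data are all hypothetical and not part of the WikiPathways code base. A real implementation would presumably query the existing Lucene index rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, List, Optional, Set


@dataclass
class Pathway:
    """Hypothetical in-memory pathway record (not the WikiPathways schema)."""
    title: str
    species: str
    ontology_terms: Set[str] = field(default_factory=set)
    curation_tags: Set[str] = field(default_factory=set)
    authors: Set[str] = field(default_factory=set)
    created: date = date(2000, 1, 1)
    edit_count: int = 0


def advanced_search(
    pathways: Iterable[Pathway],
    ontology_term: Optional[str] = None,
    species: Optional[str] = None,
    curation_tag: Optional[str] = None,
    author: Optional[str] = None,
    created_after: Optional[date] = None,
    min_edits: Optional[int] = None,
) -> List[Pathway]:
    """Return pathways matching every filter that is set (None means 'any')."""
    results = []
    for p in pathways:
        if ontology_term is not None and ontology_term not in p.ontology_terms:
            continue
        if species is not None and p.species != species:
            continue
        if curation_tag is not None and curation_tag not in p.curation_tags:
            continue
        if author is not None and author not in p.authors:
            continue
        if created_after is not None and p.created <= created_after:
            continue
        if min_edits is not None and p.edit_count < min_edits:
            continue
        results.append(p)
    return results


# Made-up records for illustration only.
EXAMPLE_PATHWAYS = [
    Pathway("Example pathway A", "Homo sapiens", {"disease:example-cancer"},
            {"curated"}, {"alice"}, date(2015, 3, 1), 12),
    Pathway("Example pathway B", "Mus musculus", {"disease:example-cancer"},
            {"curated"}, {"bob"}, date(2016, 6, 1), 3),
    Pathway("Example pathway C", "Homo sapiens", {"cell:example-neuron"},
            set(), {"alice", "bob"}, date(2014, 1, 1), 40),
]

hits = advanced_search(
    EXAMPLE_PATHWAYS,
    ontology_term="disease:example-cancer",
    species="Homo sapiens",
)
print([p.title for p in hits])
```

Filters combine with a logical AND, which matches the "narrow down the results" behaviour most faceted search pages use. Whether some filters should instead combine with OR is a design question for the proposal.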

Difficulty Level 1

A lot of the features are already in place on WikiPathways, and this project would connect and link them on a user-friendly advanced search page. Additional search methods will be implemented using the existing Lucene indexer code.

Skills

PHP (essential); Lucene indexer, MediaWiki, Java (nice to have)

Public Repository

https://github.com/wikipathways/wikipathways.org

Potential Mentors

Martina Kutmon

Contact

Martina Kutmon

sharan1 commented 7 years ago

Hello everyone. Although I haven't used WikiPathways before, I have been working on web technologies for quite some time. I have a sound knowledge of HTML, JS, jQuery, PHP, SQL, etc. I do have a few queries in mind.

What steps should I take in order to be a strong candidate? Also, is it too late to be considered for this project?

Thank you

mkutmon commented 7 years ago

I think there should still be enough time to write a good proposal. As mentioned in the project idea as well, it is a good idea to investigate the WikiPathways database, so you have a good feeling for what kind of data is stored in the database and how users might want to use an advanced search feature. You should also have a look at the code that generates the Lucene index to see which information about the pathways is already indexed and therefore searchable. The new interface should make it easy and intuitive for biologists to query the database (beyond using a simple keyword search).

Feel free to contact me by email if you have any questions.

sharan1 commented 7 years ago

Thanks Martina. I had sent you an email yesterday on the given email id regarding my understanding of the project. I will just copy paste it here. " I have worked on search before using PHP Yii framework and SOLR.

As you have mentioned that this project should help biologists to query the database easily, maybe we need to create common dashboards combining multiple tables, with a search option for each column. I have attached an example from one of my previous projects where I used something similar. In the attachment, the text boxes refer to the search option for that particular column.

Kindly let me know if I am thinking in the right direction.

If it is a sitewide search option, then it would of course be different."

[Attached image: multi_search]

AlexanderPico commented 7 years ago

Interesting idea! I think something like this could possibly work for the original problem posted above, namely, using the ontology information already associated with pathways. For example, the categorical "Type" field could be populated with just high-level terms from a given ontology. That would probably be sufficient and cover most use cases.
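To make the "high-level terms" idea a bit more concrete, here is a minimal Python sketch of rolling specific ontology terms up to a top-level category that could populate such a "Type" column. The tiny `PARENT` table and the function names are made up for illustration. Real ontologies from BioPortal can have several parents per term, which this single-parent sketch deliberately ignores.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical child -> parent links (None marks the root of the ontology).
PARENT: Dict[str, Optional[str]] = {
    "disease": None,
    "cancer": "disease",
    "breast cancer": "cancer",
    "metabolic disease": "disease",
    "diabetes": "metabolic disease",
}


def top_level_category(term: str, parent: Dict[str, Optional[str]]) -> str:
    """Return the ancestor of `term` that sits directly below the root.

    The root itself is returned unchanged. Raises KeyError for unknown
    terms and ValueError if the parent links contain a cycle.
    """
    chain = [term]
    seen = {term}
    while parent[chain[-1]] is not None:
        nxt = parent[chain[-1]]
        if nxt in seen:
            raise ValueError(f"cycle detected at {nxt!r}")
        chain.append(nxt)
        seen.add(nxt)
    # chain is [term, ..., root]; the entry before the root is the category.
    return chain[-2] if len(chain) > 1 else chain[0]


def type_facets(terms: Iterable[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Collapse a pathway's ontology terms into a set of high-level categories."""
    return {top_level_category(t, parent) for t in terms}


print(sorted(type_facets({"breast cancer", "diabetes"}, PARENT)))
```

With a roll-up like this, the "Type" filter only needs a short list of categories, while the full terms stay available for more specific queries.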

@sharan1, it might help if you were to browse WikiPathways a bit and see examples of the current search fields/filters, plus some of the new information we'd like to incorporate (like ontology terms), and then provide a mockup with columns that reflect this content. That might help to communicate the idea to mentors and convince them that this approach could work.