pombase / website

PomBase website v2
MIT License

Re-implement the query builder #99

Closed ValWood closed 7 years ago

ValWood commented 7 years ago

This is all I want:

New QUERY PAGE (with additional fields when required for relations conditions, allele type etc)

new_query

ValWood commented 7 years ago

And

RESULTS PAGE:

results

ValWood commented 7 years ago

And a download page with all the options https://github.com/pombase/website/issues/20

ValWood commented 7 years ago

reopened original ticket https://github.com/pombase/website/issues/57

ValWood commented 7 years ago

We will have a set of canned queries

ValWood commented 7 years ago
Antonialock commented 7 years ago

A mockup of having it all on one page. If lists are long we could have embedded scrollable tables? Also Kim mentioned "show all" clicky

slide1

kimrutherford commented 7 years ago

I think for the first attempt at the query builder I'll use the Canto web service to do the auto-complete of ontology terms.

Advantages:

Disadvantages

ValWood commented 7 years ago

That's not a problem. This can be an admin-only option (the community have never made these annotations, we do). Also, new annotations will nearly always require a CV update at present. So that will all be fine.

more advantages: Canto autocomplete works better than existing PomBase autocomplete anyway.....

kimrutherford commented 7 years ago

I've thought of another possible drawback of using Canto for autocompletion: at the moment the completion works on one ontology at a time. So we'd have to have separate MF, BP and CC autocomplete boxes.

In PomBase v1 you can search all of GO from one input field.

ValWood commented 7 years ago

hmm good point. Is it tricky to add another option to search everything, even if this option isn't available directly in Canto?

kimrutherford commented 7 years ago

Is it tricky to add another option to search everything, even if this option isn't available directly in Canto?

Turns out that it's all fine. I forgot that the ability to query multiple ontologies at once was implemented to get the extensions interface to work. So it can do this already, but the URL syntax is slightly different from what I remember.

Note to self, here's an example: https://curation.pombase.org/pombe/ws/lookup/ontology/[GO:0005575|GO:0003674|GO:0008150]?def=1&term=DNA%20helicase
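For reference, a minimal sketch of how that URL could be assembled. The base path, the `def=1` parameter and the bracketed, pipe-separated list of root terms are copied from the example above; the helper name `build_lookup_url` is made up for illustration and is not part of Canto.

```python
from urllib.parse import quote, urlencode

# Base path copied from the example URL in this thread.
CANTO_LOOKUP_BASE = "https://curation.pombase.org/pombe/ws/lookup/ontology/"


def build_lookup_url(root_term_ids, search_text):
    """Hypothetical helper: build a lookup URL that searches several ontologies."""
    # Several ontologies are named by their root terms: [ROOT1|ROOT2|...]
    ontology_part = "[" + "|".join(root_term_ids) + "]"
    # def=1 is passed exactly as in the example URL; quote_via=quote encodes
    # spaces as %20 rather than "+".
    params = {"def": 1, "term": search_text}
    return CANTO_LOOKUP_BASE + ontology_part + "?" + urlencode(params, quote_via=quote)


# The three GO root terms used in the example URL.
GO_ROOTS = ["GO:0005575", "GO:0003674", "GO:0008150"]
url = build_lookup_url(GO_ROOTS, "DNA helicase")
print(url)
```

This only builds the string; it does not call the service.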

kimrutherford commented 7 years ago

The ontology auto-complete part of the new query builder is up and sort-of working. But only at this URL: http://pombase2.bioinformatics.nz/query It is likely to break at times over the next few weeks as I work on things.

It's not on the preview site because I don't want to break anything there.

kimrutherford commented 7 years ago

I've added a very simple history section to the query builder. You can intersect and union queries if you have two or more.

It's very basic but I wanted to make sure everything was wired up correctly. I can't guarantee that the results are correct.
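Roughly, combining history entries amounts to set operations on gene lists. This is only a sketch under that assumption: the `combine` function, the list-of-sets history and the gene IDs are invented for illustration and are not the actual implementation.

```python
def combine(history, first, second, operator):
    """Combine two history entries (sets of gene IDs) into a new result set."""
    a, b = history[first], history[second]
    if operator == "union":
        return a | b      # genes found by either query
    if operator == "intersect":
        return a & b      # genes found by both queries
    raise ValueError("unknown operator: " + operator)


# Made-up gene IDs, one set per query in the history.
history = [{"gene1", "gene2"}, {"gene2", "gene3"}]

both = combine(history, 0, 1, "intersect")
either = combine(history, 0, 1, "union")

# The combined result becomes a new history entry, so it can be reused.
history.append(both)
```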

http://pombase2.bioinformatics.nz/query

ValWood commented 7 years ago

Nice, seems to work fine.

I think the phenotype filter should only be single-gene alleles by default (because of the way the query builder is normally used, the expectation at present would be that any gene in the list would have a cell cycle phenotype). We need to add a way to deal with multi-allele queries later. We never got this far on the old site....

kimrutherford commented 7 years ago

I think the phenotype filter should only be single-gene alleles by default

Just to be clear, do you mean genes from single allele genotypes or genes from single gene genotypes?

kimrutherford commented 7 years ago

do you mean genes from single allele genotypes or genes from single gene genotypes?

I've changed it to return genes from single allele genotypes, which should match the other pages.

For example if you query "abolished actomyosin contractile ring assembly" you get the same genes in the single allele genotype section of the term page: http://preview.pombase.org/term/FYPO%3A0001009
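To make the distinction between the two readings concrete, here is a small sketch. The `Allele` and `Genotype` classes and all names in the example data are hypothetical and only illustrate the idea; they are not the website's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    name: str
    gene: str


@dataclass(frozen=True)
class Genotype:
    alleles: tuple  # tuple of Allele


def genes_from_single_allele_genotypes(genotypes):
    """Genes from genotypes that contain exactly one allele."""
    return {g.alleles[0].gene for g in genotypes if len(g.alleles) == 1}


def genes_from_single_gene_genotypes(genotypes):
    """Genes from genotypes whose alleles all belong to one gene."""
    result = set()
    for g in genotypes:
        genes = {a.gene for a in g.alleles}
        if len(genes) == 1:
            result |= genes
    return result


genotypes = [
    Genotype((Allele("a1", "gene1"),)),                         # one allele
    Genotype((Allele("a2", "gene2"), Allele("a3", "gene3"))),   # two genes
    Genotype((Allele("a4", "gene4"), Allele("a5", "gene4"))),   # two alleles, one gene
]

single_allele = genes_from_single_allele_genotypes(genotypes)
single_gene = genes_from_single_gene_genotypes(genotypes)
```

In this made-up data the third genotype is counted only under the single gene reading, because its two alleles belong to the same gene.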

kimrutherford commented 7 years ago

SO -> current set is only protein features

What is the top level SO term for protein features?

Is it: polypeptide - SO:0000104

kimrutherford commented 7 years ago

Second guess: polypeptide_region - SO:0000839 :-)

ValWood commented 7 years ago

I've changed it to return genes from single allele genotypes, which should match the other pages.

yes that's correct. We may want to include multiple alleles of single genes, but at present these are classed as multi-allele. I thought there was a ticket to discuss this (fairly edge case), but I can't find it..

and yes polypeptide_region - SO:0000839 will prevent it being confused with a feature type (CDS)

ValWood commented 7 years ago

For example if you query "abolished actomyosin contractile ring assembly" you get the same genes in the single allele genotype section of the term page: http://preview.pombase.org/term/FYPO%3A0001009

great. Now on second thoughts, maybe we want to include all and have "single-allele" "multi-allele" as filter options?

https://github.com/pombase/website/issues/405

kimrutherford commented 7 years ago

Now on second thoughts, maybe we want to include all and have "single-allele" "multi-allele" as filter options?

I made a separate issue for that: https://github.com/pombase/website/issues/405

kimrutherford commented 7 years ago

I've added Protein modification and Protein feature to the query builder.

For now it's still using the Canto auto-completion. There's a problem with that that I hadn't thought of: the completion searches all terms from the given ontology including those that have no annotation. Will that be annoying to users who don't know the data well? (It was annoying when I was testing the Protein feature completion because I kept selecting terms with no annotation)

I've also changed how it looks to experiment with avoiding a drop-down menu (I don't like them). We can talk about it next time we chat.

ValWood commented 7 years ago

Great, all seems to be working.

There is a minor issue you are probably aware of: the search box overlaps the results list.

fix

There's a problem with that that I hadn't thought of: the completion searches all terms from the given ontology including those that have no annotation.

Yes, that could be fixed later, but I think it's a really nice feature that the current search only finds terms with extant annotations. I guess that means creating an ontology subset for searching....is that a pain?

I've also changed how it looks to experiment with avoiding a drop-down menu (I don't like them). We can talk about it next time we chat.

I like being able to see all the available queries... I don't know how that will pan out when all are present...maybe a hybrid approach will work.... can chat about this tomorrow...

kimrutherford commented 7 years ago

There is a minor issue you are probably aware of: the search box overlaps the results list.

That's a bit better now but still needs work.

I guess that means creating an ontology subset for searching....is that a pain?

It would be no fun to do in Canto because it would mean that Canto would have to read all annotation from Chado to build a subset.

Instead we should be able to do it in the back-end code that runs the new website. GO use Solr to do their searching and indexing but it's an extra (and large) moving part that I'm hoping to avoid.

I'll do some investigation.
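As a starting point, the idea could look something like the sketch below: collect the IDs of terms that have at least one annotation, then only offer those as completions. The dictionary fields, function names and term IDs are all made up for illustration, and a simple substring match stands in for whatever matching the real completion uses.

```python
def annotated_term_ids(annotations):
    """Collect the IDs of terms that have at least one annotation."""
    return {annotation["term_id"] for annotation in annotations}


def complete(terms, annotated_ids, text):
    """Return terms whose name contains the text and that have annotation."""
    needle = text.lower()
    return [
        term for term in terms
        if term["id"] in annotated_ids and needle in term["name"].lower()
    ]


# Made-up terms and annotations.
terms = [
    {"id": "EX:0000001", "name": "DNA helicase activity"},
    {"id": "EX:0000002", "name": "RNA helicase activity"},
    {"id": "EX:0000003", "name": "DNA binding"},
]
annotations = [
    {"gene": "gene1", "term_id": "EX:0000001"},
    {"gene": "gene2", "term_id": "EX:0000003"},
]

subset = annotated_term_ids(annotations)
matches = complete(terms, subset, "helicase")
```

One open question this sketch ignores is whether parent terms of annotated terms should also be offered, since they may have no direct annotation themselves.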

kimrutherford commented 7 years ago

I've added a very simple gene list filter for testing. It only works with systematic IDs and it breaks with other IDs.

ValWood commented 7 years ago

Super. I'll give this a spin later.

As part of the "unknowns" paper we are writing, I will soon (not this week!), need to do the following queries on the new data (my plan was to get you to run the SQL if the query builder wasn't done but we will be able to wait)

I'm doing the analysis using the old site, and then I'll plug the new query results in at the end.

I will need to be able to do

I will add others to the list to prioritise as I become aware that I will need them.

to continue with https://github.com/pombase/curation/issues/1540

kimrutherford commented 7 years ago

"subtraction queries"

I've sort of done that - you'll need to reload. I say sort-of because you have to have the two queries to subtract in the right order in your history. Let's chat about how to fix that.
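The order matters because subtraction, unlike union and intersection, gives a different answer when the operands are swapped. A tiny illustration with made-up gene IDs, assuming each query result is a set:

```python
# Two query results from the history (made-up gene IDs).
first_query = {"gene1", "gene2", "gene3"}
second_query = {"gene2", "gene4"}

# "first NOT second": genes in the first result that are not in the second.
first_minus_second = first_query - second_query

# Swapping the operands gives a different gene list.
second_minus_first = second_query - first_query

# Union and intersection do not depend on the order.
same_either_way = (first_query & second_query) == (second_query & first_query)
```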

ValWood commented 7 years ago

and

mah11 commented 7 years ago

Instead of "genes by type" (which could be a type of anything) -> genes encoding product type (we wanted a different label, but not sure what @mah11 !)

anything wrong with simply "product type"?

ValWood commented 7 years ago

"product type" sounds ok...

kimrutherford commented 7 years ago

conserved in....(call this taxon distribution)

I've done that now. Any small ontology that needs to appear in a drop down can now be implemented by just changing the configuration.

kimrutherford commented 7 years ago

"annotation status" queries to access the "conserved unknowns"

That's done now, but very basic.

protein domains (Interpro IDs only for release as the rest is a bit more complicated)

I've added options for searching:

It's also very basic. We need to chat about help text, formatting and more.

kimrutherford commented 7 years ago

All domain IDs

Currently you can put in an ID like "PANTHER:PTHR23502:SF80" or "PANTHER:PTHR28051" or you can leave off the prefix "PTHR23502:SF80" or "PTHR28051". Is there any problem with that that I haven't thought of?
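A sketch of one way the optional prefix could be handled, assuming a fixed list of known database prefixes. Only "PANTHER" comes from the examples above; the `KNOWN_PREFIXES` list and the `strip_db_prefix` name are hypothetical.

```python
# Only PANTHER appears in the examples above; other databases would need adding.
KNOWN_PREFIXES = ("PANTHER",)


def strip_db_prefix(domain_id):
    """Remove a leading 'DATABASE:' prefix if it is one of the known prefixes."""
    for prefix in KNOWN_PREFIXES:
        lead = prefix + ":"
        if domain_id.startswith(lead):
            return domain_id[len(lead):]
    # No known prefix: assume the ID was entered without one.
    return domain_id


examples = [
    "PANTHER:PTHR23502:SF80",
    "PANTHER:PTHR28051",
    "PTHR23502:SF80",
    "PTHR28051",
]
normalised = [strip_db_prefix(domain_id) for domain_id in examples]
```

Matching against a list of known prefixes, rather than splitting on the first colon, matters here because subfamily IDs such as "PTHR23502:SF80" contain a colon of their own.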

ValWood commented 7 years ago

On 05/07/2017 00:59, Kim Rutherford wrote:

conserved in....(call this taxon distribution)

I've done that now. Any small ontology that needs to appear in a drop down can now be implemented by just changing the configuration.


that's handy!


ValWood commented 7 years ago

Currently you can put in an ID like "PANTHER:PTHR23502:SF80" or "PANTHER:PTHR28051" or you can leave off the prefix "PTHR23502:SF80" or "PTHR28051". Is there any problem with that that I haven't thought of?

I can't think of one; there should not be any overlap. Nice.

ValWood commented 7 years ago

I think everything in this ticket is now in other tickets.

kimrutherford commented 7 years ago

The query builder is now available in preview.pombase.org