greenelab / adage-frontend

The Adage web app, a tool to explore gene expression data and discover new insights from machine learning models
https://adage.greenelab.com
BSD 3-Clause "New" or "Revised" License

Search signatures by gene names or numbers? #142

Closed vincerubinetti closed 4 years ago

vincerubinetti commented 4 years ago

From Georgia Doing:

Search signatures by gene names or numbers?

vincerubinetti commented 4 years ago

@dongbohu can we figure out how to cleanly add participating genes in the signature query?

Perhaps just:

...
{
  "id": 1,
  "name": "Sig001pos",
  "mlmodel": 1,
  "genes": [
    "PA4061",
    "PA3601",
    "PA2451",
    ...
  ]
}
...

@georgiadoing what gene info would you want to search signatures by? Standard name, systematic, description?

georgiadoing commented 4 years ago

Standard name and systematic name would be great (e.g. PAO1, dnaA or PA14_00010 if we are going to include PA14).

dongbohu commented 4 years ago

Each participation record also has a participation_type field, so if we want to add participating genes in the signature query, the result will have to be something like this:

...
{
  "id": 1,
  "name": "Sig001pos",
  "participations": [
    {
      "participation_type": "xxx",
      "genes": [
        "PA4061",
        "PA3601",
        "PA2451"
      ]
    },
    {
      "participation_type": "yyy",
      "genes": [
        "PA4061",
        "PA3601",
        "PA2451"
      ]
    }
  ]
}
...

This query would also be much more expensive, so I would prefer not to include participations directly in the Signature API.

vincerubinetti commented 4 years ago

I don't think this is solvable from the frontend. I'd basically have to query the participating genes for all of the signatures, which would take forever. So it would really make more sense to have that processing done on the backend.

So if it's too large of a query to do on the backend, I don't think we can do it. @dongbohu maybe you could do a few tests to see how long such a query would take? And we can decide whether to close this based on the numbers?

dongbohu commented 4 years ago

If you first call the API api/v1/gene/?autocomplete=str to get the matched genes, then call api/v1/participation/?related-genes=g1,g2,.. to get the participations, are the signatures in these participation records what you need?
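A minimal sketch of that two-step lookup, in case it helps the discussion. Only the two endpoint paths and the `autocomplete` and `related-genes` parameters come from this thread. Everything else is an assumption for illustration: the `results` wrapper around each response, the `id` field on genes, the `signature` field on participation records, the use of gene ids as the `related-genes` values, and the hypothetical `fetch_json` helper that is passed in.

```python
from typing import Any, Callable, List
from urllib.parse import urlencode

# Hypothetical helper type: takes a URL, returns the parsed JSON response.
FetchJson = Callable[[str], Any]


def signatures_for_gene_query(
    query: str, fetch_json: FetchJson, base: str = "api/v1"
) -> List[Any]:
    """Return the signatures whose participations include genes matching `query`."""
    # Step 1: find the genes that match the search string.
    gene_url = f"{base}/gene/?{urlencode({'autocomplete': query})}"
    genes = fetch_json(gene_url)["results"]  # assumed response shape
    gene_ids = [str(gene["id"]) for gene in genes]  # assumed "id" field
    if not gene_ids:
        return []

    # Step 2: fetch the participation records for those genes.
    # Assumes "related-genes" takes a comma-separated list of gene ids.
    part_url = f"{base}/participation/?related-genes={','.join(gene_ids)}"
    participations = fetch_json(part_url)["results"]  # assumed response shape

    # Collect each signature once, keeping first-seen order.
    seen = {}
    for record in participations:
        seen.setdefault(record["signature"], None)  # assumed "signature" field
    return list(seen)
```

This costs two requests per search no matter how many signatures exist, which is the point of going through the participation endpoint instead of querying each signature's genes.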

vincerubinetti commented 4 years ago

I suppose that could work? That would give me all the signatures that have the gene you searched for in their participating genes? If that's the case, though, could you just make that part of the signature query, like autocomplete? It seems like something that would more appropriately live on the backend.

Also I don't think this would solve #134

dongbohu commented 4 years ago

@georgiadoing: I need some clarification on this issue:

Please let me know what you think. Thanks.

georgiadoing commented 4 years ago

Hi @dongbohu - For your first point - gene name, PAO1 number and PA14 number would all be great, but if it is harder to do all of them, I would say I've listed them in order of importance

second - yes, that is what I was imagining

third - I would also think exact match would be better

But also, if this is a difficult feature to implement, I think it is also fine, as a user, to search for signatures that contain genes from the gene page - I think that is how the tool is designed and it works really well that way. I think this feature would be a nice cherry on top, but users might not miss it :)

dongbohu commented 4 years ago

@georgiadoing: Thank you for your comments. I will work on it this week.

dongbohu commented 4 years ago

Since it is a backend issue, I am moving it to: https://github.com/greenelab/py3-adage-backend/issues/56, and closing this issue here.