vincerubinetti closed this issue 4 years ago.
@dongbohu can we figure out how to cleanly add participating genes in the signature query?
Perhaps just:
```json
{
  "id": 1,
  "name": "Sig001pos",
  "mlmodel": 1,
  "genes": [
    "PA4061",
    "PA3601",
    "PA2451",
    ...
  ]
}
```
@georgiadoing what gene info would you want to search signatures by? Standard name, systematic, description?
Standard name and systematic name would be great (e.g., PAO1, dnaA, or PA14_00010 if we are going to include PA14).
Each participation record also has a participation_type field, so if we want to add participating genes to the signature query, the result will have to look something like this:
```json
{
  "id": 1,
  "name": "Sig001pos",
  "participations": [
    {
      "participation_type": "xxx",
      "genes": [
        "PA4061",
        "PA3601",
        "PA2451"
      ]
    },
    {
      "participation_type": "yyy",
      "genes": [
        "PA4061",
        "PA3601",
        "PA2451"
      ]
    }
  ]
}
```
This query would also be much more expensive, so I would prefer not to include participations directly in the Signature API.
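The nested shape above amounts to grouping each signature's participation records by participation_type. Here is a minimal Python sketch, assuming flat participation records with signature, participation_type, and gene keys. These field names are hypothetical and not the backend's confirmed serializer output.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def nest_participations(signature: Dict[str, Any],
                        participations: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Group flat participation records for one signature by participation_type.

    Assumed (hypothetical) record shape:
    {"signature": <signature id>, "participation_type": <str>, "gene": <gene name>}
    """
    by_type: Dict[str, List[Any]] = defaultdict(list)
    for p in participations:
        # Keep only records that belong to this signature.
        if p["signature"] == signature["id"]:
            by_type[p["participation_type"]].append(p["gene"])
    return {
        "id": signature["id"],
        "name": signature["name"],
        # dicts keep insertion order, so types appear in first-seen order.
        "participations": [
            {"participation_type": t, "genes": genes} for t, genes in by_type.items()
        ],
    }
```

Doing this for every signature in a list response means visiting every participation row, which is why it costs more than the plain signature query.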
I don't think this is solvable from the frontend. I'd basically have to query the participating genes for all of the signatures, which would take forever. It makes much more sense to have that processing done on the backend.
So if it's too large of a query to do on the backend, I don't think we can do it. @dongbohu maybe you could do a few tests to see how long such a query would take? And we can decide whether to close this based on the numbers?
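A minimal timing harness for those tests might look like the following Python sketch. Here run_query is a hypothetical zero-argument function, for example one that requests the signature list with participating genes included and reads the whole response. The standard-library timeit.repeat does the measuring.

```python
import statistics
import timeit
from typing import Callable, Dict


def time_query(run_query: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Time a zero-argument query function; return min/median/max wall-clock seconds.

    run_query is a hypothetical stand-in for whatever issues the real query.
    """
    # number=1: each sample is one complete query; repeat gives several samples.
    samples = timeit.repeat(run_query, repeat=repeats, number=1)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Reporting the median alongside the min and max keeps a single slow or cached run from skewing the decision.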
If you first call the API api/v1/gene/?autocomplete=str to get the matched genes, then call api/v1/participation/?related-genes=g1,g2,... to get the participations, are the signatures in these participation records what you need?
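The two-call workflow could be sketched on the client side roughly as follows. The endpoint paths and query parameters come from the comment above. Several details are assumptions: the base URL, the response shapes (a plain list or a paginated object with "results"), the gene "id" field, and the participation "signature" field. Only the first page of each response is read.

```python
import json
import urllib.request
from typing import Any, Callable, Iterable, List, Set
from urllib.parse import urlencode

# A fetcher takes a URL and returns the decoded JSON body.
Fetch = Callable[[str], Any]


def http_fetch(url: str) -> Any:
    """Default fetcher: GET the URL and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def _records(body: Any) -> List[dict]:
    # Assumption: a plain JSON list, or a paginated object with "results".
    if isinstance(body, dict):
        return body.get("results", [])
    return body


def gene_autocomplete_url(base: str, text: str) -> str:
    return f"{base}/api/v1/gene/?" + urlencode({"autocomplete": text})


def participation_url(base: str, gene_ids: Iterable[int]) -> str:
    ids = ",".join(str(g) for g in gene_ids)
    # safe="," keeps the commas literal: related-genes=g1,g2,...
    return f"{base}/api/v1/participation/?" + urlencode({"related-genes": ids}, safe=",")


def signatures_for_gene_text(base: str, text: str, fetch: Fetch = http_fetch) -> Set[Any]:
    """Autocomplete genes, then collect signature IDs from their participations."""
    # Assumption: each gene record carries an "id" field.
    gene_ids = [g["id"] for g in _records(fetch(gene_autocomplete_url(base, text)))]
    if not gene_ids:
        return set()
    # Assumption: each participation record has a hashable "signature" ID.
    participations = _records(fetch(participation_url(base, gene_ids)))
    return {p["signature"] for p in participations}
```

Injecting fetch keeps the lookup logic testable without a live server. It also shows the cost concern: the client still makes two round trips per search.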
I suppose that could work? That would give me all the signatures that have the gene you searched for among their participating genes? If so, though, could you just make that part of the signature query, like autocomplete? It seems like something that belongs on the backend.
Also I don't think this would solve #134
@georgiadoing: I need some clarification on this issue:
1. Which gene names should be searchable: systematic name (such as PA3581), PA14 name (such as PA14_17980), and symbols (aka. standard name, such as glpF)?
2. The signatures returned should be the ones in the matching genes' participations records, right?
3. Should the search use exact or partial matches? For example, if you type PA in the search box on the signature page, partial matches will include ALL Pseudomonas genes, because all systematic names (aka. PAO1 names) of these genes start with PA. In contrast, exact matches would return nothing, because no gene has a systematic name, standard name, or synonym of "PA" or "pa".
I think exact matches probably make more sense, because partial matches would pull in many more signatures that you don't want, making the results both misleading and distracting. (Exact matches will also be much faster on the backend than partial matches.) Please let me know what you think. Thanks.
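The difference between exact and partial matching can be illustrated with a small in-memory Python sketch. The Gene fields (including pa14_name), the (gene_id, signature_id) participation pairs, and the toy records are all hypothetical and do not reflect the backend's actual models. Matching is case-insensitive, following the "PA" or "pa" example above.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass
class Gene:
    id: int
    systematic_name: str            # PAO1 name, e.g. "PA0001"
    standard_name: str = ""         # symbol, e.g. "dnaA"
    pa14_name: str = ""             # hypothetical field for names like "PA14_17980"
    synonyms: List[str] = field(default_factory=list)

    def names(self) -> List[str]:
        # Every non-empty identifier, lower-cased so "PA" and "pa" behave alike.
        candidates = [self.systematic_name, self.standard_name, self.pa14_name, *self.synonyms]
        return [n.lower() for n in candidates if n]


def matching_genes(query: str, genes: Iterable[Gene], exact: bool = True) -> List[Gene]:
    q = query.strip().lower()
    if not q:
        return []
    if exact:
        # Exact: the query must equal one of the gene's names.
        return [g for g in genes if q in g.names()]
    # Partial: the query may appear anywhere inside a name.
    return [g for g in genes if any(q in n for n in g.names())]


def signatures_for_query(query: str, genes: Iterable[Gene],
                         participations: Iterable[Tuple[int, int]],
                         exact: bool = True) -> Set[int]:
    # participations: assumed (gene_id, signature_id) pairs.
    gene_ids = {g.id for g in matching_genes(query, genes, exact)}
    return {sig for gene_id, sig in participations if gene_id in gene_ids}


# Illustrative toy records only.
genes = [Gene(1, "PA3581", "glpF"), Gene(2, "PA0001", "dnaA")]
participations = [(1, 10), (2, 20), (2, 21)]
print(signatures_for_query("PA", genes, participations, exact=False))  # every signature
print(signatures_for_query("PA", genes, participations, exact=True))   # empty set
```

This sketch just scans lists. In a database, an exact match can generally use an ordinary index on the name columns, while a contains-style match generally cannot, which is consistent with the speed point above.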
Hi @dongbohu - For your first point: gene name, PAO1 number, and PA14 number would all be great, but if it's harder to do all of them, I've listed them in order of importance.
second - yes, that is what I was imagining
third - I would also think exact match would be better
But also, if this is a difficult feature to implement, I think it is also fine, as a user, to search for signatures that contain genes from the gene page - I think that is how the tool is designed and it works really well that way. I think this feature would be a nice cherry on top, but users might not miss it :)
@georgiadoing: Thank you for your comments. I will work on it this week.
Since it is a backend issue, I am moving it to: https://github.com/greenelab/py3-adage-backend/issues/56, and closing this issue here.
From Georgia Doing:
Search signatures by gene names or numbers?