monarch-initiative / biolink-api

API for linked biological knowledge
https://api.monarchinitiative.org/api/
BSD 3-Clause "New" or "Revised" License

Sticky category filter #145

Open lhannest opened 6 years ago

lhannest commented 6 years ago

The category filter is sticky, and I assume the other filters are as well. Maybe data is being re-ordered each time a query is run? For example, if you run these queries in this order:

https://api.monarchinitiative.org/api/search/entity/diabetes?rows=1&start=1
This returns a disease.

https://api.monarchinitiative.org/api/search/entity/diabetes?rows=1&start=1&category=gene
This appropriately returns a gene.

https://api.monarchinitiative.org/api/search/entity/diabetes?rows=1&start=1
This now returns a gene.

https://api.monarchinitiative.org/api/search/entity/diabetes?rows=1&start=1&category=disease
This returns a disease.

https://api.monarchinitiative.org/api/search/entity/diabetes?rows=1&start=1
This now returns a disease.

This seems like a bug to me. The same query parameters should return the same data.
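
A minimal sketch of how that expectation could be checked automatically. It assumes a caller-supplied `fetch` function (hypothetical, not part of biolink-api) that takes a URL and returns the category of the first hit; `find_sticky_filters` is likewise just an illustrative name.

```python
from typing import Callable, List, Tuple

BASE = "https://api.monarchinitiative.org/api/search/entity/diabetes?rows=1&start=1"


def find_sticky_filters(fetch: Callable[[str], str],
                        categories: List[str]) -> List[Tuple[str, str, str]]:
    """Return (category, before, after) for each filter that changes the unfiltered result.

    `fetch` is a hypothetical callable: URL in, category of the first hit out.
    How it talks to the API is left to the caller.
    """
    baseline = fetch(BASE)                     # unfiltered result before any filtered call
    leaks = []
    for category in categories:
        fetch(f"{BASE}&category={category}")   # filtered call that may leave state behind
        after = fetch(BASE)                    # the same unfiltered query again
        if after != baseline:
            leaks.append((category, baseline, after))
    return leaks
```

An empty list means the unfiltered query kept returning the same category regardless of which filtered queries ran in between.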

cmungall commented 6 years ago

Hmm, definitely undesirable. Can you try running the service locally and report what the Solr calls are (from the log)?

kshefchek commented 6 years ago

This reminds me that @tudorgroza emailed the same issue and I meant to turn it into a ticket, from Tudor:

" the search is for some reason 'stateful'. If I search for 'disease', all subsequent calls will return only disease, even if the category is not specified. If I change the category to 'gene', then again, all subsequent calls will return only genes. "

lhannest commented 6 years ago

http://localhost:5000/api/search/entity/diabetes?rows=1&start=1

2018-02-27 15:49:48,791 - root - INFO - Using pre-loaded object: <ontobio.config.Config object at 0x7f98d6bac2e8>
2018-02-27 15:49:48,791 - root - INFO - PARAMS={'qt': 'standard', 'rows': 1, 'hl.simple.pre': '<em class="hilite">', 'facet.field': ['category', 'taxon_label'], 'hl': 'on', 'fq': [], 'hl.snippets': '1000', 'start': 1, 'facet.mincount': 1, 'facet.limit': 25, 'qf': ['iri_std^3', 'iri_kw^3', 'iri_eng^3', 'synonym_std^2', 'synonym_kw^2', 'synonym_eng^2', 'label_std^2', 'label_kw^2', 'label_eng^2', 'id_std^3', 'id_kw^3', 'id_eng^3', 'definition_std^2', 'definition_kw^2', 'definition_eng^2'], 'facet': 'on', 'fl': '*,score', 'defType': 'edismax', 'q': 'diabetes'}
2018-02-27 15:49:49,054 - root - INFO - Docs found: 290
2018-02-27 15:49:49,055 - werkzeug - INFO - 127.0.0.1 - - [27/Feb/2018 15:49:49] "GET /api/search/entity/diabetes?rows=1&start=1 HTTP/1.1" 200 -

The entity returned is HP:0005978 with category Phenotype.

http://localhost:5000/api/search/entity/diabetes?rows=1&start=1&category=gene

2018-02-27 16:07:37,428 - root - INFO - Using pre-loaded object: <ontobio.config.Config object at 0x7f98d6bac2e8>
2018-02-27 16:07:37,428 - root - INFO - PARAMS={'qt': 'standard', 'rows': 1, 'hl.simple.pre': '<em class="hilite">', 'facet.field': ['category', 'taxon_label'], 'hl': 'on', 'fq': ['category:"gene"'], 'hl.snippets': '1000', 'start': 1, 'facet.mincount': 1, 'facet.limit': 25, 'qf': ['iri_std^3', 'iri_kw^3', 'iri_eng^3', 'synonym_std^2', 'synonym_kw^2', 'synonym_eng^2', 'label_std^2', 'label_kw^2', 'label_eng^2', 'id_std^3', 'id_kw^3', 'id_eng^3', 'definition_std^2', 'definition_kw^2', 'definition_eng^2'], 'facet': 'on', 'fl': '*,score', 'defType': 'edismax', 'q': 'diabetes'}
2018-02-27 16:07:37,662 - root - INFO - Docs found: 44
2018-02-27 16:07:37,663 - werkzeug - INFO - 127.0.0.1 - - [27/Feb/2018 16:07:37] "GET /api/search/entity/diabetes?rows=1&start=1&category=gene HTTP/1.1" 200 -

The entity returned is MGI:99415 with category gene.

http://localhost:5000/api/search/entity/diabetes?rows=1&start=1

2018-02-27 16:09:05,565 - root - INFO - Using pre-loaded object: <ontobio.config.Config object at 0x7f98d6bac2e8>
2018-02-27 16:09:05,566 - root - INFO - PARAMS={'qt': 'standard', 'rows': 1, 'hl.simple.pre': '<em class="hilite">', 'facet.field': ['category', 'taxon_label'], 'hl': 'on', 'fq': ['category:"gene"'], 'hl.snippets': '1000', 'start': 1, 'facet.mincount': 1, 'facet.limit': 25, 'qf': ['iri_std^3', 'iri_kw^3', 'iri_eng^3', 'synonym_std^2', 'synonym_kw^2', 'synonym_eng^2', 'label_std^2', 'label_kw^2', 'label_eng^2', 'id_std^3', 'id_kw^3', 'id_eng^3', 'definition_std^2', 'definition_kw^2', 'definition_eng^2'], 'facet': 'on', 'fl': '*,score', 'defType': 'edismax', 'q': 'diabetes'}
2018-02-27 16:09:05,851 - root - INFO - Docs found: 44
2018-02-27 16:09:05,852 - werkzeug - INFO - 127.0.0.1 - - [27/Feb/2018 16:09:05] "GET /api/search/entity/diabetes?rows=1&start=1 HTTP/1.1" 200 -

The entity returned is MGI:99415 with category gene.

http://localhost:5000/api/search/entity/diabetes?rows=1&start=1&category=disease

2018-02-27 16:10:42,990 - root - INFO - Using pre-loaded object: <ontobio.config.Config object at 0x7f98d6bac2e8>
2018-02-27 16:10:42,991 - root - INFO - PARAMS={'qt': 'standard', 'rows': 1, 'hl.simple.pre': '<em class="hilite">', 'facet.field': ['category', 'taxon_label'], 'hl': 'on', 'fq': ['category:"disease"'], 'hl.snippets': '1000', 'start': 1, 'facet.mincount': 1, 'facet.limit': 25, 'qf': ['iri_std^3', 'iri_kw^3', 'iri_eng^3', 'synonym_std^2', 'synonym_kw^2', 'synonym_eng^2', 'label_std^2', 'label_kw^2', 'label_eng^2', 'id_std^3', 'id_kw^3', 'id_eng^3', 'definition_std^2', 'definition_kw^2', 'definition_eng^2'], 'facet': 'on', 'fl': '*,score', 'defType': 'edismax', 'q': 'diabetes'}
2018-02-27 16:10:43,400 - root - INFO - Docs found: 217
2018-02-27 16:10:43,406 - werkzeug - INFO - 127.0.0.1 - - [27/Feb/2018 16:10:43] "GET /api/search/entity/diabetes?rows=1&start=1&category=disease HTTP/1.1" 200 -

The entity returned is MONDO:0005148 with category disease.

http://localhost:5000/api/search/entity/diabetes?rows=1&start=1

2018-02-27 16:12:07,989 - root - INFO - Using pre-loaded object: <ontobio.config.Config object at 0x7f98d6bac2e8>
2018-02-27 16:12:07,989 - root - INFO - PARAMS={'qt': 'standard', 'rows': 1, 'hl.simple.pre': '<em class="hilite">', 'facet.field': ['category', 'taxon_label'], 'hl': 'on', 'fq': ['category:"disease"'], 'hl.snippets': '1000', 'start': 1, 'facet.mincount': 1, 'facet.limit': 25, 'qf': ['iri_std^3', 'iri_kw^3', 'iri_eng^3', 'synonym_std^2', 'synonym_kw^2', 'synonym_eng^2', 'label_std^2', 'label_kw^2', 'label_eng^2', 'id_std^3', 'id_kw^3', 'id_eng^3', 'definition_std^2', 'definition_kw^2', 'definition_eng^2'], 'facet': 'on', 'fl': '*,score', 'defType': 'edismax', 'q': 'diabetes'}
2018-02-27 16:12:13,232 - root - INFO - Docs found: 217
2018-02-27 16:12:13,235 - werkzeug - INFO - 127.0.0.1 - - [27/Feb/2018 16:12:13] "GET /api/search/entity/diabetes?rows=1&start=1 HTTP/1.1" 200 -

The entity returned is MONDO:0005148 with category disease.

lhannest commented 6 years ago

Yeah, it looks like the filter query is persisting somehow.

lhannest commented 6 years ago

This is odd, I'm stepping through the code in entitysearch.py:

@ns.route('/entity/<term>')
@api.doc(params={'term': 'search string, e.g. shh, parkinson, femur'})
class SearchEntities(Resource):

    @api.expect(simple_parser)

    #@api.marshal_list_with(search_result)
    def get(self, term):
        """
        Returns list of matching concepts or entities using lexical search
        """
        import pudb; pudb.set_trace()
        args = simple_parser.parse_args()
        q = GolrSearchQuery(term,
                            **args)
        results = q.exec()
        return results

PuDB output:

>>> args
{'rows': 1, 'category': None, 'start': 1}
>>> term
'diabetes'

But when stepping into the constructor, PuDB output:

>>> fq
{'category': ['disease']}

cmungall commented 6 years ago

ok, this is at the ontobio level

>>> p = {'category':'disease'}
>>> q = GolrSearchQuery('diabetes', **p)
>>> results = q.exec()
>>> q.fq
{'category': 'disease'}
>>> p = {}
>>> q = GolrSearchQuery('diabetes', **p)
>>> results = q.exec()
>>> q.fq
{'category': 'disease'}

cmungall commented 6 years ago

class GolrSearchQuery(GolrAbstractQuery):
    """
    Queries over a search document
    """
    def __init__(self,
                 term=None,
[snip]
                 fq={},

I assumed this makes a fresh empty dict each time but apparently not?
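
For anyone following along, here is a stripped-down illustration of the behaviour. `SearchQuery` is a hypothetical stand-in, not the real `GolrSearchQuery`, and it assumes the constructor writes the category into `fq`; the actual ontobio code path may differ.

```python
class SearchQuery:
    """Hypothetical stand-in for a query class; not the real ontobio code."""

    def __init__(self, term, category=None, fq={}):  # one dict, created when `def` runs
        self.term = term
        self.fq = fq                        # every call that omits fq gets the same object
        if category is not None:
            self.fq['category'] = category  # mutates the shared default


first = SearchQuery('diabetes', category='disease')
second = SearchQuery('diabetes')            # no category requested

print(second.fq)              # {'category': 'disease'}
print(first.fq is second.fq)  # True
```

The default dict lives on the function object (`SearchQuery.__init__.__defaults__`), so it survives between requests for as long as the server process does.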

lhannest commented 6 years ago

"Python’s default arguments are evaluated once when the function is defined, not each time the function is called (like it is in say, Ruby). This means that if you use a mutable default argument and mutate it, you will and have mutated that object for all future calls to the function as well."

http://docs.python-guide.org/en/latest/writing/gotchas/

I wouldn't have expected that!
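
The usual way around this is to default to `None` and build the dict inside the function. A minimal sketch, again with a hypothetical `SearchQuery` stand-in rather than the actual ontobio constructor:

```python
class SearchQuery:
    """Hypothetical stand-in for a query class; not the real ontobio code."""

    def __init__(self, term, category=None, fq=None):
        self.term = term
        # Build a new dict per instance; copy a caller-supplied one so that
        # mutating it here cannot leak back to the caller either.
        self.fq = dict(fq) if fq is not None else {}
        if category is not None:
            self.fq['category'] = category


first = SearchQuery('diabetes', category='disease')
second = SearchQuery('diabetes')

print(first.fq)   # {'category': 'disease'}
print(second.fq)  # {}
```

Copying the incoming dict is an extra precaution that the `None` default alone does not provide; whether it is wanted depends on whether callers expect their dict to be modified.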

balhoff commented 6 years ago

😱

deepakunni3 commented 6 years ago

Today I Learned! 🐍