magnusmanske / petscan_rs

The repo for the PetScan tool
https://petscan.wmflabs.org/
GNU General Public License v3.0

Concern: text "Search" in other sources option may fail silently if the query is too long. #93

Open afarlie opened 3 years ago

afarlie commented 3 years ago

Recently I was using the search field in the other sources box in an attempt to filter down a large result set.

For various reasons this appeared to work only if the category information was included in the text of the search. (I am treating this as by design.)

However, as the query got very long (such as in https://petscan.wmflabs.org/?psid=18636955), I started to notice that I was apparently not seeing results in situations where I was reasonably confident there should be some.

On Commons I tried the same search directly, to determine whether my search text was incorrect, and was informed that the text of the search query was too long (500 characters vs. a more typical 300). This makes me wonder whether the search is returning an error and no results, something which is not reported in PetScan.

If no results are returned from the search (potentially because the length limit is exceeded), PetScan has no list of files from the search query to compare against the list of pages generated by the category include/exclude step. Hence it returns 0 results when the two lists are compared.
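To illustrate the suspected failure mode, here is a minimal sketch in Python. It is not PetScan's actual code: the `combine` helper, the `SearchFailed` exception, and the file names are all hypothetical, and it assumes the two sources are combined by plain set intersection. It shows why a search that fails quietly looks the same as a search with no matches, and how passing the failure along separately would make it visible.

```python
from typing import Optional, Set


class SearchFailed(Exception):
    """Hypothetical error raised when the search source reports a failure."""


def combine(category_pages: Set[str], search_pages: Optional[Set[str]]) -> Set[str]:
    # None stands for "the search failed" (e.g. query too long), which is
    # different from an empty set meaning "the search matched nothing".
    if search_pages is None:
        raise SearchFailed("search source returned an error, not an empty result")
    # Assumption: the two sources are combined by intersection.
    return category_pages & search_pages


category_pages = {"File:A.jpg", "File:B.jpg", "File:C.jpg"}

# If a failed search is treated as an empty list, the intersection is empty
# and the user just sees 0 results with no explanation.
silent = category_pages & set()

# A search that succeeded narrows the category list as expected.
narrowed = combine(category_pages, {"File:B.jpg", "File:Z.jpg"})
```

Keeping "failed" and "empty" as distinct values is the point of the sketch. How the real tool represents a failed source is not stated in this thread.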

My understanding was that you had implemented a separate Search filter (Issue 84), but I am not yet seeing this in the live version of the tool.

magnusmanske commented 3 years ago

The query takes so long because there are many large "negative" categories, which is hard to optimize in the SQL queries.

magnusmanske commented 3 years ago

The "search filter" is on the output page, however, it will not work for that many (>820k) results, as each result has to be searched individually.

magnusmanske commented 3 years ago

A search query that is "too long" would fail with an appropriate error, AFAICT