ufal / perl-pmltq-web

Simple web build on the top of the PML Tree Query server
https://lindat.mff.cuni.cz/services/pmltq/

Permanent link is not fully permanent #162

Open martinpopel opened 3 years ago

martinpopel commented 3 years ago

https://lindat.mff.cuni.cz/services/pmltq has a "Permanent link" button, which suggests that the result of a given query will always be the same. Unfortunately, the results are sometimes returned in a different order. If random shuffling of the order is a feature, can we turn it off for permanent links?

matyaskopp commented 3 years ago

Permalinks store pairs of treebanks and queries. They have nothing to do with the query result.

Results of queries with no filters are sets.

If you want the persistent filtered results, you can achieve that with sort. Otherwise, the result can be shuffled - I am not sure why - it can be caused by PostgreSQL or Perl (if the version is at least 5.18)

martinpopel commented 3 years ago

Thanks.

> If you want the persistent filtered results, you can achieve that with sort.

How? I know I can sort the output filters (>> $A.id sort by $1), but how do I sort the results of a query without any filters? And shouldn't such stable sorting be inserted in all queries automatically to prevent the non-determinism of PostgreSQL or Perl? If the culprit is Perl, couldn't you set PERL_HASH_SEED and PERL_PERTURB_KEYS?
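As an illustration of what pinning a hash seed does, here is a minimal sketch using Python's `PYTHONHASHSEED` as an analogy. It is not Perl and not PML-TQ code; the child program and the `run_with_seed` helper are made up for the example. For a Perl service, the corresponding step would be exporting `PERL_HASH_SEED` and `PERL_PERTURB_KEYS` (both documented in perlrun) before starting the server. Whether that changes the order of PML-TQ results is untested here.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a small set of strings.
# Without a pinned seed, this order may differ between interpreter runs.
CHILD = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def run_with_seed(seed: str) -> str:
    """Run the child in a fresh interpreter with a pinned hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Two separate processes with the same seed iterate in the same order.
first = run_with_seed("0")
second = run_with_seed("0")
print(first == second)
```

Pinning the seed only removes randomness that comes from hash iteration order. It would not help if the varying order comes from the database.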

matyaskopp commented 3 years ago

Queries without output filters are not possible to sort. The result is a set and internally SQL uses LIMIT ... feature, so if you ask two times for a query result you can probably get two different subsets if PostgreSQL is really non-deterministic (I don't know, but I believe it is not - at least for the same installation of PostgreSQL).

> If the culprit is Perl, couldn't you set PERL_HASH_SEED and PERL_PERTURB_KEYS?

It probably can help.

But basically, anyone who uses persistent links should be aware of what is being linked: (treebank, query). Or more precisely: the treebank and the prefilled string in the query field, see: http://hdl.handle.net/11346/PMLTQ-AU61

martinpopel commented 3 years ago

I understand that permanent links specify just the treebank and query string. But when users see "permanent" they have some expectations. It is even trickier because sometimes the query returns the results in the same order (when running the same query in the same browser with the same limit shortly after the previous execution), so users may think that the result is permanent too. For example, I have spent some time describing PDT-C errors such as "the sixth sentence found by this query is...".

So there are two possible solutions:

> Queries without output filters are not possible to sort.

I am no PostgreSQL expert, so maybe there is an easier way to prevent the non-determinism, but what about adding ORDER BY to each PostgreSQL query? And using e.g. IDs of all nodes mentioned in the query? This should make the order deterministic.
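The idea can be sketched as follows, using Python's built-in `sqlite3` as a stand-in for PostgreSQL. The `node` table, its columns, and the toy parent/child query are hypothetical and are not the SQL that PML-TQ actually generates. The point is only that ordering by the IDs of every node in the match makes the `LIMIT`ed page well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER)")
# Hypothetical toy tree, deliberately inserted in scrambled order.
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [(4, 2), (1, None), (6, 3), (2, 1), (5, 2), (3, 1)],
)

# A two-node "query": every (parent a, child b) pair.
# ORDER BY lists the IDs of all nodes mentioned in the query, so the
# sort key is unique per match and the first N rows are always the same.
QUERY = """
SELECT a.id, b.id
FROM node AS a
JOIN node AS b ON b.parent_id = a.id
ORDER BY a.id, b.id
LIMIT ?
"""


def first_matches(limit):
    """Return the first `limit` matches in the fixed ID order."""
    return conn.execute(QUERY, (limit,)).fetchall()


page = first_matches(3)
print(page)
```

Without the `ORDER BY` clause, SQL leaves the row order unspecified, so which three rows come back is up to the database.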

If there is currently no guarantee about the ordering (and no way to specify the order within the query), no users should be disappointed by fixing the ordering.

Some users may appreciate random shuffling of the query results, but it should be optional and replicable (i.e. stable sort), such as in KonText.
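A replicable shuffle could look like the sketch below: first put the results into a canonical order, then shuffle with an explicit seed. The function name and the idea of passing the seed along with the query are assumptions made for illustration, and this is not a description of how KonText implements it.

```python
import random


def replicable_shuffle(results, seed):
    """Shuffle results so that the same seed always gives the same order."""
    # Sort first so the outcome does not depend on the arrival order.
    ordered = sorted(results)
    # A private Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(ordered)
    return ordered


# The same matches arriving in two different orders...
run_a = ["s3", "s1", "s5", "s2", "s4"]
run_b = ["s5", "s4", "s3", "s2", "s1"]

# ...end up in the same shuffled order when the seed is the same.
print(replicable_shuffle(run_a, seed=2021) == replicable_shuffle(run_b, seed=2021))
```

The sort step is what makes this replicable, so it carries the same cost as a deterministic ordering.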

choroba commented 3 years ago

I don't know how it all works now, but adding ORDER BY means you first need to get the whole result before you can use LIMIT and OFFSET. That would terribly slow down the service.
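The cost argument can be illustrated with a small Python sketch that counts how many matches have to be produced before the first page is available. The generator is a made-up stand-in for the database producing matches one at a time. A real database can sometimes avoid the full sort with a suitable index, and whether that would apply to PML-TQ's queries is not something this sketch addresses.

```python
from itertools import islice


def matches(total, counter):
    """Yield `total` match IDs in a scrambled order, counting each one produced."""
    for i in range(total):
        counter["produced"] += 1
        # 7919 is coprime with 10000, so this visits every ID exactly once.
        yield (i * 7919) % total


TOTAL, LIMIT = 10_000, 100

# LIMIT without ordering: stop as soon as enough matches have been seen.
unordered_cost = {"produced": 0}
unordered_page = list(islice(matches(TOTAL, unordered_cost), LIMIT))

# ORDER BY + LIMIT: every match must be produced before the first
# row of the sorted result is known.
ordered_cost = {"produced": 0}
ordered_page = sorted(matches(TOTAL, ordered_cost))[:LIMIT]

print(unordered_cost["produced"], ordered_cost["produced"])
```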

martinpopel commented 3 years ago

OK, so adding ORDER BY by default to all queries is not a good idea. But unless there is another way to prevent the non-determinism (which seems to be caused by PostgreSQL rather than Perl if a different set of 100 results is returned each time with LIMIT 100), it would be nice to at least have an option to turn on the ordering. When I know there are fewer than 100 (or 1000) results, the ordering would cause no significant slowdown (if I want all of them in the permanent query result anyway).
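Such an opt-in switch might be sketched like this. The `build_sql` helper, its parameters, and the example SELECT are hypothetical and do not reflect how PML-TQ assembles its SQL.

```python
def build_sql(base_select, node_id_columns, limit, stable_order=False):
    """Append an optional ORDER BY over the node ID columns, then the LIMIT."""
    parts = [base_select]
    if stable_order:
        # Only pay for the sort when the user explicitly asks for it.
        parts.append("ORDER BY " + ", ".join(node_id_columns))
    parts.append(f"LIMIT {int(limit)}")
    return " ".join(parts)


BASE = "SELECT a.id, b.id FROM node a JOIN node b ON b.parent_id = a.id"

fast = build_sql(BASE, ["a.id", "b.id"], 100)
stable = build_sql(BASE, ["a.id", "b.id"], 100, stable_order=True)
print(fast)
print(stable)
```

With the default left off, existing queries keep their current speed, and only users who need a reproducible order take the slowdown.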

stranak commented 3 years ago

So maybe this is not so much about the "permanent link" button, but rather making it clear in the docs that the order of results is not guaranteed due to the database / technology in general?

The most I see we could do is make that information easy to reach.

martinpopel commented 3 years ago

Yes, it is not so much about the permanent link button - it would be nice to have a deterministic order for the same query even without using the button. But, as I wrote, users may think that the result is also permanent when they see "permanent link". So if we cannot fix the non-determinism bug, we should warn users whenever they use the permanent link button.

stranak commented 3 years ago

“Permanent / persistent query link”?

On 9 June 2021 at 14:06, Martin Popel wrote:

> Yes, it is not so much about the permanent link button - it would be nice to have a deterministic order for the same query even without using the button. But, as I wrote, users may think that the result is also permanent when they see "permanent link". So if we cannot fix the non-determinism bug, we should warn users whenever they use the permanent link button.


martinpopel commented 3 years ago

Changing the name is not enough. My suggestion is to add the following notice (or similar) somewhere near the "Press ctrl + c to copy": "The permanent link encodes just the query and treebank. Ordering of the returned results may change."

stranak commented 3 years ago

You are still mixing 2 things and I am not convinced it is a good idea. The change in wording makes it clear that the link is to the query, not the results. Another issue is how pmltq works. A side effect of it is the impermanent ordering of results. I am not convinced we need to put a strong warning about that in every place that has to do with queries. After all, it seems we get this question about once in a decade.

Thus I propose to do 2 things:

Pavel

On 9 June 2021 at 15:01, Martin Popel wrote:

> Changing the name is not enough. My suggestion is to add the following notice (or similar) somewhere near the "Press ctrl + c to copy": "The permanent link encodes just the query and treebank. Ordering of the returned results may change."
