Open martinpopel opened 3 years ago
Permanent links store a pair: a treebank and a query. They have nothing to do with the query result.
Results of queries with no filters are sets.
If you want persistent filtered results, you can achieve that with sort. Otherwise, the result can be shuffled. I am not sure why; it may be caused by PostgreSQL or by Perl (if the version is at least 5.18).
Thanks.
If you want the persistent filtered results, you can achieve that with sort.
How? I know I can sort the output filters (>> $A.id sort by $1), but how do I sort the results of a query without any filters?
And shouldn't such stable sorting be inserted in all queries automatically to prevent the non-determinism of PostgreSQL or Perl?
If the culprit is Perl, couldn't you set PERL_HASH_SEED and PERL_PERTURB_KEYS?
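A minimal sketch of what setting these two variables could look like, assuming the Perl backend is launched from a wrapper process. The helper name `deterministic_perl_env` and the script name in the usage note are hypothetical; only the two environment variable names come from Perl itself. This would pin Perl's hash iteration order at most, and would not affect any ordering decided by PostgreSQL.

```python
import os


def deterministic_perl_env(base_env=None, seed="0"):
    """Return a copy of the environment with Perl hash randomization pinned.

    PERL_HASH_SEED fixes the hash seed; PERL_PERTURB_KEYS=0 disables the
    per-hash key-order perturbation introduced in Perl 5.18.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PERL_HASH_SEED"] = seed
    env["PERL_PERTURB_KEYS"] = "0"
    return env


# Build the environment from an explicit base so the example is reproducible.
env = deterministic_perl_env({"PATH": "/usr/bin"})
```

The resulting dictionary could then be passed to something like `subprocess.run(["perl", "server.pl"], env=env)`, where `server.pl` stands in for whatever script actually runs the service.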
Queries without output filters cannot be sorted. The result is a set, and internally the SQL uses LIMIT ..., so if you ask for a query result twice you may get two different subsets if PostgreSQL is really non-deterministic (I don't know, but I believe it is not, at least for the same installation of PostgreSQL).
If the culprit is Perl, couldn't you set PERL_HASH_SEED and PERL_PERTURB_KEYS?
It probably can help.
But basically, anyone who uses persistent links should be aware of what is being linked: (treebank, query). Or more precisely: the treebank and the prefilled string in the query field. See: http://hdl.handle.net/11346/PMLTQ-AU61
I understand that permanent links specify just the treebank and query string. But when users see "permanent" they have some expectations. It is even trickier because sometimes the query returns the results in the same order (when trying to run the same query in the same browser with the same limit within a short time after the previous execution), so the users may think that also the result is permanent. For example, I have spent some time describing PDT-C errors such as "the sixth sentence found by this query is...".
So there are two possible solutions:
Queries without output filters are not possible to sort.
I am no PostgreSQL expert, so maybe there is an easier way to prevent the non-determinism, but what about adding ORDER BY to each PostgreSQL query, using e.g. the IDs of all nodes mentioned in the query? This should make the order deterministic.
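To illustrate the idea, here is a small sketch that uses SQLite (from the Python standard library) as a stand-in for PostgreSQL. The table `result` and the columns `a_id` and `b_id` are hypothetical and do not reflect the SQL that PML-TQ actually generates. The point is only that ordering by the IDs of all matched nodes makes the LIMIT page independent of how the rows happen to be stored.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory table of matched node-ID pairs (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE result (a_id INTEGER, b_id INTEGER)")
    conn.executemany("INSERT INTO result VALUES (?, ?)", rows)
    return conn


def first_page(conn, limit):
    """Return the first `limit` matches, ordered by every node ID in the match."""
    sql = "SELECT a_id, b_id FROM result ORDER BY a_id, b_id LIMIT ?"
    return conn.execute(sql, (limit,)).fetchall()


rows = [(3, 7), (1, 9), (2, 4), (1, 2)]
db_one = make_db(rows)                   # rows stored in one physical order
db_two = make_db(list(reversed(rows)))   # same rows, stored in reverse order

page_one = first_page(db_one, 3)
page_two = first_page(db_two, 3)
```

Both pages contain the same three rows in the same order, regardless of insertion order. Without the ORDER BY clause, SQL makes no promise about which rows a LIMIT returns.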
If there is currently no guarantee about the ordering (and no way to specify the order within the query), no users should be disappointed by fixing the ordering.
Some users may appreciate random shuffling of the query results, but it should be optional and replicable (i.e. stable sort), such as in KonText.
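One possible way to get a shuffle that is random-looking but replicable is to sort results by a hash of a seed combined with each result's ID. This is only a sketch of the general technique; I do not know how KonText implements it, and the function name `replicable_shuffle` and the seed values are made up for illustration.

```python
import hashlib


def replicable_shuffle(ids, seed):
    """Order `ids` pseudo-randomly, but identically for the same seed.

    The order depends only on the seed and the ID values, not on the
    order in which the IDs arrive.
    """
    def key(item_id):
        text = f"{seed}:{item_id}".encode("utf-8")
        # Fall back to the ID's string form to break (very unlikely) hash ties.
        return (hashlib.sha256(text).hexdigest(), str(item_id))

    return sorted(ids, key=key)


shuffled_a = replicable_shuffle([1, 2, 3, 4, 5], seed="demo")
shuffled_b = replicable_shuffle([5, 4, 3, 2, 1], seed="demo")
```

Here `shuffled_a` and `shuffled_b` are the same list, so a link that stored the seed alongside the query could reproduce the order. Like ORDER BY, this still needs the full result set before the first page can be returned.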
I don't know how it all works now, but adding ORDER BY means you first need to get the whole result before you can use LIMIT and OFFSET. That would terribly slow down the service.
OK, so adding ORDER BY by default to all queries is not a good idea. But unless there is another way to prevent the non-determinism (which seems to be caused by PostgreSQL rather than Perl if a different set of 100 results is returned each time with LIMIT 100), it would be nice to at least have an option to turn on the ordering. When I know there are fewer than 100 (or 1000) results, the ordering would cause no significant slowdown (and I want all of them in the permanent query result anyway).
So maybe this is not so much about the "permanent link" button, but rather making it clear in the docs that the order of results is not guaranteed due to the database / technology in general?
The most I see we could do is make that information easy to reach.
Yes, it is not so much about the permanent link button. It would be nice to have a deterministic order for the same query even when not using the button. But as I wrote, users may think that the result is also permanent when they see "permanent link". So if we cannot fix the non-determinism bug, we should warn the users whenever they use the permanent link button.
“Permanent / persistent query link”?
Changing the name is not enough. My suggestion is to add the following notice (or similar) somewhere near the "Press ctrl + c to copy": "The permanent link encodes just the query and treebank. Ordering of the returned results may change."
You are still mixing 2 things and I am not convinced it is a good idea. The change in wording makes it clear that the link is to the query, not the results. A separate issue is how pmltq works; a side effect of that is the impermanent ordering of results. I am not convinced we need to put a strong warning about that in every place that has to do with queries. After all, it seems we get this question about once in a decade.
Thus I propose to do 2 things:
Pavel
https://lindat.mff.cuni.cz/services/pmltq has a "Permanent link" button, which suggests that the result of a given query will always be the same. Unfortunately, the results are sometimes returned in a different order. If random shuffling of the order is a feature, can we turn it off for permanent links?