Closed acbentley closed 9 months ago
Based on the behavior of Sp6, turning off "Show" changes the logic of the QB field from a normal logical predicate to an existential quantifier. Further experimentation reveals that applying "not" operates on the existence quantifier and not the other way around. That is "not exists P" rather than "exists not P". It's not clear to me yet where in the join chain the quantification occurs in fields which traverse more than a single one-to-many relationship.
I believe a more principled way to achieve the desired behavior is to make use of the distinct option on queries. Arguably, all queries should return distinct tuples in a relational model. This has a side effect of disabling the linking results to the base table records. I think the best way to resolve this would be to check if the query includes a unique field from the base table, and include the links in that case.
cf. #1464
See also the following comment from @mcruz-umich:
I have developed the following process for removing "Tissues" from our publications.
It involves downloading the zip file from DataExporter, then unzipping it, modifying the csv, and then re-zipping it and deploying.
There is a bug in the "Schema Mapper" in which it checks for duplicate records BEFORE running respecting "distinct" checkbox. This bug therefore means you cannot query at the gift or preparation level and then reduce multiple rows down to a single row for the same Cat # even if you have the columns set to "do not show". The "Schema Mapper" logic needs to process the "distinct" checkbox FIRST before checking for duplicates. Since that is not working in 6.8.00.... I have developed the following process for removing gift records and stripping "Tissue - N" from the aggregated preparations field.
OPENOFFICE is used below to leverage a visual view of the csv data such that rows can be sorted and deleted easily.
- SP DATAEXPORTER: Export from DataExporter - this will include gifts and empty preparations
- OPENOFFICE: Open the csv in OpenOffice, sort by "preparations" and delete rows that have none
- SPECIFY: Run a query to get a list of the gifts
- SUBLIME TEXT: Open the csv in SublimeText and search for the gift numbers and delete those rows REGEX: ,(gift_num_1|gift_num_2|gift_num_3|...gift_num_n),
- SUBLIME TEXT: Search for "tissue" prepTypes and remove the string but keep the row. This will affect the Preparations-aggregation column and may result in a blank cell if it had only contained tissue REGEX: (tissue( - \d)?;? ?)|(; tissue( - \d)?)+
- Open in OpenOffice again and sort by Catalog Number
Originally posted in https://github.com/specify/webportal-installer/issues/65#issuecomment-1228745228
I played around with Ben's idea (https://github.com/specify/specify7/issues/1450#issuecomment-1155534940) in https://github.com/specify/specify7/commit/95d2de77f0b96bc5e2d829a981484d0f448773ee.
Specifically, https://github.com/specify/specify7/blob/95d2de77f0b96bc5e2d829a981484d0f448773ee/specifyweb/stored_queries/utils.py#L106-L116
It also checks for uniqueness rules defined in business rules. The logic is that if the set of fields from the base table is a superset of any of the uniqueness rules, then we can add recordid in distinct mode.
The implementation also considers scoping done by query builder
For example, catalognumber
and collection
form a uniqueness rule, and if you selected catalognumber
and text3
in query builder with collection object as base, and distinct checked, you can STILL get recordid (because query builder adds a collection scope). So, the above solution checks if {catalognumber, text3, collection
} (from QB) is a superset of {catalognumber, collection}
(from Uniqueness Rules).
The backend is mostly done (or at least we have a generic solution). The remaining work on backend is also to treat primary key in base table as global unique.
HOWEVER, frontend does not expect recordid when distinct, so currently it just crashes after backend adds recordid in distinct mode. We could replicate the logic of deciding whether to add recordid in distinct mode on the frontend, so it knows how many columns to expect, or could infer it from the response (via column count, very hacky) or a response field.
I'm working on a PR #4541 to expand on the solution implemented in the issue-1450
branch that is based on the solution suggested by Ben and Vinny.
Based on the behavior of Sp6, turning off "Show" changes the logic of the QB field from a normal logical predicate to an existential quantifier. Further experimentation reveals that applying "not" operates on the existence quantifier and not the other way around. That is "not exists P" rather than "exists not P". It's not clear to me yet where in the join chain the quantification occurs in fields which traverse more than a single one-to-many relationship.
I believe a more principled way to achieve the desired behavior is to make use of the distinct option on queries. Arguably, all queries should return distinct tuples in a relational model. This has a side effect of disabling the linking results to the base table records. I think the best way to resolve this would be to check if the query includes a unique field from the base table, and include the links in that case.
I'm also looking into implementing a simpler fix brought up by Max https://github.com/specify/specify7/issues/1464#issuecomment-1286985113
When running a distinct query, can we mimic a distinct query using "GROUP BY" instead of "SELECT DISTINCT" and include the id in the query as a
CONCAT(id, ", ")
(which would be parsed on the front-end into separate records)?
I'm working on both solutions, anyone have opinions on which solution they prefer? @maxpatiiuk @realVinayak @melton-jason?
It would be more faster to do Max' solution imo. The backend change is easy and you could display the resources (after parsing) using RecordSelectorFromIds (see usage in deletion blockers if you want to see isolated usage).
Will be fixed in https://github.com/specify/specify7/issues/4607
This issue is now moved into #4607
Describe the bug When running a query with the show button unchecked for certain fields, the query result still duplicates records based on that field
To Reproduce Steps to reproduce the behavior:
Expected behavior I would expect queries to respect unchecked status of fields for duplication as in Sp 6
Screenshots
Desktop (please complete the following information):
Reported By Name of your institution
Additional context Add any other context about the problem here.