When we introduced the No Value (null) option in facets, we also used some heuristics that will make sure the "No value" is only offered in the cases that makes sense.
In this issue, our goal is to review the logic and see whether we should make any changes to it. For reference, #728 is the original issue where we discussed the addition of this feature and in #888 we made some modifications to it.
In faceting a No value filter could mean any of the following:
Scalar value being null. In terms of ermrest, this will be a simple col::null:: query.
No value exists in the given path (checking presence of a value in the path). The query for getting this could be complicated. We have to use outer join in the path. Because of the implementation limitation we cannot have more than two of these path existence checks (we cannot have two right outer joins on different tables).
Since we're not going to show two different options for these two, we have to make sure to offer No value option when only one of these two meanings would make sense. Based on this, we can categorize facets into these groups:
(G1) Facets without any hops where column is nullable.
(G2) Facets without any hops where column is not-null.
(G3) All outbound foreign key facets that all the columns involved are not-null.
(G4) Facets with hops where column is nullable. In this case No value could mean both scalar value null and path existence check. Therefore we cannot offer No value for them.
(G5) For other cases No Value can only mean the path existence check and therefore we can offer that option to the users. Since this is path existence check, we have to use the outer join and therefore multiple of them cannot co-exist.
Based on this, the following is the logic to offer or not offer the "No value" option (first applicable rule):
already has null filter in the blob: offer.
has "hide_null_choice": true: don't offer.
G1 (nullable local column): offer
G2 (not-null local column): don't offer
G3 (not-null all-outbound): don't offer
G4 (one or more hop, nullable): don't offer.
If other facets that require outer join have null filter in the blob: don't offer.
Otherwise (G5): offer.
I should mention that apart from checking the model, we used to also check user ACLs and whether use is looking at snapshot or not, because technically ERMrest could return null value for a not-null column. But we realize that even though ERMrest technically can return null in these two cases, this null value is not the general intended use case for this filter. And therefore we should remove the check for these two cases.
To provide more information:
Regarding the first case, if all the columns in an all-outbound path are "nullok": false, if a user doesn't have "select": true to the table or columns, then ERMrest will return null for the rows that user cannot select. By using this filter, users are doing a heavy query (since we're using right outer join) and the returned value is more of a policy check instead of value check (the result is rows that users don't have access to the column value not the rows that have null value.)
And for the second case, there's a special API in ERMrest that allows DBAs to purge the historical values of columns even if the column is "nullok": false. So the null filter in this case is like a magical tool that gives user the ability to find such cases (and that's not the default intended use case for this filter.)
When we introduced the
No Value
(null) option in facets, we also used some heuristics that will make sure the "No value" is only offered in the cases that makes sense.In this issue, our goal is to review the logic and see whether we should make any changes to it. For reference, #728 is the original issue where we discussed the addition of this feature and in #888 we made some modifications to it.
In faceting a
No value
filter could mean any of the following:null
. In terms of ermrest, this will be a simplecol::null::
query.Since we're not going to show two different options for these two, we have to make sure to offer
No value
option when only one of these two meanings would make sense. Based on this, we can categorize facets into these groups:(G1) Facets without any hops where column is nullable.
(G2) Facets without any hops where column is not-null.
(G3) All outbound foreign key facets that all the columns involved are not-null.
(G4) Facets with hops where column is nullable. In this case
No value
could mean both scalar valuenull
and path existence check. Therefore we cannot offerNo value
for them.(G5) For other cases
No Value
can only mean the path existence check and therefore we can offer that option to the users. Since this is path existence check, we have to use the outer join and therefore multiple of them cannot co-exist.Based on this, the following is the logic to offer or not offer the "No value" option (first applicable rule):
null
filter in the blob: offer."hide_null_choice": true
: don't offer.null
filter in the blob: don't offer.I should mention that apart from checking the model, we used to also check user ACLs and whether use is looking at snapshot or not, because technically ERMrest could return
null
value for a not-null column. But we realize that even though ERMrest technically can returnnull
in these two cases, thisnull
value is not the general intended use case for this filter. And therefore we should remove the check for these two cases.To provide more information:
Regarding the first case, if all the columns in an all-outbound path are
"nullok": false
, if a user doesn't have"select": true
to the table or columns, then ERMrest will returnnull
for the rows that user cannotselect
. By using this filter, users are doing a heavy query (since we're using right outer join) and the returned value is more of a policy check instead of value check (the result is rows that users don't have access to the column value not the rows that have null value.)And for the second case, there's a special API in ERMrest that allows DBAs to purge the historical values of columns even if the column is
"nullok": false
. So thenull
filter in this case is like a magical tool that gives user the ability to find such cases (and that's not the default intended use case for this filter.)