Closed m-mohr closed 7 months ago
@m-mohr, no, I think you are right. I'll create a new PR that changes the q parameter so that its value is a white-space-separated list of search terms.
I disagree. This is an API and not a Google search bar.
If the search is for multiple terms then the schema is an array of strings and the comma is the separator. If you don't want a comma, maybe use explode: true so that we have ...&q=foo&q=bar.
Otherwise we would be introducing our own micro-format that everyone has to parse on their own (even though splitting at spaces is not very difficult).
I could also live with commas, but it should be specified more clearly in the spec. It's bad that people need to figure this out from the OpenAPI fragment. The docs should clearly state that commas are the delimiter and that spaces have no special meaning, so they do not split the value into multiple terms.
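For illustration, the two encodings compared above (a single comma-delimited value versus a repeated parameter with explode: true) can be sketched in Python with the standard library. The helper name split_terms is hypothetical and not part of the spec; this is only a sketch assuming commas delimit terms and spaces stay inside a term.

```python
from urllib.parse import parse_qs


def split_terms(query_string):
    """Return the search terms carried by the q parameter.

    Hypothetical helper covering both encodings discussed in this thread:
      q=foo,bar     (one value, comma-delimited)
      q=foo&q=bar   (parameter repeated, i.e. explode: true)
    """
    # parse_qs percent-decodes values and collects repeated parameters in a list.
    values = parse_qs(query_string).get("q", [])
    terms = []
    for value in values:
        # Commas delimit terms; spaces have no special meaning and are kept.
        terms.extend(term for term in value.split(",") if term)
    return terms
```

With this sketch, q=san%20diego,ocean yields two terms, "san diego" and "ocean", because the space does not split the value.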
@m-mohr @cportele OK ... I'll leave it as it is but add clarifying text in the specification to point out that it is a comma separated list and that spaces have no special meaning.
cc @pvgenuchten @kalxas @mhogeweg
Sorry to bring this up again after missing this discussion.
I agree with @m-mohr in that "," doesn't feel right as a separator (vs. spaces).
Doing some quick tests against Google, Yahoo, and Bing, their q parameter supports (at least):
1. multiple terms combined (AND) by default
2. terms can be OR'd
3. + and - for included and excluded terms

Some search engine implementations also support the above behaviour (for example, Elasticsearch).
There are obviously more complex semantics, but perhaps items 1-3 should be considered for our support of q to capture core mass market semantics into something more "familiar" for a user?
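As a rough idea of how small such a syntax could be, here is a Python sketch that handles only the AND-by-default behaviour and the + and - prefixes; OR is deliberately left out. The function name parse_simple_query is hypothetical, and nothing here is taken from the spec or from any search engine's actual parser.

```python
def parse_simple_query(q):
    """Split a decoded q value into (required, excluded) term lists.

    Assumptions of this sketch (not from the spec):
      - terms are separated by whitespace and are required by default (AND)
      - a leading "+" marks a term as required explicitly
      - a leading "-" marks a term as excluded
    """
    required, excluded = [], []
    for token in q.split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        elif token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        else:
            required.append(token)
    return required, excluded
```

For example, "climate +model -ocean" would require "climate" and "model" and exclude "ocean".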
In STAC we have two conformance classes now:
@cportele makes a good point. Back-end implementation gets mixed in with the API.
The search engines don’t just split terms by space. Search for San Diego, etc.
See
Stemming, multi-lingual aspects, content promotion, etc. all affect results.
You could define a simple minimal syntax to be supported and allow anything else. In our Geoportal we allow submitting full Elastic/Opensearch queries and ideally that can be supported via OGC Records and STAC as well.
to me, q= is a special case, unlike {fieldname}=value or cql
to me, q= represents a free-text-search type of field, which allows to enter a text string to find close (fuzzy) matches
I like the suggestion of @mhogeweg to adopt a minimal syntax to define FTS queries, somewhere in between the advanced operators of google/microsoft, and the fts queries of elastic or postgres
The suggestions by the STAC team at https://github.com/cedadev/stac-freetext-search#http-get seem a good starting point, although I think I would suggest interpreting /search?q=climate model as climate AND model, to limit results when you add terms.
Perhaps it's better to allow conformance metadata to specify the behaviour and leave q as truly free text.
2022-10-05: this was further discussed during an editing meeting. We decided to leave q= as currently specified, for reasons of simplicity. @pvretano will add additional informative text as part of the OAB submission target.
I appreciate that as we already based our STAC extension on it. :-)
All, in order to try and balance the desire for a simple text search capability and also satisfy those that want something more, I have created PR #314 that adds some text around the q parameter and also slightly enhances its capabilities.
The original specification of the q parameter indicated that search terms are comma separated, implying a logical OR. That is, if any of the specified search terms appears in one or more of the text fields in a record then that record can be included in the result set.
The slight modification that I made is to say that search terms can contain white spaces and this means that all the space-separated search terms must appear in one or more of the text fields in a record before that record can be included in the result set. So, consider q=ocean,climate%20%09change,desalination. In this example, a record can be included in the result set if one or more of the text fields of that record contain the terms "ocean" OR ("climate" AND "change") OR "desalination".
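The comma-as-OR, whitespace-as-AND reading described above can be sketched in Python as follows. The function names parse_q_groups and record_matches are hypothetical, and the case-insensitive substring test is a simplifying assumption of this sketch, not something PR #314 prescribes.

```python
from urllib.parse import unquote


def parse_q_groups(raw_value):
    """Turn a raw (percent-encoded) q value into OR-groups of AND-ed words."""
    groups = []
    for part in unquote(raw_value).split(","):
        # Any run of white space (space, tab, ...) separates AND-ed words.
        words = part.split()
        if words:
            groups.append(words)
    return groups


def record_matches(text_fields, groups):
    """True if, for at least one group, every word occurs in some text field."""
    text = " ".join(text_fields).lower()
    return any(all(word.lower() in text for word in words) for words in groups)
```

Under these assumptions, q=ocean,climate%20%09change,desalination parses into three groups, and a record whose text reads "Change in regional climate" matches through the second group because both words occur, in any order.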
Please review PR #314 and let me know if this is satisfactory (keeping in mind that we are looking for a simple-to-implement capability) OR if I should fall back to the original, simpler, specification of the q parameter.
On a personal note, I really don't want to add too much syntax to the q parameter because it does not make sense to me to add yet another query language to the mix when we already have CQL for handling advanced cases. The idea with q is to provide something simple that covers a large set of use cases. When I think about the Google search operator, for example, almost no one that I know is aware of the additional syntax that Google supports.
16-OCT-2023 SWG Meeting: The consensus in the SWG is that allowing white space in a search term is good but the interpretation should not be "'term' AND 'term'" but rather "'term' followed by 'term'". So climate%20%09change would find the term "climate change" as a combined term in the record rather than what happens now, which would also match "change climate". The SWG believes that these changes represent 80% of how something like a Google search box is used. @pvretano will make the necessary changes to the PR and merge.
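The combined-term reading from this meeting note can be sketched in Python as below. The names parse_q_phrases and record_matches_phrase are hypothetical; collapsing runs of white space to a single space and using a case-insensitive substring test are assumptions of this sketch, not agreed SWG wording.

```python
from urllib.parse import unquote


def parse_q_phrases(raw_value):
    """Turn a raw (percent-encoded) q value into a list of OR-ed phrases."""
    phrases = []
    for part in unquote(raw_value).split(","):
        # Collapse any run of white space so "climate \tchange" becomes one phrase.
        words = part.split()
        if words:
            phrases.append(" ".join(words))
    return phrases


def record_matches_phrase(text_fields, phrases):
    """True if any phrase occurs, with its words in order, in any single text field."""
    for field in text_fields:
        normalized = " ".join(field.lower().split())
        if any(phrase.lower() in normalized for phrase in phrases):
            return True
    return False
```

With this reading, a field containing "Effects of climate change" matches climate%20%09change, while a field containing only "change climate" does not.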
Porting over the discussion from https://github.com/opengeospatial/ogcapi-records/pull/273#discussion_r1218620027
@m-mohr wrote:
@pvretano wrote:
@m-mohr wrote: