alexashley closed this 3 years ago
Another way to handle schema changes with Elasticsearch is to use index aliases and versioned indexes:

1. Start off by creating an index and an alias that points to it. The application uses the alias name and not the actual index name.
   Indices: `my-index001`; alias: `my-index` -> `my-index001`
2. Create a new index with the new schema, move documents to the new index, and update the alias.
   Indices: `my-index001`, `my-index002`; alias: `my-index` -> `my-index002`

This way you can update the schema by only reindexing once, and you don't need to change the application config.
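For reference, the alias swap in the second step can be done atomically with Elasticsearch's `_aliases` API. A sketch of a `POST /_aliases` request body, using the hypothetical index names from the example above:

```json
{
  "actions": [
    { "remove": { "index": "my-index001", "alias": "my-index" } },
    { "add": { "index": "my-index002", "alias": "my-index" } }
  ]
}
```

Because both actions run in a single atomic operation, the application never sees a moment where `my-index` points at no index.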
Also, we have an `apiVersion` for indices, so if we need to make changes, we can switch to `v1beta2` instead of `v1beta1`. I'm not sure what the migration process would look like (or if the onus is on `grafeas-elasticsearch` to do that migration), but that's at least in place to help make an index change like this slightly easier.
This PR adds support for listing occurrences with a filter against the nested `builtArtifacts` field in a build occurrence.

## Elasticsearch mapping
Because of how Elasticsearch flattens documents, we updated the mapping on the occurrence index to treat `builtArtifacts` as a `nested` type. This allows for accurate filtering; otherwise, unrelated results could be returned.

However, there are some operational issues with this change. Currently, we only configure the Elasticsearch index mappings when creating a project, so an existing instance of `grafeas-elasticsearch` would not have its mapping updated. We tried to update the mapping against an existing index, but that isn't allowed:

Unfortunately, it doesn't seem possible to do this in-place, and perhaps not without some changes to `grafeas-elasticsearch`. I think a migration would go something like this:

Potentially, we could allow for this to happen with a single reindex by adding a flag or environment variable that configures a suffix for the index name. Then an operator could create the index, reindex, and redeploy `grafeas-elasticsearch`.
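To illustrate why the `nested` mapping matters, here is a small sketch (an assumption for illustration, not code from this PR) of how Elasticsearch's default object mapping flattens an array of objects into parallel per-field value lists, losing the association between fields of the same object, so a filter can match across two different artifacts:

```python
def flatten(doc):
    """Mimic Elasticsearch's default object-field flattening:
    one list of values per leaf field, association lost."""
    flat = {}
    for artifact in doc["builtArtifacts"]:
        for field, value in artifact.items():
            flat.setdefault("builtArtifacts." + field, []).append(value)
    return flat

# Hypothetical build occurrence with two built artifacts.
doc = {
    "builtArtifacts": [
        {"id": "gcr.io/foo", "checksum": "abc"},
        {"id": "gcr.io/bar", "checksum": "def"},
    ]
}

flat = flatten(doc)
# Query: an artifact with id == "gcr.io/foo" AND checksum == "def".
# No single artifact satisfies both, but the flattened document does:
match = ("gcr.io/foo" in flat["builtArtifacts.id"]
         and "def" in flat["builtArtifacts.checksum"])
print(match)  # True: a false positive that the `nested` type avoids
```

With a `nested` mapping, Elasticsearch indexes each array element as its own hidden document, so both conditions must hold on the same artifact.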
## Filtering updates
Originally, we wanted to represent the operation of filtering against a nested array of objects by using the index operator (`_[_]_`) or a wildcard (like `foo.*.name == "bar"`). While working on that, we realized that the existing filters are based on using the field names as a string literal (e.g., `"resource.uri" == "foo"`); this doesn't match well with trying to use an operator against a field, as the operator is only parsed if the filter places the brackets outside of the string literal: `"myArrayType"[0]`.

To allow the index operator to work more naturally, we updated the parsing steps in the `filtering` package, borrowing a fair bit from `cel-go`'s unparser. This allowed us to handle the `select` expressions that make up a field like `resource.uri` and make use of the index operator in a way that matches a user's expectations a little better. However, in the final direction we went, we no longer needed these changes; that said, the syntax should match expectations now, and it shouldn't impact continued use of the string literals.

Unfortunately, CEL doesn't have the wildcard syntax we thought it did, and the index operator is otherwise not a good fit for representing a nested query; in an expression like `foo['*'].id == "abc"`, the current way we do the parsing wouldn't give us enough information to construct the query; we'd only have access to the literal `'*'` string.

So we experimented with using the `filter` macro (e.g., `foo.filter(t, t.name == "abc")`), but the expansion of the macro would force us to handle comprehensions and the conditional operator.

Finally, we settled on adding a new function, `nestedFilter`, to CEL that we could intercept a little more easily. We're not happy with the name, but we didn't want to clash with the `filter` macro. That said, as part of the parsing changes, we did turn off macros, since we don't support pieces of the expanded representation. To enable easier parsing, we didn't follow the same convention as the macro functions, which take an argument that represents the object in the array; that argument turned out to be difficult to remove when we constructed the query.
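For context, a sketch of the kind of Elasticsearch query a nested filter expression translates to (the path and values here are hypothetical illustrations, not taken from this PR); the `nested` query scopes all inner conditions to a single array element:

```json
{
  "query": {
    "nested": {
      "path": "builtArtifacts",
      "query": {
        "bool": {
          "must": [
            { "term": { "builtArtifacts.id": "gcr.io/foo" } },
            { "term": { "builtArtifacts.checksum": "abc" } }
          ]
        }
      }
    }
  }
}
```

This is what makes the mapping change above a prerequisite: a `nested` query can only target fields mapped with the `nested` type.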
tl;dr: Breaking Elasticsearch schema change and large changes to the filtering package.