Open mahadzaryab1 opened 2 weeks ago
@yurishkuro we've got 3 call sites for InitArchiveStorage that currently do the runtime cast. I had a couple of questions on how to move forward here and wanted to get your thoughts.
I don't think you need to change (1), but we need to change InitArchiveStorage() not to cast but to use the storage directly
(2) yes
(3) I think you don't need to change it because if the caller needs an archive storage it should instantiate a different remote storage. As I understand it there's not a dedicated gRPC API for archive storage, which could go away the same way as ArchiveFactory.
@yurishkuro Got it. For (2), this is how the primary storage factory is initialized. How would we go about initializing the archive storage factory here to expose the CLI flags for storages that have archive flags (cassandra/es)?
@yurishkuro For v1, what do you think of passing the isArchive flag into https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/factory.go#L116? This way, we can create a new archive storage using NewFactory, which we can pass down to es.NewFactory() and any other storage configs that need it.
Attention: Patch coverage is 98.75000% with 1 line in your changes missing coverage. Please review.
Project coverage is 96.44%. Comparing base (0a24f6d) to head (4194aab). Report is 5 commits behind head on main.
Files with missing lines | Patch % | Lines |
---|---|---|
plugin/storage/es/factory.go | 97.56% | 1 Missing :warning: |
@yurishkuro For ES Archive, it looks like the archive flag is used in two places:
- To choose the suffix for the index name
- In getSourceFn, to add sorting and a search after clause if we're not querying the archive index (https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/es/spanstore/reader.go#L211-L214)

For Cassandra, it's a bit more straightforward:
Do you have any thoughts on how we should proceed? Do we still want to expose an isArchive or setAsArchive flag?
> The metrics namespacing for Cassandra can be easily done elsewhere, does not need to be based on isArchive. It's only needed there now because primary/archive distinction is made internally in the factory.

With the setup of v2, how would we make that distinction?
> For ES:
> - "To choose the suffix for the index name" - not needed since the user can do that themselves. One significant change in v2 is that we cannot provide different defaults in the config for primary/archive, and having the same index prefix by mistake will be bad for the user. Maybe we can introduce additional validation for configs of the same type and catch that as a configuration error.

@yurishkuro Okay I see. But if the configurations are being held in different factories, how would we perform validation there?

> - "to add sorting and a search after clause if we're not querying the archive index" - I don't understand the purpose of that difference. Any ideas? Would it hurt if the logic for archive was the same as for primary?

I was thinking the same as well. I'm guessing it's an optimization to avoid sorting the archive storage, which would be larger than a non-archive storage. Here is the documentation for Search After. Would we have a performance degradation here if we enabled this for archive as well?
> - There is a 3rd, most important usage of isArchive - in the GetIndicesFn. That's the one where I wonder if we could replace isArchive with different logic based on the lookback parameter.

Ah yes. I was only looking at the reader. Are you referring to getSpanAndServiceIndexFn? This looks to once again be creating a different suffix based on whether the storage is archive or not (https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/es/spanstore/writer.go#L104-L111). Can we not use the same approach here as the first point for the ES reader?
> With the setup of v2, how would we make that distinction?

The storage extension manages factories and knows storage names; it can use those names to bind the MetricsFactory to a specific label.
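To make the labeling idea concrete, here is a rough sketch of how a component that knows the storage names could hand each backend a metrics factory carrying that name as a label. The `metricsFactory` type, `withLabel`, `scopedFactories`, and the label key `name` are all hypothetical stand-ins for illustration; none of them is Jaeger's actual metrics API.

```go
package main

// metricsFactory is a hypothetical, minimal stand-in for a metrics factory
// that attaches a fixed set of labels to everything it creates.
type metricsFactory struct {
	labels map[string]string
}

// withLabel returns a copy of the factory with one extra label.
// The receiver's label map is copied, so the base factory is not modified.
func (f metricsFactory) withLabel(key, value string) metricsFactory {
	out := make(map[string]string, len(f.labels)+1)
	for k, v := range f.labels {
		out[k] = v
	}
	out[key] = value
	return metricsFactory{labels: out}
}

// scopedFactories gives each named storage backend its own metrics factory,
// distinguished only by the backend name. No isArchive flag is involved.
func scopedFactories(base metricsFactory, storageNames []string) map[string]metricsFactory {
	scoped := make(map[string]metricsFactory, len(storageNames))
	for _, name := range storageNames {
		scoped[name] = base.withLabel("name", name) // "name" is an assumed label key
	}
	return scoped
}
```

With this shape, the factory for a backend called `some_archive` would emit metrics labeled `name=some_archive`, so primary and archive metrics stay separate without the storage implementation knowing which role it plays.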
> Okay I see. But if the configurations are being held in different factories - how would we perform validation there?

No, configurations are passed to factories but held in a single place in the storage extension, which can invoke additional validation here: https://github.com/jaegertracing/jaeger/blob/a420fd9174d0062a165072d1487e41ffe7e349f3/cmd/jaeger/internal/extension/jaegerstorage/extension.go#L118
> Would we have a performance degradation here if we enabled this for archive as well?

But the thing is, we never search for traces in archive storage; we only retrieve a trace by ID, so sorting in this case would only apply to the spans within a trace. Yes, it could have overhead for a very large trace, but it would still be small.
> Are you referring to getSpanAndServiceIndexFn? This looks to once again be creating a different suffix

Not just the suffix: when it's primary storage with manually rotated indices, they also have the date pattern in the name, but the archive index never has that (because it doesn't grow large). One compromise we could make is to recommend that users don't use archive storage with manually rotated indices, only with ILM. Unless there's another way that I am not seeing.
Btw, reader also has similar branching in index naming logic: https://github.com/jaegertracing/jaeger/blob/a420fd9174d0062a165072d1487e41ffe7e349f3/plugin/storage/es/spanstore/reader.go#L174
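For readers following along, the naming difference described above can be sketched roughly as follows. This is a simplified illustration, not the code in reader.go or writer.go: the function name, its parameters, the `jaeger-span` base name, the `-archive` suffix, and the no-date behavior under ILM are assumptions made for the example.

```go
package main

import "time"

// exampleDay is a fixed date used only to illustrate the output format.
var exampleDay = time.Date(2024, time.January, 2, 15, 4, 5, 0, time.UTC)

// indexName builds a span index name under these assumed rules:
//   - archive storage: fixed "-archive" suffix, never a date
//   - primary storage with manually rotated indices: date suffix
//   - primary storage with ILM/rollover: stable name without a date
func indexName(prefix string, isArchive, manualRotation bool, day time.Time) string {
	base := prefix + "jaeger-span"
	switch {
	case isArchive:
		return base + "-archive"
	case manualRotation:
		// Go's reference-time layout for YYYY-MM-DD.
		return base + "-" + day.UTC().Format("2006-01-02")
	default:
		return base
	}
}
```

Under these assumptions, `indexName("", false, true, exampleDay)` yields `jaeger-span-2024-01-02`, while the archive call ignores the date entirely. That branch on `isArchive` is the part that is hard to express once the factory no longer knows whether it is primary or archive.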
> > With the setup of v2, how would we make that distinction?
>
> the storage extension manages factories and knows storage names, it can use those names to bound MetricsFactory to have a specific label.

So this is where the Cassandra storage is initialized in the extension. Are you suggesting we can pass the label into the constructor here? If so, how would we make the distinction based on just the name?
> > Okay I see. But if the configurations are being held in different factories - how would we perform validation there?
>
> No, configuration are passed to factories, but held in a single place in storage extension, which can invoke additional validation here:

Oh okay, I see. So whenever we're processing an ES config, we would go through all the other ones that exist and make sure that the index prefixes are not the same?
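A minimal sketch of the kind of cross-config check being discussed, assuming the extension can see every named Elasticsearch backend config at once. The `esConfig` type, its `IndexPrefix` field, and `validateUniqueIndexPrefixes` are hypothetical names for this example and are not taken from the jaegerstorage extension.

```go
package main

import (
	"fmt"
	"sort"
)

// esConfig is a hypothetical, trimmed-down stand-in for one Elasticsearch
// backend configuration; only the field relevant to the check is modeled.
type esConfig struct {
	IndexPrefix string
}

// validateUniqueIndexPrefixes returns an error if two named backends share
// the same index prefix, which would make them read and write the same indices.
func validateUniqueIndexPrefixes(backends map[string]esConfig) error {
	names := make([]string, 0, len(backends))
	for name := range backends {
		names = append(names, name)
	}
	sort.Strings(names) // iterate in a stable order so the error is deterministic

	seen := make(map[string]string, len(backends)) // prefix -> first backend using it
	for _, name := range names {
		prefix := backends[name].IndexPrefix
		if other, dup := seen[prefix]; dup {
			return fmt.Errorf("storage backends %q and %q use the same index prefix %q",
				other, name, prefix)
		}
		seen[prefix] = name
	}
	return nil
}
```

Tracking prefixes in a map keeps this a single pass over the configs rather than comparing every pair, and two backends that both leave the prefix empty are reported as a clash as well.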
> > Would we have a performance degradation here if we enabled this for archive as well?
>
> but the thing is, we never search for traces in archive storage, we only retrieve trace by ID, so sorting in this case would only apply to the spans within a trace - yes, could have overhead for very large trace, but still small.

Sounds good. I can remove the indirection here then.
> > Are you referring to getSpanAndServiceIndexFn? This looks to once again be creating a different suffix
>
> not just suffix, when it's primary storage with manually rotated indices they also have the date pattern in the name, but archive index never has that (because it doesn't grow large). One compromise we could do is recommend that users don't use archive storage with manually rotated indices, only with ILM. Unless there's another way that I am not seeing.
> Btw, reader also has similar branching in index naming logic:

How would making that recommendation simplify the archive branching for us?
@yurishkuro Regarding getSourceFn, the es archive integration tests seem to fail when the SearchAfter clause is added, as it is unable to find the trace.
Directionally this LGTM. Any blockers?
> Directionally this LGTM. Any blockers?

@yurishkuro The main blocker is that this implementation currently doesn't initialize the CLI flags for archive storage. We initialize the storage factory in https://github.com/jaegertracing/jaeger/blob/main/cmd/all-in-one/main.go#L58, which in this PR will only initialize the primary storages because the storage factories only hold one configuration. Any thoughts on how we should initialize the archive CLI flags?
v1 mains need to create two factories, primary and archive, and use both when registering CLI flags. It's worth having a helper function for that in storage/, because you would call it from all-in-one, query, and remote-storage. The helper may also need to be smart about which storages support archive mode today and which do not, and return a nil/noop archive factory for the latter.
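A rough sketch of what such a helper could look like, under stated assumptions. The `Factory` interface here is a stripped-down stand-in that models only a flag namespace, and `newPrimaryAndArchive`, `flagNamespaces`, `noopFactory`, the storage type strings, and the `-archive` namespace suffix are hypothetical names chosen for the example, not the existing plugin/storage API.

```go
package main

// Factory is a hypothetical, stripped-down stand-in for a v1 storage factory.
// Only the CLI flag namespace it would register under is modeled.
type Factory interface {
	// FlagNamespace returns the flag prefix; empty means no flags are registered.
	FlagNamespace() string
}

type storageFactory struct{ namespace string }

func (f storageFactory) FlagNamespace() string { return f.namespace }

// noopFactory is used as the archive factory for storages without archive mode.
type noopFactory struct{}

func (noopFactory) FlagNamespace() string { return "" }

// archiveCapable lists storage types assumed to support archive mode
// (cassandra and es, per the discussion above).
var archiveCapable = map[string]bool{"cassandra": true, "es": true}

// newPrimaryAndArchive creates both factories for a storage type. Storages
// without archive support get a no-op archive factory instead of nil, so
// callers can treat both results uniformly.
func newPrimaryAndArchive(storageType string) (primary, archive Factory) {
	primary = storageFactory{namespace: storageType}
	if !archiveCapable[storageType] {
		return primary, noopFactory{}
	}
	return primary, storageFactory{namespace: storageType + "-archive"}
}

// flagNamespaces collects the namespaces that would be registered as CLI flags.
func flagNamespaces(factories ...Factory) []string {
	var out []string
	for _, f := range factories {
		if ns := f.FlagNamespace(); ns != "" {
			out = append(out, ns)
		}
	}
	return out
}
```

Each main (all-in-one, query, remote-storage) would call the helper once and pass both results to flag registration. Returning a no-op rather than nil is one option among the two mentioned above; it avoids nil checks at every call site.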