Closed Heraclitus closed 4 years ago
@Heraclitus nice detective work on this one! I'm not quite sure about the fix, as I'd have to dig deeper into the code to better understand it, but perhaps @burmanm still remembers about this and would be interested in taking a look?
I can take a look. It shouldn't go to the merge if there's a single index request (there's no results to merge).
@Heraclitus One thing that I didn't ask is that are you sure you didn't filter by serviceName? Because:
3. Inspecting the arrays showed that one of the arrays contained my trace.
If your trace did not match, it should have been in the both arrays. If it's only in the one array, that means one search query never matched your trace. Sadly I can't tell from the screenshots since those parts are cut.
@burmanm on your question about serviceName. The "Find Traces" button is grey'd out unless the service drop down is populated and selecting a service is required.
@burmanm here is more of the screenshots, three examples, first without tag search, I am required to select a service to show anything and the second with tag search populated. Third, with service name removed and tags, but "Find Traces" is grey'd out so I can't search. This is still running on my modified version for 1.14.0.
@Heraclitus The service name is the only required field for the search. Based on the screenshots above, I don't see an evidence of a problem: your second screenshot shows that the filter did work as expected, from what I understand.
Isn't this test pretty close to what's trying to be achieved here? https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/badger/spanstore/read_write_test.go#L669
If you can replicate the issue by modifying that test, then that would be very helpful. Obviously that wouldn't catch if there's a mistake from some refactoring between UI <-> badger-storage-plugin, but at least it would help narrow the search scope.
@jpkrohling those screenshots were taken from my altered version so the search works only because of the changes I made.
@burmanm I'll give that a try today
@burmanm my day job interfered yesterday but what I'm first seeing is that my tag level 1.14.0 of that file doesn't have that test, func TestRandomTraceID(t *testing.T) {
Since I'm on 1.14.0 I see https://github.com/jaegertracing/jaeger/blob/v1.14.0/plugin/storage/badger/spanstore/read_write_test.go
However, I'm going to try porting that test from master -> v1.14.0.
Now this could be a fools errand if the api changed. I'll see if I can manage. Maybe this is something got fixed in a later version past 1.14.0. Will do some investigation.
And, my concern materialized quickly... for file plugin/storage/badger/spanstore/writer.go
master func (w *SpanWriter) WriteSpan(ctx context.Context, span *model.Span) error {
v1.14.0 func (w *SpanWriter) WriteSpan(span *model.Span) error {
https://github.com/jaegertracing/jaeger/blob/v1.14.0/plugin/storage/badger/spanstore/writer.go#L71
... hrm ... I guess what I'll try next is moving to a new version of jeager and see if I can reproduce the error. May take some time.
Turned out; not so hard after all. Stash, then sync to release branch release-1.17
and get sub modules in alignment.
the test in question does in fact exist on that branch and I was able to add the following code
Tags: model.KeyValues{
model.KeyValue{
Key: "http.request_id",
VStr: "second",
VType: model.StringType,
},
// made up tags for this bug
model.KeyValue{Key: "guid:x-request-id", VStr: genId },
model.KeyValue{Key: "xrequestid", VStr: genId },
model.KeyValue{Key: "request", VStr: string(runes[1:7])},
},
then to the assertion portion of the test
// Try guid search
params = &spanstore.TraceQueryParameters{
StartTimeMin: time.Now().Add(-2 * time.Minute),
StartTimeMax: time.Now(),
ServiceName: "nginx",
Tags: map[string]string{
"guid:x-request-id": genId,
},
}
traces, err = sr.FindTraces(context.Background(), params)
assert.NoError(t, err)
assert.Equal(t, 1, len(traces))
and the result is that the unit tests passes.
The next test was to see if a local running instance from release branch v.1-17 of all_in_one with the obligatory tags added like I did for 1.14.0
func (w *SpanWriter) WriteSpan(span *model.Span) error {
genId,_ := uuid.GenerateUUID()
runes := []rune(genId)
span.Tags = append(span.Tags,
model.KeyValue{Key: "guid:x-request-id", VStr: genId },
model.KeyValue{Key: "xrequestid", VStr: genId },
model.KeyValue{Key: "request", VStr: string(runes[1:7]) },
)
that ensured that any span write gets the offending tags written.
Then use the local UI to browse to find a generated guid
then a search for that guid
and confirmation that the trace found matches
I'll do some more testing to verify i'm not false-positiving this, but initially it looks like whatever bug existed was fixed after 1.14.0
k, I stashed again, checked out tag v1.15.0, submodule aligned and was able to use the existing badger state files to see previously populated traces both with and without my obligatory tags and I can confirm that v1.15.0 works without any changes to the source.
redacted@redacted-lu:/depot/github/src/github.com/jaegertracing/jaeger$ git status
HEAD detached at v1.15.0
Untracked files:
(use "git add <file>..." to include in what will be committed)
cmd/REDACTED... ... a bunch of files I created to help me think about this problem ...
nothing added to commit but untracked files present (use "git add" to track)
Yep, I'm good. This was probably fixed in 1.15.0 here is my test loop in essence...
git checkout v1.14.0
git submodule update --init --recursive
git status
# stop all_in_one, restart all_in_one
# force browser refresh at root path
# search for guid:x-request-id=86cddff7-2a6a-b17b-5d25-eeb1ff720499
# NO RESULTS FOUND
git checkout v1.15.0
git submodule update --init --recursive
git status
# stop all_in_one, restart all_in_one
# force browser refresh at root path
# search for guid:x-request-id=86cddff7-2a6a-b17b-5d25-eeb1ff720499
# RESULT FOUND
Good to hear, thanks for the investigation.
VERSION: v1.14.0 EXPECTATION: single trace response in UI for tag search
guid:x-request-id=628d36f4-9072-c587-3b00-804c10a63000
ACTUAL:No trace results. Try another query.
SETUP:plugin/storage/badger/spanstore/writer.go
to add tags to any span write that occurred. I added the code below to methodfunc (w *SpanWriter) WriteSpan(span *model.Span) error
insert at line 73 (with one import for uuid) based on commitcommit fb5505005a21f007792dedbcd2ad49484d1d587e (HEAD, tag: v1.14.0)
NOTE in some rare cases this change alone was enough to allow search to work, however it was highly inconsistent
xrequestid
andrequest
were just experimentation, not essential.github.com/jaegertracing/jaeger/cmd/all-in-one
with the following env vars to activate badgercmd/all-in-one/all_in_one_test.go
methodfunc TestAllInOne(t *testing.T)
(as is; without modification) in order to prime the jaeger instance with data.CODE CHANGE THAT RETURNS THE TRACE
mergeJoinIds(left, right [][]byte) [][]byte {
from fileplugin/storage/badger/spanstore/reader.go
was an empty array despite both arraysleft
andright
having content.case 0:
and as a result the
merged
array was empty.plugin/storage/badger/spanstore/reader.go
and methodfunc sortMergeIds(query *spanstore.TraceQueryParameters, ids [][][]byte) []model.TraceID {
to line 388 Which basically just checks to see if the merged result is empty, if so check for either of the input arrays having nested array length of 1 and append that entry to the merged result.
guid:x-request-id=...
value I want and I get back the correct trace.SPECULATION Since I don't understand how those two id arrays are populated nor do I understand why one of them doesn't contain the entry i'm interested in I can't say what the actual fix is, but my guess is that it was expected that the second array would have contained the entry that was in the 1st array and the merge would have collapsed it down and returned a valid entry. Maybe something about the search is failing?
git diff below