Closed mmm444 closed 6 years ago
I think the issue can be solved by changing == to <= on lines https://github.com/google/zoekt/blob/master/indexbuilder.go#L94 and https://github.com/google/zoekt/blob/master/indexbuilder.go#L118 . But I am not sure if this is the right way and place to deal with it.
The document sections are generated by Zoekt too, as CTags doesn't actually output byte ranges, IIRC. Can you check how this invalid input actually got generated?
I'm a bit torn on whether this should generate an error rather than silently swallowing/modifying the given section.
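To make the "generate an error" option concrete, here is a minimal sketch in Go. It is not Zoekt's actual code: the section type and the checkSections and onRuneBoundary helpers are hypothetical names made up for illustration. The idea is only to reject a byte range whose start or end lands inside a multi-byte UTF-8 sequence, instead of adjusting it silently.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// section is a hypothetical stand-in for a byte range [Start, End)
// inside a document; it is not Zoekt's own type.
type section struct {
	Start, End uint32
}

// onRuneBoundary reports whether byte offset off is the first byte of a
// UTF-8 sequence in content, or is exactly the end of content.
func onRuneBoundary(content []byte, off uint32) bool {
	if int(off) == len(content) {
		return true
	}
	return int(off) < len(content) && utf8.RuneStart(content[off])
}

// checkSections returns an error for the first section that is out of
// range or whose start or end falls inside a multi-byte rune.
func checkSections(content []byte, secs []section) error {
	for i, s := range secs {
		if s.Start > s.End || int(s.End) > len(content) {
			return fmt.Errorf("section %d [%d,%d): out of range", i, s.Start, s.End)
		}
		if !onRuneBoundary(content, s.Start) || !onRuneBoundary(content, s.End) {
			return fmt.Errorf("section %d [%d,%d): splits a UTF-8 rune", i, s.Start, s.End)
		}
	}
	return nil
}

func main() {
	// "aá:{0}\n" is 61 c3 a1 3a 7b 30 7d 0a, so "aá" covers bytes [0,3).
	content := []byte("aá:{0}\n")
	fmt.Println(checkSections(content, []section{{0, 3}})) // ends after the full "á"
	fmt.Println(checkSections(content, []section{{0, 2}})) // ends between c3 and a1
}
```

Whether the caller should then drop the section, widen it to the next rune boundary, or fail the whole document is the open question; the sketch only shows the detection step.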
Uh oh, sorry about the confusion. I was expecting something that is not true.
The way to trick ctags into splitting the rune is simple:
# cat test.cs
aá:{0}
# hexdump -C test.cs
00000000 61 c3 a1 3a 7b 30 7d 0a |a..:{0}.|
00000008
# ctags --version
Universal Ctags 0.0.0(7918d19f), Copyright (C) 2015 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
Compiled: Feb 12 2018, 11:44:40
URL: https://ctags.io/
Optional compiled features: +wildcards, +regex, +multibyte, +option-directory, +xpath
# ctags -f- test.cs
aà test.cs /^aá:{0}$/;" p
# ctags -f- test.cs | hexdump -C
00000000 61 c3 09 74 65 73 74 2e 63 73 09 2f 5e 61 c3 a1 |a..test.cs./^a..|
00000010 3a 7b 30 7d 24 2f 3b 22 09 70 0a |:{0}$/;".p.|
0000001b
In the hexdump you can see that there are only 2 bytes before the tab 0x09. So the reasons why this happens are basically a confusing file extension and some non-ASCII UTF-8 inside that resembles a programming language.
The more I think about it, the less sure I am whether this should be fixed on the Zoekt side. But where, then? Is this a bug in ctags' encoding handling?
Which version of ctags do you use? I think universal-ctags --json should handle this correctly, no?
oh sorry, Ctags 0.0.0(7918d19f) - I missed the blurb.
I'll have a look.
I was using the HEAD version of universal-ctags, but I didn't rename the binary to universal-ctags. Instead I changed the name in the zoekt source code, and that turned out to be a big mistake.
When I run universal-ctags with JSON output, it fails in a strange way:
# universal-ctags --output-format=json -f- test.cs
(null)
But this output is actually handled in zoekt. So in the end everything works.
Do you think it is worth reporting the (null) output issue to universal-ctags?
Feel free to close this issue. Thank you for your time!
yes, please do.
There is still something to do here, though, but I'll take care of it.
Sometimes ctags (both Exuberant and Universal) returns symbol boundaries that cross rune boundaries. This leads to a panic during indexing.
I have reduced the issue that happened during indexing my codebase to this test:
which panics:
This happened on the latest HEAD, which was at 3f9ebd3f2087991625ed8666bd805df6787a341b at the time of writing this issue report.