universal-ctags / ctags

A maintained ctags implementation
https://ctags.io
GNU General Public License v2.0

utf-8 eol support ? #790

Open tarzanek opened 8 years ago

tarzanek commented 8 years ago

This is more of a question: does universal-ctags support UTF-8 EOLs?

We have a clash between jflex and ctags: https://github.com/OpenGrok/OpenGrok/blob/master/testdata/sources/c/bug15890.c#L4

and I was wondering if ctags caught up here eventually

tia for any answer L

b4n commented 8 years ago

AFAIK it doesn't, and only \n and \r are recognized.

Using VT (0B) and FF (0C) as line separators is questionable IMO. The others require knowing the encoding of the input file, which is terribly tricky (yes, UTF-8 is easy-ish to recognize, but it's basically the only one).
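To make the difference concrete, here is a minimal Python sketch that counts lines in two ways. Both terminator sets are assumptions for illustration: the "basic" set is only \n, \r and \r\n, and the "Unicode" set adds VT, FF, NEL (U+0085), LS (U+2028) and PS (U+2029). The `count_lines` helper is hypothetical and is not taken from ctags or jflex.

```python
import re

# Assumption: "basic" counting treats only \n, \r and \r\n as line ends.
BASIC_EOL = re.compile(r"\r\n|\n|\r")
# Assumption: "Unicode" counting also treats VT, FF, NEL, LS and PS as line ends.
UNICODE_EOL = re.compile("\r\n|[\n\r\x0b\x0c\x85\u2028\u2029]")


def count_lines(text, eol):
    """Count lines in text; a trailing unterminated line counts as one line."""
    pieces = eol.split(text)
    # split() leaves an empty last piece when the text ends with a terminator.
    return len(pieces) - 1 if pieces[-1] == "" else len(pieces)


# A U+2028 LINE SEPARATOR sits between the first two declarations.
sample = "int a;\u2028int b;\nint c;\n"
basic = count_lines(sample, BASIC_EOL)
unicode_aware = count_lines(sample, UNICODE_EOL)

# The encoding point: LS is only visible as such after decoding.
ls_bytes = "\u2028".encode("utf-8")
```

On this sample the two counts differ by one, because only the second pattern splits at U+2028. In the raw file that character is the three-byte UTF-8 sequence held in `ls_bytes`, so a tool scanning bytes has to know the encoding before it can spot it.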

Also, how relevant is it to support it? Especially in C: neither GCC nor Clang supports them. They treat them at best like whitespace (Clang, which emits a warning by default), and at worst as "stray" bytes (GCC, which errors out). Same for VT and FF: neither of those compilers treats them as line separators. For other languages, I guess whether support should be considered depends on whether the language itself defines them as supported, or whether the most prominent tools for it recognize them as such.

In the end, in practice IMO it doesn't make much sense to support them. Not only is it hard and would require special handling depending on the language, but those line endings are really not common in source code. Support for them in some languages is, as I see it, due to input files being defined as Unicode (so supporting those characters with their Unicode meaning is natural) rather than because those line endings are useful in real life.

tarzanek commented 8 years ago

I am concerned about Java (and possibly Java EE) and Android source code files in particular. C# and other newer languages might also benefit from this support, at least for UTF-8 if possible. jflex supports it, so ctags could too, no? ;)

masatake commented 8 years ago

Is this a serious issue in OpenGrok? Is there any free software whose source code uses UTF-8 EOLs?

tarzanek commented 8 years ago

not really, it's just that we have different code bases for using ctags line numbers, since because of the EOL handling they don't necessarily match the jflex line numbers ... so I'd hope to clean up this messy workaround we have, since it's 2016 :)
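For illustration, this kind of line-number reconciliation can be sketched in Python as below. This is not OpenGrok's actual workaround. The helper `basic_to_unicode_line` and both terminator sets are assumptions: "basic" means only \n, \r and \r\n, and "Unicode" adds VT, FF, NEL, LS and PS.

```python
import re

# Assumption: the tag generator counts only \n, \r and \r\n as line ends.
BASIC_EOL = re.compile(r"\r\n|\n|\r")
# Assumption: the lexer also counts VT, FF, NEL, LS and PS as line ends.
UNICODE_EOL = re.compile("\r\n|[\n\r\x0b\x0c\x85\u2028\u2029]")


def basic_to_unicode_line(text, basic_line):
    """Map a 1-based line number counted with BASIC_EOL to the 1-based
    line number that UNICODE_EOL counting assigns to the same position."""
    offset = 0  # character offset where the requested basic line starts
    seen = 1    # basic line number that starts at `offset`
    for match in BASIC_EOL.finditer(text):
        if seen == basic_line:
            break
        offset = match.end()
        seen += 1
    if seen != basic_line:
        raise ValueError("line number out of range")
    # Every terminator that ends at or before the offset starts a new line.
    return 1 + sum(1 for m in UNICODE_EOL.finditer(text) if m.end() <= offset)


sample = "a\u2028b\nc\n"
mapped = basic_to_unicode_line(sample, 2)
```

Here `c` is on line 2 under basic counting but on line 3 once U+2028 is also counted, so `mapped` is 3. The sketch rescans the text on every call; a real tool would more likely build the offset table once per file.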

tarzanek commented 8 years ago

regarding free code: the Android source code might be an example, but I'm not 100% sure