oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

Opengrok treats C++ STL header file as plain text #3507

Open TooYoungTooSimp opened 3 years ago

TooYoungTooSimp commented 3 years ago

I'm using OpenGrok to index some STL implementations for learning purposes, but I find that OpenGrok treats all STL headers (algorithm, set, map, etc.) as plain text. Is there a way to make OpenGrok treat text files with no suffix as C++ header files? Thanks for your help.

vladak commented 3 years ago

Depends on the suffix of these files: https://github.com/oracle/opengrok/blob/6364b315f29d027624086909d5c04965f0abf281/opengrok-indexer/src/main/java/org/opengrok/indexer/analysis/c/CxxAnalyzerFactory.java#L32-L42

vladak commented 3 years ago

If the file in question has one of these suffixes, please attach a reproducible test case.

TooYoungTooSimp commented 3 years ago

Unfortunately, C++ STL headers have no suffix. See https://github.com/microsoft/STL/tree/main/stl/inc and https://github.com/llvm/llvm-project/tree/main/libcxx/include. As these show, STL headers (such as algorithm, vector, set, map, queue) all lack a suffix. That's why I'm asking how to index them.

vladak commented 3 years ago

I don't think there is currently a way to make this work. Either the -A specification needs to be extended to paths, or the matching needs to be much more flexible.

TooYoungTooSimp commented 3 years ago

I found that the 'file' utility can successfully identify an STL header as C source (screenshot: IMG_20210402_092348.jpg). Would it be possible to integrate it into OpenGrok, or to use libmagic directly instead?

vladak commented 3 years ago

Running strace on the file utility reveals that it reads the file whole (in fact it reads 1 MiB of the file, at least on my system). We definitely want to avoid that: the classification of a file should be fast. Maybe the AnalyzerGuru could perform some quick analysis based on a predefined chunk of the file; however, that sounds like an overhaul.
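
To make the "predefined chunk" idea concrete, here is a minimal sketch. It is not OpenGrok code: the class name PrefixSniffer, the 8 KiB chunk size and the token list are all assumptions chosen for illustration, and a real heuristic would need tuning against actual headers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of OpenGrok. */
final class PrefixSniffer {
    /** Assumed chunk size, far below the 1 MiB that file was observed to read. */
    static final int CHUNK_SIZE = 8 * 1024;

    // Line-anchored tokens that commonly appear near the top of C/C++ headers.
    // The list is an assumption, not an exhaustive grammar.
    private static final Pattern CXX_HINT = Pattern.compile(
            "(?m)^\\s*(#\\s*(include|pragma|ifndef|define)\\b|namespace\\s+\\w+|template\\s*<)");

    private PrefixSniffer() {
    }

    /** Checks only the first len bytes of an already-loaded buffer. */
    static boolean looksLikeCxx(byte[] buf, int len) {
        // ISO-8859-1 maps every byte to a char, so decoding never fails.
        String head = new String(buf, 0, len, StandardCharsets.ISO_8859_1);
        return CXX_HINT.matcher(head).find();
    }

    /** Reads at most CHUNK_SIZE bytes from the file, then applies the check above. */
    static boolean looksLikeCxx(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = in.readNBytes(CHUNK_SIZE); // requires Java 11 or later
            return looksLikeCxx(buf, buf.length);
        }
    }
}
```

The cost per file is bounded by CHUNK_SIZE regardless of file size, which is the property asked for above. Note that a match only suggests C-family content; the sketch does not distinguish C from C++.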

TooYoungTooSimp commented 3 years ago

Maybe we can use the existing file type analyzer first, and only when a file has no extension use file to determine its type (so file is used purely as an addition).

Is that possible?
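
A rough sketch of that two-stage order, under stated assumptions: TwoStageClassifier, the type labels and the suffix table are hypothetical names for illustration, not OpenGrok's AnalyzerGuru API. The content check is passed in as a function, so it could wrap file, libmagic or any other detector.

```java
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical two-stage classifier for illustration; not OpenGrok code. */
final class TwoStageClassifier {
    private static final String PLAIN = "plain";

    private final Map<String, String> suffixToType;
    private final Function<Path, Optional<String>> contentFallback;

    TwoStageClassifier(Map<String, String> suffixToType,
                       Function<Path, Optional<String>> contentFallback) {
        this.suffixToType = suffixToType;
        this.contentFallback = contentFallback;
    }

    String classify(Path file) {
        Path fileName = file.getFileName();
        if (fileName == null) {
            return PLAIN; // e.g. a filesystem root has no name component
        }
        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        if (dot > 0 && dot < name.length() - 1) {
            // Stage 1: the name has a suffix, so only the suffix table decides.
            String suffix = name.substring(dot + 1).toUpperCase(Locale.ROOT);
            return suffixToType.getOrDefault(suffix, PLAIN);
        }
        // Stage 2: no suffix (or a leading-dot name), so ask the content detector.
        return contentFallback.apply(file).orElse(PLAIN);
    }
}
```

Because the fallback runs only for names without a suffix, files that are already matched by suffix pay no extra cost.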

TooYoungTooSimp commented 3 years ago

Also, when using libmagic directly, there's no need to read the full file; providing a buffer is enough.

https://linux.die.net/man/3/libmagic