oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

Cannot search for a file extension reliably #701

Open cnst opened 10 years ago

cnst commented 10 years ago

It would seem like it's not really possible to search for a file extension.

Remember that dot is a token, as per help.jsp, so, to search for ".0" you have to actually search for ". 0" (which, I'd argue, is quite counterintuitive by itself), but even then you get a lot of false-positive results, for example, if any directories end with ".0" (as in, 1.0.0.0), or any filenames like TestLibrary-1.0.0.0.dll.
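To illustrate where the false positives come from, here is a minimal sketch. It assumes a simplified tokenizer that splits on `/` and emits every `.` as its own token; `tokenize_path` and `has_adjacent` are hypothetical helpers written for this example, not OpenGrok's actual PathTokenizer, and the paths are made up.

```python
import re

def tokenize_path(path):
    """Hypothetical tokenizer: '/' separates tokens, each '.' is its own token."""
    return re.findall(r"\.|[^./]+", path)

def has_adjacent(tokens, first, second):
    """True if `first` is immediately followed by `second` anywhere in tokens."""
    return any(a == first and b == second for a, b in zip(tokens, tokens[1:]))

# Made-up example paths: only the first one really has a ".0" extension.
paths = [
    "man/ls.0",
    "pkg/1.0.0.0/readme.txt",
    "lib/TestLibrary-1.0.0.0.dll",
]

# A phrase-style match on the adjacent tokens "." and "0" hits all three paths,
# because the token pair also occurs inside directory names and version numbers.
hits = [p for p in paths if has_adjacent(tokenize_path(p), ".", "0")]
print(hits)
```

Under these assumptions the token pair carries no information about where in the path it occurs, which is why the version-numbered directory and the DLL show up alongside the real `.0` file.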

If anything like a \.0$ regexp is supported in the newest OpenGrok with the newest Lucene (or does it become /\. 0$/?), I haven't found any proof of that within help.jsp. If it's not supported, it'd be nice to have something to the effect.
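For reference, this is the behaviour I'd expect from an anchored pattern, sketched with Python's `re` module over plain path strings. It only demonstrates the desired matching on made-up paths; it is not OpenGrok or Lucene query syntax, and whether a directory entry such as `pkg/1.0.0.0` would be indexed as a path at all is an assumption.

```python
import re

# Anchored pattern: a literal ".0" at the very end of the path string.
EXT_0 = re.compile(r"\.0$")

# Made-up example paths.
paths = [
    "man/ls.0",
    "pkg/1.0.0.0/readme.txt",
    "lib/TestLibrary-1.0.0.0.dll",
    "pkg/1.0.0.0",
]

matches = [p for p in paths if EXT_0.search(p)]
print(matches)
```

The anchor removes the matches in the middle of a path, although a path that is itself a directory ending in ".0" would still match.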

vladak commented 10 years ago

When I need to search for file extension I use the distance specifier in the Path field, e.g. ". c"~1 (note the space) to search for *.c files. It probably does not work 100% but it does the job for me.

cnst commented 10 years ago

The results I'm getting for ". 0" and ". 0"~1 seem largely identical — only the total number of hits seems to differ for some reason, but the front pages are exactly the same.

tarzanek commented 10 years ago

Yeah, this is kind of bad. It can be somewhat mitigated with the search-by-file-type feature, but otherwise OpenGrok has no notion of what an extension is (I'd be curious about the definition of "extension" myself ;) ), hence marking this as an enhancement. The usual cases can be mitigated with what vladak mentioned, but that's not really generic, so "0" and other extensions that also appear as part of the name are really hard to find or filter on. Have a look at PathTokenizer; I think we parse tokens using \ and . as separators.

vladak commented 10 years ago

The number of results from ". 0"~1 should be lower than from ". 0". If you scroll past the first results in the latter case, you should see inexact matches, i.e. ones where the terms . and 0 are at a distance strictly greater than 1.