oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

Hidden files not getting indexed. #3058

Open mavrickdin opened 4 years ago

mavrickdin commented 4 years ago

The content of a hidden folder is not getting indexed. After re-indexing, the hidden folder appears empty.

Files in my project are symlinks to the actual files in a hidden folder.

Example: src contains:

code1.pm --> .project/code1.pm
code2.pm --> .project/code1.pm
code3.pm --> .project/code1.pm
.project

Because .project is not getting indexed, I am not able to browse or search its files.

The error shown is: Error: File not found!
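To make the reported layout concrete, here is a minimal Java sketch that recreates it in a temporary directory and resolves each link. It uses only `java.nio.file`; the class and method names (`SymlinkLayout`, `createLayout`) and the file contents are hypothetical, and it assumes a platform that permits creating symbolic links. It shows that every visible `.pm` file is only a pointer into `.project`, which is why the handling of the hidden directory matters here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SymlinkLayout {
    static final String[] LINKS = {"code1.pm", "code2.pm", "code3.pm"};

    // Hypothetical recreation of the reported layout: src/.project/code1.pm is a
    // real file and src/code1.pm, code2.pm, code3.pm are relative symlinks to it.
    static Path createLayout() throws IOException {
        // toRealPath() avoids surprises if the temp directory itself sits behind a symlink.
        Path src = Files.createTempDirectory("src").toRealPath();
        Path hidden = Files.createDirectory(src.resolve(".project"));
        Files.writeString(hidden.resolve("code1.pm"), "package Code1;\n1;\n");
        for (String name : LINKS) {
            // Relative target, resolved against the link's own directory (src).
            Files.createSymbolicLink(src.resolve(name), Path.of(".project", "code1.pm"));
        }
        return src;
    }

    public static void main(String[] args) throws IOException {
        Path src = createLayout();
        for (String name : LINKS) {
            // toRealPath() follows the link; each one resolves into the hidden directory.
            Path target = src.resolve(name).toRealPath();
            System.out.println(name + " -> " + src.relativize(target));
        }
    }
}
```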

vladak commented 4 years ago

The symlinked files are tangential. The main problem seems to be that any directory underneath the source root whose name starts with a dot is ignored.

This is because of this check: https://github.com/oracle/opengrok/blob/d7648fcc198f443679eeca5076358753b7641a5a/opengrok-indexer/src/main/java/org/opengrok/indexer/index/Indexer.java#L949

So this seems to be intentional.

vladak commented 4 years ago

As for why: this was introduced along with the support for projects in changeset 9ec7787531611654e8f50932473aa48963eaba55. I guess it should be possible to remove this constraint, since anyone can set ignored files/directories using the usual options.
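The difference between a hard-coded dot check and relying on the usual ignore options can be sketched as two predicates. This is a hypothetical Java illustration, not OpenGrok's code: `DirectoryFilter`, `acceptHardCoded`, and `acceptConfigurable` are made-up names, and glob matching via `java.nio.file.PathMatcher` stands in for whatever pattern syntax the real ignore options use.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;

public class DirectoryFilter {
    // Hypothetical model of the current behaviour: any name starting with "."
    // is skipped, with no way to opt out.
    static boolean acceptHardCoded(String name) {
        return !name.startsWith(".");
    }

    // Hypothetical alternative: skip only names matching user-supplied globs,
    // so hidden directories are kept unless explicitly ignored.
    static boolean acceptConfigurable(String name, List<String> ignoreGlobs) {
        Path p = Path.of(name);
        for (String glob : ignoreGlobs) {
            PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + glob);
            if (m.matches(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> ignore = List.of(".git", ".svn");
        for (String dir : List.of("foo", ".bar", ".project", ".git")) {
            System.out.printf("%-9s hard-coded=%b configurable=%b%n",
                    dir, acceptHardCoded(dir), acceptConfigurable(dir, ignore));
        }
    }
}
```

With the ignore list `.git`, `.svn`, the hard-coded predicate rejects `.bar`, `.project`, and `.git`, while the configurable one rejects only `.git`.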

vladak commented 4 years ago

This also has a UI component: the "hidden" directories are not displayed in the UI (at least not in the /source/xref/ listing).

vladak commented 4 years ago

The UI aspect of this issue is that this code filters out directories that do not correspond to a project: https://github.com/oracle/opengrok/blob/d7648fcc198f443679eeca5076358753b7641a5a/opengrok-indexer/src/main/java/org/opengrok/indexer/web/PageConfig.java#L473-L479
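A rough sketch of how such a filter ends up hiding a dot directory from the listing, assuming (as the comments above suggest) that `.bar` never becomes a project because of the Indexer check. `ListingFilter` and `visibleEntries` are hypothetical names; the actual `PageConfig` logic at the linked lines may differ in detail.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ListingFilter {
    // Hypothetical project-mode listing filter: only top-level entries whose
    // name matches a known project are shown.
    static List<String> visibleEntries(List<String> entries, Set<String> projectNames) {
        return entries.stream()
                .filter(projectNames::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed outcome of project detection: ".bar" was skipped, so only "foo" exists...
        Set<String> projects = Set.of("foo");
        // ...and therefore ".bar" is also dropped from the source root listing.
        System.out.println(visibleEntries(List.of("foo", ".bar"), projects)); // prints [foo]
    }
}
```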

idodeclare commented 4 years ago

@vladak, I'm not sure this is correctly diagnosed.

I'm not able to reproduce the problem; in fact, I find that the example .project directory is indexed both when -P,--projects is specified and when it is omitted. (The example structure is project-less, so I would recommend against running with -P,--projects; in fact, the suggester only works for this structure when -P,--projects is omitted.)

I also find that list.jsp shows the .project directory regardless of the -P,--projects setting.

vladak commented 4 years ago

The !name.startsWith(".") code in Indexer has to have some effect. This is what I did (Ubuntu):

$ curl -s -L -O https://github.com/oracle/opengrok/releases/download/1.3.8/opengrok-1.3.8.tar.gz
$ tar xfz opengrok-1.3.8.tar.gz
$ cd opengrok-1.3.8
$ mkdir -p /tmp/opengrok/{src,data}
$ mkdir /tmp/opengrok/src/{foo,.bar}
$ date > /tmp/opengrok/src/foo/date.txt
$ date > /tmp/opengrok/src/.bar/date.txt
$ java -jar lib/opengrok.jar -P -s /tmp/opengrok/src -d /tmp/opengrok/data -W /tmp/opengrok/config.xml
...
$ grep -c Project /tmp/opengrok/config.xml 
1
$ grep -A 5 Project /tmp/opengrok/config.xml 
    <object class="org.opengrok.indexer.configuration.Project">
     <void property="indexed">
      <boolean>true</boolean>
     </void>
     <void property="name">
      <string>foo</string>

Only the foo project is detected.
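The same experiment can be modelled in Java against an assumed rule for project detection: each top-level directory of the source root becomes a project unless its name starts with a dot. This is an illustrative sketch under that assumption (`ProjectDetectionRepro`, `createSourceRoot`, and `detectProjects` are made-up names), not the Indexer's implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ProjectDetectionRepro {
    // Mirrors the shell steps: src/{foo,.bar}, each containing a date.txt file.
    static Path createSourceRoot() throws IOException {
        Path src = Files.createTempDirectory("opengrok-src");
        for (String dir : List.of("foo", ".bar")) {
            Path d = Files.createDirectory(src.resolve(dir));
            Files.writeString(d.resolve("date.txt"), LocalDateTime.now() + "\n");
        }
        return src;
    }

    // Assumed detection rule (hypothetical): a top-level directory is a
    // project unless its name starts with ".".
    static List<String> detectProjects(Path sourceRoot) throws IOException {
        try (Stream<Path> children = Files.list(sourceRoot)) {
            return children
                    .filter(Files::isDirectory)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> !name.startsWith("."))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(detectProjects(createSourceRoot())); // prints [foo]
    }
}
```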

idodeclare commented 4 years ago

@vladak, @mavrickdin is not reporting a project-detection problem but rather that the "content of a hidden folder is not getting indexed." For the example structure, I do not think the .project directory aligns with OpenGrok's definition of a project; the structure seems project-less, i.e. not meant to be run with -P.

Following this example (rather than your different example of foo and .bar) and running the Indexer without -P, the files are indexed and searchable, as shown below. (If run with -P, the files are still indexed and searchable, but for this structure the Suggester only produces results without -P.)

[Screenshot: Successful search]

[Screenshot: Directory listing of source root]

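
By contrast, project-less indexing can be modelled as a walk over the whole source root. This hypothetical Java sketch (`ProjectlessWalk`, `createSourceRoot`, `filesToIndex`, and the file `main.pm` are made-up; it is not the Indexer's code) applies no name-based filter during the walk, so a file under `.project` is reached just like any other file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ProjectlessWalk {
    // Hypothetical project-less source root: one hidden directory and one visible file.
    static Path createSourceRoot() throws IOException {
        Path src = Files.createTempDirectory("opengrok-src");
        Path hidden = Files.createDirectory(src.resolve(".project"));
        Files.writeString(hidden.resolve("code1.pm"), "package Code1;\n1;\n");
        Files.writeString(src.resolve("main.pm"), "1;\n");
        return src;
    }

    // Collects every regular file under the root, relative to it; no
    // name-based filter is applied, so hidden directories are included.
    static List<String> filesToIndex(Path sourceRoot) throws IOException {
        try (Stream<Path> paths = Files.walk(sourceRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(p -> sourceRoot.relativize(p).toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(filesToIndex(createSourceRoot()));
    }
}
```
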
vladak commented 4 years ago

Fair enough. Still, ignoring directories starting with a dot for project detection seems wrong to me.