oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

smarter detection of RCS repository #1458

Open TotoXe opened 7 years ago

TotoXe commented 7 years ago

Hi,

The History and Annotate links are no longer available on our OpenGrok service :-(

OpenGrok: 0.12.1.6, Git 1.9.1, Java 1.8_121, Tomcat 7.0.75

When indexing, config used is:

    export IGNORE_PATTERNS="-i .bin -i .gz -i .rpt -i .tar -i .zip -i .class -i .bz2 -i .7z -i .tgz -i .exe -i .jar -i .war"
    export JAVA_OPTS="$JAVA_OPTS -Xmx12g -d64 -server"
    export OPENGROK_FLUSH_RAM_BUFFER_SIZE="-m 256"
    export OPENGROK_INSTANCE_BASE=$OPENGROK_VAR
    export OPENGROK_SCAN_DEPTH=2
    export OPENGROK_VERBOSE=true
    export OPENGROK_WEBAPP_CONTEXT=grok
    export OPENGROK_VAR=/srv/opengrok-var

(which is linked from /var/opengrok)

Sources are in /srv/opengrok-src/git. A link src -> /srv/opengrok-src/git exists in $OPENGROK_VAR.

In configuration.xml, I can see this part:

    <void property="repositories">
      <void method="add">
        <object class="org.opensolaris.opengrok.history.RepositoryInfo">
          <void property="directoryName">
            <string>/srv/opengrok-src/git</string>
          </void>
          <void property="type">
            <string>RCS</string>
          </void>
        </object>
      </void>
    </void>

Why is this value RCS when all my repos are Git repos?

It is linked to what I can see in the log:

    Loading the default instance configuration ...
    02:54:21 FINE: Installing default uncaught exception handler
    Logging filehandler pattern: /srv/opengrok-var/log/opengrok%g.%u.log
    02:54:22 INFO: Scanning for repositories...
    02:54:22 INFO: done invalidating repositories (took 0)
    02:54:22 INFO: Done scanning for repositories (0s)
    02:54:22 INFO: Writing configuration to /srv/opengrok-var/etc/configuration.xml
    02:54:22 INFO: Done...
    02:54:22 INFO: Generating history cache for all repositories ...
    02:54:22 INFO: Creating historycache for 1 repositories
    02:54:22 INFO: Creating historycache for /srv/opengrok-src/git (RCSRepository)
    02:54:22 INFO: Skipping creation of history cache for /srv/opengrok-src/git, since retrieval of history for directories is not implemented for this repository type.
    02:54:22 INFO: Done historycache for /srv/opengrok-src/git (took 1 ms)
    02:54:22 INFO: Done historycache for all repositories (took 23 ms)
    02:54:22 INFO: Done...
    02:54:22 INFO: Starting indexing ...

The setup has not changed, and I did not have this problem before. I cannot understand why the history cache is no longer updated and why the links are not available.

Could you help me figure out what is happening?

vladak commented 7 years ago

When did this start happening? What exactly has changed?

Could you try the latest 0.13 RC ?

vladak commented 7 years ago

Also, do the directories with git repos have any subdirectories named RCS ?

TotoXe commented 7 years ago

Is it possible to run 2 OpenGrok versions in the same Tomcat? I think I read that it is not. Please confirm. If not, I can prepare a new ROOT_DIR and run the 13rc10 indexer on the same server against the same sources. But how can I prevent configuration.xml from being sent to Tomcat via port 2424 at the end of the indexing job?

I will check tomorrow if RCS is one of the projects.

vladak commented 7 years ago

The indexer will not send the config to the webapp if OPENGROK_WEBAPP_CFGADDR is set to none in the OpenGrok script.

It might be sufficient just to copy the contents of the repo to another machine and let it run the initial indexing phase where it detects types of repos.

TotoXe commented 7 years ago

Ok. Cool. Indeed, there is a project named RCS, with 6 repos below it.

vladak commented 7 years ago

Could you post the structure of the directory ?

I wonder if there is a way to make the RCS repository detection more reliable. Normally there are files with the ,v suffix inside.
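To make the idea concrete, here is a minimal sketch of what a stricter check could look like. It assumes the current detection treats any subdirectory named RCS as the marker, which the symptoms suggest but which is not shown in this thread. The class and method names (RcsDetectionSketch, naiveIsRcs, smarterIsRcs) are hypothetical and are not OpenGrok code.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class RcsDetectionSketch {

    /** Assumed naive check: a subdirectory named "RCS" is enough. */
    static boolean naiveIsRcs(File dir) {
        return new File(dir, "RCS").isDirectory();
    }

    /** Stricter check: the RCS subdirectory must hold at least one ",v" file. */
    static boolean smarterIsRcs(File dir) {
        File rcsDir = new File(dir, "RCS");
        if (!rcsDir.isDirectory()) {
            return false;
        }
        // listFiles() returns null on I/O error, so guard against that.
        File[] historyFiles = rcsDir.listFiles((d, name) -> name.endsWith(",v"));
        return historyFiles != null && historyFiles.length > 0;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: a project directory named RCS that only
        // contains Git repositories and no ",v" files.
        Path srcRoot = Files.createTempDirectory("srcroot");
        Files.createDirectories(srcRoot.resolve("RCS/repo1/.git"));

        System.out.println("naive:   " + naiveIsRcs(srcRoot.toFile()));
        System.out.println("smarter: " + smarterIsRcs(srcRoot.toFile()));
    }
}
```

With this layout the naive check claims the source root as an RCS repository, while the stricter check does not. The stricter check only looks at the top level of the RCS directory, which may or may not be enough for real RCS trees.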

TotoXe commented 7 years ago

I cannot post this information. My email is available if you want to contact me. I have moved the RCS project outside SRC_ROOT and started the indexer with the variable OPENGROK_WEBAPP_CFGADDR =. Now all repository history caches are updated... I will also try 13rc10 as soon as possible, leaving RCS in SRC_ROOT. You caught the root cause!

vladak commented 7 years ago

I tried reproducing the problem as follows:

cd /var/opengrok/src
mkdir gittest
mkdir RCS
date > foo.txt
git init
git add foo.txt
git commit -m foo foo.txt

followed by a reindex, and it detects the repository as git.

Also, the repositories array in https://github.com/OpenGrok/OpenGrok/blob/master/src/org/opensolaris/opengrok/history/RepositoryFactory.java has GitRepository before RCSRepository, so if there is a .git subdirectory it will be detected as git because the array is traversed sequentially:

    public static Repository getRepository(File file) throws InstantiationException, IllegalAccessException {
        RuntimeEnvironment env = RuntimeEnvironment.getInstance();
        Repository res = null;
        for (Repository rep : repositories) {
            if (rep.isRepositoryFor(file)) {

This is valid also for 0.12.x.
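The ordering argument can be illustrated with a small self-contained sketch. This is not the RepositoryFactory code: the detector map, the detect() helper and the marker checks (a .git subdirectory for Git, a subdirectory named RCS for RCS) are assumptions made for illustration only.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

class FirstMatchSketch {
    // Hypothetical detectors kept in insertion order: Git is consulted before RCS.
    static final Map<String, Predicate<File>> DETECTORS = new LinkedHashMap<>();
    static {
        DETECTORS.put("git", dir -> new File(dir, ".git").isDirectory());
        // Assumed marker for RCS: a subdirectory literally named "RCS".
        DETECTORS.put("RCS", dir -> new File(dir, "RCS").isDirectory());
    }

    /** Returns the name of the first detector that claims the directory, or null. */
    static String detect(File dir) {
        for (Map.Entry<String, Predicate<File>> entry : DETECTORS.entrySet()) {
            if (entry.getValue().test(dir)) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Directory with both markers: the earlier detector wins.
        Path both = Files.createTempDirectory("both");
        Files.createDirectory(both.resolve(".git"));
        Files.createDirectory(both.resolve("RCS"));
        System.out.println(detect(both.toFile()));

        // Directory with only an RCS subdirectory: nothing shadows the RCS detector.
        Path rcsOnly = Files.createTempDirectory("rcsonly");
        Files.createDirectory(rcsOnly.resolve("RCS"));
        System.out.println(detect(rcsOnly.toFile()));
    }
}
```

Under these assumptions, a directory is reported as RCS only when no earlier detector claims it, which is why the presence of a .git subdirectory matters here.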

So, is there a .git subdirectory under /srv/opengrok-src/git ?

TotoXe commented 7 years ago

No .git under /srv/opengrok-src/git:

    ls -ltra .git
    ls: cannot access .git: No such file or directory

No file ending with ,v: find RCS -name '*,v' reports nothing.

Is there something I can do to trace what OpenGrok is doing ?

vladak commented 7 years ago

I am confused - how could the directory /srv/opengrok-src/git be a git repository when it's missing the .git subdirectory?

TotoXe commented 7 years ago

The path /srv/opengrok-src/git is the root for all our projects. One of these projects is RCS. Under RCS, there are 6 Git repositories:

    /srv/opengrok-src/git/RCS/repo1/.git
    /srv/opengrok-src/git/RCS/repo2/.git

I moved RCS outside SRC_ROOT=/srv/opengrok-src/git, then reindexed (I got many lines about history cache updates for Git repositories), then restarted the OpenGrok service. It is working now: I can access the history and annotate the files.

Then I moved the RCS project back into SRC_ROOT, installed 13rc10 and Git 2.11, and started indexing into a new DATA_ROOT. There were many errors while creating the history cache... I will redo this later to get a clean status for rc10.

So, for 0.12.1.6, the issue is coming from this RCS project.

vladak commented 7 years ago

Okay, I see now. Making RCS detection smarter will solve the problem.

TotoXe commented 7 years ago

Thank you for your quick feedback!

ChristopheBordieu commented 7 years ago

Hello,

I still have an issue with 1.0, after reindexing from scratch. At startup, the history cache is created correctly for all Git repos except RCS/repo1, with no message. Then, while indexing, I only get this message:

    159949 13:56:52 FINER: ignoring /srv/staging-opengrok-var/src.staging/RCS

Could you reopen the issue ?

vladak commented 7 years ago

I see, this is being ignored because of the special directory listing in IgnoredDirs. I will reopen this; however, I can't help thinking that using RCS as a directory name is quite unfortunate.

ChristopheBordieu commented 7 years ago

The RCS directory name comes from our tool for managing Git repos. Projects contain repos. A project name consists of several words, which are used to automatically create a project key (the first letter of each word). In my case, RCS resulted from this automatic process. I am sure that one day I will have project keys GIT or SVN...

I think OpenGrok should allow these directory names. Alternatively, OpenGrok could provide a way to specify (i.e. limit) the SCM types used in SRC_ROOT so that it does not try to check whether a directory is RCS or Git or hg or svn...

vladak commented 7 years ago

Well, git/svn at least use dotted dirs.

To address this, ignoredDirs would need to check whether the dir is a repo metadata dir.

vladak commented 7 years ago

Possibly Repository objects could register their isRepositoryFor() methods with ignoredDirs.
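A rough sketch of that registration idea, under stated assumptions: the IgnoredDirsSketch class, its register() and ignore() methods and the holdsRcsHistory() check are all hypothetical and do not reflect the actual IgnoredDirs API. A directory is skipped only when its name is registered and the registered check confirms it looks like repository metadata.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

class IgnoredDirsSketch {
    // Directory name -> check that confirms the directory really is SCM metadata.
    private final Map<String, Predicate<File>> checksByName = new HashMap<>();

    void register(String dirName, Predicate<File> looksLikeMetadata) {
        checksByName.put(dirName, looksLikeMetadata);
    }

    /** Ignore a directory only if its name is registered and the check agrees. */
    boolean ignore(File dir) {
        Predicate<File> check = checksByName.get(dir.getName());
        return check != null && check.test(dir);
    }

    /** Hypothetical RCS check: the directory holds at least one ",v" file. */
    static boolean holdsRcsHistory(File dir) {
        File[] historyFiles = dir.listFiles((d, name) -> name.endsWith(",v"));
        return historyFiles != null && historyFiles.length > 0;
    }

    public static void main(String[] args) throws IOException {
        IgnoredDirsSketch ignored = new IgnoredDirsSketch();
        ignored.register("RCS", IgnoredDirsSketch::holdsRcsHistory);

        Path root = Files.createTempDirectory("srcroot");

        // A project directory that merely happens to be named RCS.
        Path project = Files.createDirectories(root.resolve("RCS"));
        Files.createDirectories(project.resolve("repo1/.git"));
        System.out.println(ignored.ignore(project.toFile()));

        // A real RCS metadata directory containing a ",v" history file.
        Path metadata = Files.createDirectories(root.resolve("legacy/RCS"));
        Files.createFile(metadata.resolve("foo.txt,v"));
        System.out.println(ignored.ignore(metadata.toFile()));
    }
}
```

In this sketch the project directory named RCS is kept and the directory with ,v files is ignored. Whether a per-name check like this fits the existing isRepositoryFor() contract is an open question.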

idodeclare commented 4 years ago

If PR #2969 is merged, you could --disableRepository rcs (or add "RCSRepository" to <void property="disabledRepositories"> in the read-only configuration).