oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

how to have multiple Perforce repositories as projects #2798

Closed cross closed 5 years ago

cross commented 5 years ago

This is a question. I have a source tree, /disk/src, and under that are top-level directories for multiple (currently 12) projects from multiple source repositories. I am running the OpenGrok indexer with -s /disk/src -d /disk/data -H -P -S -G. The console output and the log file include:

11:08:10 INFO: Scanning for repositories...
11:08:11 INFO: Done scanning for repositories, found 1 repositories (took 825 ms)
11:08:11 INFO: Generating history cache for all repositories ...
11:08:11 INFO: Creating historycache for 1 repositories
11:08:11 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
11:41:42 INFO: Done historycache for /data/opengrok/src (took 0:33:31)
11:41:42 INFO: Done historycache for all repositories (took 0:33:31)
11:41:42 INFO: Done...
11:41:42 INFO: Starting indexing
11:41:43 INFO: Waiting for the executors to finish
11:41:43 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
11:41:43 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
11:41:43 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
11:41:43 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling

It seems to be creating projects properly when I view the results in the U/I later, but are the log messages just misleading, or am I doing something wrong?

(side note, only some of the projects are Perforce, some are git, it seems to always say /disk/src (PerforceRepository) which is in a way the most concerning part of the incorrect log messages...)

vladak commented 5 years ago

Those identical paths do not look right to me. Also, the indexer found just one repository. If you run the indexer with -W, what is inside the XML configuration in terms of projects/repositories ?

vladak commented 5 years ago

As for repository detection, for some repository types it is quite simple, e.g. for Git it is sufficient to have .git directory: https://github.com/oracle/opengrok/blob/f23ad94568a09e115e8d48c641573dcf2ef7408a/opengrok-indexer/src/main/java/org/opengrok/indexer/history/GitRepository.java#L528-L534

That said, it must be within the search depth which is 3 by default, see the comment: https://github.com/oracle/opengrok/blob/f23ad94568a09e115e8d48c641573dcf2ef7408a/opengrok-indexer/src/main/java/org/opengrok/indexer/configuration/Configuration.java#L230-L236

which makes me wonder what will be the output of find /disk/src/ -type d -name '.git' -maxdepth 3
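
For anyone without find at hand (e.g. on Windows), the same check can be sketched in Java. This only mirrors the find command above; it is not OpenGrok's own repository scanning code, and the class and method names are made up for illustration. Whether OpenGrok counts depth exactly the way find does is not something this sketch establishes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of OpenGrok.
public class GitDirScan {
    /** Lists real directories named ".git" at most maxDepth levels below root. */
    static List<Path> findGitDirs(Path root, int maxDepth) throws IOException {
        try (Stream<Path> entries = Files.walk(root, maxDepth)) {
            return entries
                    // NOFOLLOW_LINKS: count only real directories, like find -type d
                    .filter(p -> Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS))
                    .filter(p -> p.getFileName() != null
                            && p.getFileName().toString().equals(".git"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no source root is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        findGitDirs(root, 3).forEach(System.out::println);
    }
}
```

Running it with /disk/src as the argument should list the same directories as the find command.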

vladak commented 5 years ago

The Perforce repository detection is more complicated - it executes a bunch of p4 commands. Since not even Git repositories are detected I'd focus there first before investigating Perforce.

cross commented 5 years ago

find finds that directory right where I'd expect it: /disk/src/projectA/.git. But there are other subdirs of /disk/src that are Perforce repositories. And all of the "Creating historycache for /disk/src (PerforceRepository)" lines only say /disk/src, not /disk/src/projectA or /disk/src/projectD. The git-based projectA is only noted in the following lines:

15:35:10 INFO: Starting traversal of directory /projectA
15:35:11 INFO: Starting indexing of directory /projectA
[...]
15:38:46 INFO: Done indexing of directory /projectA (took 0:03:35)
15:38:47 INFO: Optimizing the index for project projectA
15:38:49 INFO: Done optimizing index for project projectA (took 2.24 seconds)

Answering your earlier question about what it puts into the configuration.xml: it has all of the projects as you'd expect, but only one repository at /. Is that right, or do I need to do something else to let it know there are multiple repositories?

cross commented 5 years ago

Hmm. And, a related note. I had done a full fresh index build of a subset of the projects. After that, I can see the history of files within the Perforce projects, but I get an "Error: File not found" trying to get history for files in the git project. http://localhost:8080/source/history/projectA/README.md for example yields the main search screen with an "Error: File not found" error at the bottom, but http://localhost:8080/source/history/projectD/README.md shows that file's [Perforce] history.

cross commented 5 years ago

(I don't know much about how projects and repositories work inside of opengrok. If it would help to move the projects into new base directories based on repository or VCS type, I could do that. But, I don't want that directory to end up being in the project name.)

vladak commented 5 years ago

The single repository at /. likely causes the trouble. What type is it ? Is there anything else below source root rather than directories ? (dot files, symlinks etc.)

vladak commented 5 years ago

Basically, projects are directories just below source root. Each project may have multiple repositories; each repository belongs to exactly one project. A project without repositories will still be indexed; there will just be no history data. A detected repository thus means history is available for (part of) the project it belongs to. Say /data/src is your source root; then /data/src/foo is a project if it is a directory. If there is a Git checkout underneath (i.e. /data/src/foo/.git exists and git log, when run inside /data/src/foo, returns meaningful output), then project foo has a repository. If there is another checkout at a deeper level (say /data/src/foo/bar/.git exists and bar is a checkout of a different Git repository), then project foo has 2 repositories.

You can try starting from scratch with empty source root, add one project (I'd recommend the Git one), reindex, see if everything looks okay, add another project etc. until you detect which project causes the trouble.
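
To make the project/repository relationship concrete, here is a minimal sketch of the mapping described above: a repository belongs to the project named after the first directory below source root. It is an illustration only, not OpenGrok's implementation; the class and method names are hypothetical. Note how a repository located at the source root itself maps to no project at all.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical illustration, not OpenGrok code.
public class ProjectLayout {
    /**
     * Returns the name of the first directory below sourceRoot that contains path,
     * or empty if path is the source root itself or lies outside of it.
     */
    static Optional<String> projectFor(Path sourceRoot, Path path) {
        Path root = sourceRoot.normalize();
        Path p = path.normalize();
        if (!p.startsWith(root) || p.equals(root)) {
            return Optional.empty();
        }
        return Optional.of(root.relativize(p).getName(0).toString());
    }

    public static void main(String[] args) {
        Path root = Paths.get("/data/src");
        // Both repositories belong to project "foo".
        System.out.println(projectFor(root, Paths.get("/data/src/foo")));
        System.out.println(projectFor(root, Paths.get("/data/src/foo/bar")));
        // A repository claiming the source root has no project.
        System.out.println(projectFor(root, root));
    }
}
```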

vladak commented 5 years ago

Thinking of this case, maybe the Indexer should disallow repositories that have directory path equal to source root path. That is, if projects are enabled.
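
A rough sketch of what such a check could look like, assuming the indexer has the source root and the list of detected repository directories at hand. Nothing here is existing OpenGrok code; the class, method and parameter names are invented for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the proposed check, not existing OpenGrok code.
public class SourceRootRepoCheck {
    /** Drops repositories whose directory equals the source root when projects are enabled. */
    static List<Path> acceptedRepositories(Path sourceRoot, List<Path> repositories,
                                           boolean projectsEnabled) {
        Path root = sourceRoot.normalize();
        return repositories.stream()
                .filter(r -> !projectsEnabled || !r.normalize().equals(root))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Path root = Paths.get("/disk/src");
        List<Path> detected = Arrays.asList(root, Paths.get("/disk/src/projectA"));
        // With projects enabled, only /disk/src/projectA is kept.
        System.out.println(acceptedRepositories(root, detected, true));
    }
}
```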

cross commented 5 years ago

Okay. In my case, each project has a repository, but no project has more than one repository. And, as it happens, many of the projects share a repository, at least at the basest level. Most of the projects in my org are within the same larger perforce repository. Technically, I guess they are within one of two perforce repositories, but from the client point of view they're all the same perforce client, so.

So my situation is this: each project is a directory under the source root, but each of those can be in a different repository. It seems, however, that the configuration has been set up with one repository covering all projects, which is incorrect.

cross commented 5 years ago

Let me know if you'd like me to break it all out and start again with one project directory at a time. I can start from scratch again and add one from perforce, then one from git, then so-on, to see what happens.

vladak commented 5 years ago

I am not sure what you mean by sharing repositories. If /data/src/foo and /data/src/bar are the result of:

cd /data/src
git clone http://foo.com/foo foo
git clone http://foo.com/foo bar

or

cd /data/src
git clone http://foo.com/foo foo
git clone foo bar

then everything is fine. Even if bar is a symlink to foo this should work.

I think you need to find out which project has a repository that claims the source root path. This should not really happen.

cross commented 5 years ago

I'm not sure how to check that. In my case, I have one git at that level as you describe, and perforce client directories under that same level.

Hmm, it occurs to me that my perforce client likely is assuming its root is the source root. There are no views into the root, they're all under the root, but the perforce client "Root" is the source root. Maybe that's breaking things? If I make sub-directories under the opengrok source root for perforce repositories, will the "project" names not include that directory level? Hmm... Yuck. :-)

vladak commented 5 years ago

That seems likely. I have not seen such thing with other repository types.

cross commented 5 years ago

Okay. I've changed my perforce client to root all data at /disk/src/perforce. I'm checking out /disk/src/perforce/projectD and /disk/src/perforce/projectL, and leaving the git repository at /disk/src/projectA. I've nuked all of /disk/data/* for opengrok and nuked configuration.xml. I am starting a fresh index. I'll see what the XML configuration looks like after that.

(Note, I'm getting lots of WARNING output from perforce commands, complaining: Path '/disk/src/projectA/src/stream/tcp/*/...' is not under client's root '/disk/src/perforce'.)

But, I note that it now says:

16:27:44 INFO: Done scanning for repositories, found 2 repositories (took 0:01:28)
16:27:44 INFO: Generating history cache for all repositories ...
16:27:44 INFO: Creating historycache for 2 repositories
16:27:44 INFO: Creating historycache for /disk/src/src/projectA (GitRepository) without renamed file handling
16:27:44 INFO: Creating historycache for /disk/src/src/perforce (PerforceRepository) without renamed file handling
16:27:48 INFO: Done historycache for /disk/src/perforce (took 4.319 seconds)
16:28:03 INFO: Done historycache for /disk/src/projectA (took 18.520 seconds)
16:28:03 INFO: Done historycache for all repositories (took 18.556 seconds)
16:28:03 INFO: Done...

So, that's better!

cross commented 5 years ago

Basically, projects are directories just below source root. Each project may have multiple repositories; each repository belongs to exactly one project.

This concerns me some, with my rearranged sources. I want the multiple directories under /disk/src/perforce to be separate projects. But, you say above that projects are directories just below source root. I fear I'm going to have an issue here...

vladak commented 5 years ago

Pardon my Perforce ignorance, but does the Perforce client require all the "checkouts" to be under a common directory?

cross commented 5 years ago

Yes, at least sort-of. You can have multiple "client" configurations. The p4 client (command) is told what "client" (config) it's to use. These "client" configs are stored in the server, and can be modified to affect what the p4/p4v software [client] programs do.

A p4 client config has a name, and a root. That root is the directory under which all of the "views" are placed. The views (in my case, which directories/projects from the server should be retrieved to the local system) control what is or isn't brought down from the server.

There's probably better documentation than what little I know from the one environment I've used it in.

cross commented 5 years ago

So, I think I could make multiple perforce clients for the multiple "projects" I want to download. This is technically possible, but manipulating the environment variables to run p4 in lots of different ways for lots of checked out projects would be a nightmare. And, I assume, will increase the number of Path '/foo/*/...' is not under client's root warnings I'm getting in the repository scan phase...
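
To illustrate the bookkeeping involved, here is a small sketch of running p4 for one project with its own client selected through the P4CLIENT environment variable. The client name and project path are made-up examples, the class is hypothetical, and this is not how OpenGrok itself invokes p4 (as far as this thread shows, it simply inherits the indexer's environment).

```java
import java.io.File;

// Hypothetical helper showing one p4 invocation per project and client spec.
public class P4PerProject {
    /** Builds (but does not start) a p4 command that runs in projectDir with the given client. */
    static ProcessBuilder p4Command(File projectDir, String clientName, String... p4Args) {
        String[] cmd = new String[p4Args.length + 1];
        cmd[0] = "p4";
        System.arraycopy(p4Args, 0, cmd, 1, p4Args.length);
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.directory(projectDir);
        // P4CLIENT selects the client spec for this invocation only.
        pb.environment().put("P4CLIENT", clientName);
        return pb;
    }

    public static void main(String[] args) {
        // "opengrok-projectD" is an invented client name for the example.
        ProcessBuilder pb = p4Command(new File("/disk/src/projectD"), "opengrok-projectD", "info");
        System.out.println(pb.command() + " with P4CLIENT=" + pb.environment().get("P4CLIENT"));
    }
}
```

Every project would need its own variant of this, which is the part that gets unwieldy.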

cross commented 5 years ago

https://www.perforce.com/perforce/doc.081/manuals/p4web/help/client.html
https://www.perforce.com/perforce/doc.081/manuals/p4web/help/editclient.html
https://www.perforce.com/manuals/cmdref/Content/CmdRef/p4_client.html

vladak commented 5 years ago

There has to be a better way. Maybe searching for old (closed even) OpenGrok issues might reveal something.

vladak commented 5 years ago

There are pieces of advice like this one: https://github.com/oracle/opengrok/issues/926#issuecomment-492853775

cross commented 5 years ago

Thanks. I looked through #926 but it doesn't dive down into the exact problem I think I'm now seeing. Moving the Perforce project directories into a perforce-specific subdirectory of the source root seems to make things happier, but not "right". I can now look at history and annotations from the git projectA, but all of the Perforce "projects" are now subsumed into the single perforce project from OpenGrok's point of view. History and annotations work there too, but that layout won't work for me.

Web U/I top level screenshot - http://imgbox.com/AqJVAmyJ

Is there any other way you can think of, given the things we've analyzed above, that I could make this work the way I wanted? Any way to define projects at a level further under the source root?

vladak commented 5 years ago

The idea of projects is deeply embedded into OpenGrok. Maybe some of the guys who made Perforce work can comment ? @gtoph @nerakhon ?

gtoph commented 5 years ago

Replied on the other thread as well, with that, here's how my projects look where each project is a different client spec. Mentioned it in other threads as well, but I have a main opengrok project directory and all I do is make a link (windows mklink /D) to the actual projects which can be located where ever on my drive.

Blacked out the project names just so I don't get in trouble. I don't know why it's listing the actual path below the project either, but that is probably something I did in my customizations. Haven't had a chance to look into it yet though and it wasn't a deal killer for the moment.

[Screenshot: p4repos]

cross commented 5 years ago

Interesting. Thank you. Though, noting you have symbolic links for perforce, and the real sources elsewhere, it makes me think that might be the better solution for me. But, that would also allow me to use just one client spec for perforce, where I had started. Are your projects actually in different perforce servers, or are you using a different client spec per project for some other reason?

(I don't have paths like that, but I'd guess that's because they're links. Just a guess though.)

gtoph commented 5 years ago

They are all on the same server, but each project is a separate branch off the mainline and hence each has its own client spec. We have some wrapper utilities around P4 to properly get a whole project set up (it's an ordeal :)) but more or less I sync whatever branches to something like D:\builds\project1 and then make a link to whatever projects for opengrok:

cd d:\opengrok\projectlinks
mklink /D project1 d:\builds\project1

With you having one client spec for everything, I would think the linking thing could work and 'trick' opengrok into showing multiple projects.
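
A portable way to script the linking step above is sketched here, using java.nio instead of mklink or ln -s. The directory layout is just the example from this comment and the class name is hypothetical. On Windows, creating symbolic links may need elevated privileges.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: one symbolic link per project inside the OpenGrok source root.
public class ProjectLinks {
    /** Creates linkDir/projectName pointing at target, unless such a link already exists. */
    static Path linkProject(Path linkDir, String projectName, Path target) throws IOException {
        Path link = linkDir.resolve(projectName);
        if (Files.isSymbolicLink(link)) {
            return link; // leave an existing link untouched
        }
        return Files.createSymbolicLink(link, target.toAbsolutePath());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: ProjectLinks <linkDir> <projectName> <target>");
            return;
        }
        System.out.println(linkProject(Paths.get(args[0]), args[1], Paths.get(args[2])));
    }
}
```

For instance, the arguments d:\opengrok\projectlinks project1 d:\builds\project1 would correspond to the mklink call above.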

cross commented 5 years ago

Yup. We just list all of the relevant projects and branches from our server via many Views lines in the client spec. Okay, I'll give that a shot. It would be easier to have one client spec, because then I can just do it with env variables, and not need to manage the .p4config files.

Thanks! More here after I give that a go, which will take hours or days to get things added and indexed. :-)

gtoph commented 5 years ago

Ya I have projects that take over an hour to compile, so there's a lot of files and indexing can take hours as well. I have a task scheduled every night to do it when I'm sleeping :)

cross commented 5 years ago

Okay. I tried the one client spec into a perforce client root outside of the opengrok root, then symlinks into there from opengrok root. Unfortunately, that produced lots of the following when scanning for repositories at the start of indexing:

16:29:37 WARNING: Non-zero exit status 1 from command [p4, dirs, *] in directory /disk/opengroksources/projectD/src/stuff/libk32: Path '/disk/opengroksources/projectD/src/stuff/libk32/*/...' is not under client's root '/disk/perforce-sources'.

So, the root that is in the perforce client spec is still in effect, and it doesn't match the effective path opengrok is running in. I saw elsewhere that you mentioned using toRealPath() or the like, so maybe without that this trick won't work for perforce. Do you have that in your tree, @gtoph , that isn't in the main sources?
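
A small sketch of the mismatch, under the assumption that the warning boils down to a prefix comparison between the working directory and the client's root (Perforce's actual logic is not shown anywhere in this thread). The class and method names are made up. Resolving the symlinked path with toRealPath() first would make the comparison succeed.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of why a symlinked path fails the "under client's root" test.
public class ClientRootCheck {
    /** Plain prefix comparison; symbolic links are NOT resolved here. */
    static boolean isUnderClientRoot(Path clientRoot, Path dir) {
        return dir.normalize().startsWith(clientRoot.normalize());
    }

    /** Resolves symbolic links if possible, otherwise returns the path unchanged. */
    static Path realOrSame(Path p) {
        try {
            return p.toRealPath();
        } catch (IOException e) {
            return p;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: ClientRootCheck <clientRoot> <dir>");
            return;
        }
        Path clientRoot = Paths.get(args[0]);
        Path dir = Paths.get(args[1]);
        System.out.println("as given: " + isUnderClientRoot(clientRoot, dir));
        System.out.println("resolved: " + isUnderClientRoot(realOrSame(clientRoot), realOrSame(dir)));
    }
}
```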

gtoph commented 5 years ago

I'm not signed up to be a contributor so I don't have any official branch or anything but this is what I did:

public static History getRevisions(File file, String rev) throws IOException {
    // Resolve symbolic links so that p4 runs in the real directory under the
    // client's root; fall back to the original path if resolution fails.
    Path realPath = file.toPath();
    try {
        realPath = realPath.toRealPath();
    } catch (IOException e) {
        // Keep the unresolved path.
    }
    file = realPath.toFile();

    ArrayList<String> cmd = new ArrayList<String>();
    cmd.add("p4");
    cmd.add("filelog");
    cmd.add("-slti");
    cmd.add(protectPerforceFilename(file.getName()) + PerforceRepository.getRevisionCmd(rev));
    Executor executor = new Executor(cmd, file.getCanonicalFile().getParentFile());
    executor.exec();
    return parseFileLog(file, executor.getOutputReader());
}

cross commented 5 years ago

Alright. Well, I'm going to close this for now. I have found that setting up with lots of different perforce client specs, I can get each repository checked out into a directory at the opengrok source root, and it works for me. I'd like to have a single or smaller number of perforce client specs, so I can perform fewer operations across larger sets of sources, but it seems not to work without effort like the code changes above, and I don't want to push for that now.