Closed 5 years ago
Those identical paths do not look right to me. Also, the indexer found just one repository. If you run the indexer with -W, what is inside the XML configuration in terms of projects/repositories ?
As for repository detection, for some repository types it is quite simple, e.g. for Git it is sufficient to have a .git directory: https://github.com/oracle/opengrok/blob/f23ad94568a09e115e8d48c641573dcf2ef7408a/opengrok-indexer/src/main/java/org/opengrok/indexer/history/GitRepository.java#L528-L534
That said, it must be within the search depth, which is 3 by default; see the comment: https://github.com/oracle/opengrok/blob/f23ad94568a09e115e8d48c641573dcf2ef7408a/opengrok-indexer/src/main/java/org/opengrok/indexer/configuration/Configuration.java#L230-L236
That makes me wonder what the output of find /disk/src/ -type d -name '.git' -maxdepth 3 will be.
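For anyone who prefers to run the same check from Java, here is a minimal sketch that mirrors the find command above. It is not OpenGrok's detection code; the class name GitDirScan and the default path are only illustrative, and whether OpenGrok counts depth exactly the way find does is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative helper, not part of OpenGrok.
final class GitDirScan {
    /** Lists directories named ".git" at most maxDepth levels below root. */
    static List<Path> findGitDirs(Path root, int maxDepth) throws IOException {
        try (Stream<Path> found = Files.find(root, maxDepth,
                (path, attrs) -> attrs.isDirectory()
                        && path.getFileName() != null
                        && path.getFileName().toString().equals(".git"))) {
            return found.sorted().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Source root from the command line, else the path used in this thread.
        Path root = Path.of(args.length > 0 ? args[0] : "/disk/src");
        for (Path gitDir : findGitDirs(root, 3)) {
            // The parent of each .git directory is a candidate Git repository.
            System.out.println(gitDir.getParent());
        }
    }
}
```

Like find without -L, Files.find does not follow symbolic links by default, so a checkout reached only through a symlink would not show up here.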
The Perforce repository detection is more complicated: it executes a bunch of p4 commands. Since not even Git repositories are detected, I'd focus there first before investigating Perforce.
find finds that file right where I'd expect it: /disk/src/projectA/.git
But there are other subdirs of /data/src that are perforce repositories.
But all of the "Creating history cache for /disk/src (PerforceRepository)" lines only say /disk/src, not /disk/src/projectA or /disk/src/projectD. The git-based projectA is only noted in the following lines:
15:35:10 INFO: Starting traversal of directory /projectA
15:35:11 INFO: Starting indexing of directory /projectA
[...]
15:38:46 INFO: Done indexing of directory /projectA (took 0:03:35)
15:38:47 INFO: Optimizing the index for project projectA
15:38:49 INFO: Done optimizing index for project projectA (took 2.24 seconds)
Answering your earlier question about what it puts into configuration.xml: it has all of the projects as you'd expect, but only one repository, at /. Is that right, or do I need to do something else to let it know there are multiple repositories?
Hmm. And a related note. I had done a full fresh index build of a subset of the projects. After that, I can see the history of files within the perforce projects, but I get an "Error: File not found" when trying to get history for files in the git project. http://localhost:8080/source/history/projectA/README.md, for example, yields the main search screen with an "Error: File not found" error at the bottom, but http://localhost:8080/source/history/projectD/README.md shows that file's [perforce] history.
(I don't know much about how projects and repositories work inside of opengrok. If it would help to move the projects into new base directories based on repository or VCS type, I could do that. But, I don't want that directory to end up being in the project name.)
The single repository at / likely causes the trouble. What type is it? Is there anything else below source root other than directories (dot files, symlinks etc.)?
Basically, projects are directories just below source root. Each project may have multiple repositories; each repository belongs to exactly one project. A project without repositories will still be indexed; there will just be no history data. A detected repository thus means available history for (part of) the project it belongs to. Say /data/src is your source root; then /data/src/foo is a project if it is a directory. If there is a Git checkout underneath (i.e. /data/src/foo/.git exists and git log, when run inside /data/src/foo, returns meaningful output) then project foo has a repository. If there is another checkout at a deeper level (say /data/src/foo/bar/.git exists and bar is a checkout of a different Git repository) then project foo has 2 repositories.
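The rule described above (a repository belongs to the project named by the first directory below source root) can be sketched as follows. This is only an illustration of the rule as stated in this thread, not OpenGrok's implementation; the class and method names are made up.

```java
import java.nio.file.Path;
import java.util.Optional;

// Illustrative helper, not part of OpenGrok.
final class ProjectOf {
    /**
     * Returns the name of the project that would own repoDir, i.e. the first
     * path component below sourceRoot. Empty if repoDir is the source root
     * itself or lies outside it.
     */
    static Optional<String> projectFor(Path sourceRoot, Path repoDir) {
        Path root = sourceRoot.toAbsolutePath().normalize();
        Path repo = repoDir.toAbsolutePath().normalize();
        if (!repo.startsWith(root) || repo.equals(root)) {
            return Optional.empty();
        }
        return Optional.of(root.relativize(repo).getName(0).toString());
    }

    public static void main(String[] args) {
        Path root = Path.of("/data/src");
        for (String repo : new String[] {"/data/src/foo", "/data/src/foo/bar", "/data/src"}) {
            System.out.println(repo + " -> "
                    + projectFor(root, Path.of(repo)).orElse("(no project)"));
        }
    }
}
```

With the paths from the example, both /data/src/foo and /data/src/foo/bar map to project foo, while a repository sitting at /data/src itself maps to no project at all, which is the situation the single repository at / appears to be in.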
You can try starting from scratch with empty source root, add one project (I'd recommend the Git one), reindex, see if everything looks okay, add another project etc. until you detect which project causes the trouble.
Thinking of this case, maybe the Indexer should disallow repositories that have directory path equal to source root path. That is, if projects are enabled.
Okay. In my case, each project has a repository, but no project has more than one repository. And, as it happens, many of the projects share a repository, at least at the basest level. Most of the projects in my org are within the same larger perforce repository. Technically, I guess they are within one of two perforce repositories, but from the client point of view they're all the same perforce client, so.
So my question is about the case where each project is a directory under the source root, but the projects can come from different repositories. It seems that the configuration ends up with one repository covering all projects, which is incorrect.
Let me know if you'd like me to break it all out and start again with one project directory at a time. I can start from scratch again and add one from perforce, then one from git, and so on, to see what happens.
I am not sure what you mean by sharing repositories. If /data/src/foo and /data/src/bar are the result of:
cd /data/src
git clone http://foo.com/foo foo
git clone http://foo.com/foo bar
or
cd /data/src
git clone http://foo.com/foo foo
git clone foo bar
then everything is fine. Even if bar is a symlink to foo this should work.
I think you need to find out which project has a repository that claims the source root path. This should not really happen.
I'm not sure how to check that. In my case, I have one git at that level as you describe, and perforce client directories under that same level.
Hmm, it occurs to me that my perforce client likely is assuming its root is the source root. There are no views into the root, they're all under the root, but the perforce client "Root" is the source root. Maybe that's breaking things? If I make sub-directories under the opengrok source root for perforce repositories, will the "project" names not include that directory level? Hmm... Yuck. :-)
That seems likely. I have not seen such a thing with other repository types.
Okay. I've changed my perforce client to root all data at /disk/src/perforce. I'm checking out /disk/src/perforce/projectD and /disk/src/perforce/projectL, and leaving the git repository at /disk/src/projectA. I've nuked all of /disk/data/* for opengrok and nuked configuration.xml. I am starting a fresh index. I'll see what the XML configuration looks like after that.
(Note, I'm getting lots of WARNING output from perforce commands, complaining:
Path '/disk/src/projectA/src/stream/tcp/*/...' is not under client's root '/disk/src/perforce'.
)
But, I note that it now says:
16:27:44 INFO: Done scanning for repositories, found 2 repositories (took 0:01:28)
16:27:44 INFO: Generating history cache for all repositories ...
16:27:44 INFO: Creating historycache for 2 repositories
16:27:44 INFO: Creating historycache for /disk/src/src/projectA (GitRepository) without renamed file handling
16:27:44 INFO: Creating historycache for /disk/src/src/perforce (PerforceRepository) without renamed file handling
16:27:48 INFO: Done historycache for /disk/src/perforce (took 4.319 seconds)
16:28:03 INFO: Done historycache for /disk/src/projectA (took 18.520 seconds)
16:28:03 INFO: Done historycache for all repositories (took 18.556 seconds)
16:28:03 INFO: Done...
So, that's better!
Basically, projects are directories just below source root. Each project may have multiple repositories; each repository belongs to exactly one project.
This concerns me some, with my rearranged sources. I want the multiple directories under /disk/src/perforce to be separate projects. But, you say above that projects are directories just below source root. I fear I'm going to have an issue here...
Pardon my Perforce ignorance, but does the Perforce client force all the "checkouts" to live under a common directory?
Yes, at least sort-of. You can have multiple "client" configurations. The p4 client (command) is told what "client" (config) it's to use. These "client" configs are stored in the server, and can be modified to affect what the p4/p4v software [client] programs do.
A p4 client config has a name and a root. That root is the directory under which all of the "views" are placed; the views (in my case, which directories/projects from the server should be retrieved to the local system) control what is or isn't brought down from the server.
There's probably better documentation than what little I know from the one environment I've used it in.
So, I think I could make multiple perforce clients for the multiple "projects" I want to download. This is technically possible, but manipulating the environment variables to run p4 in lots of different ways for lots of checked out projects would be a nightmare. And, I assume, it will increase the number of "Path '/foo/*/...' is not under client's root" warnings I'm getting in the repository scan phase...
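To make the "one client per project" idea concrete, here is a rough sketch of what juggling the environment per project could look like when syncing from Java. It assumes the client is selected through the standard P4CLIENT environment variable; the client names and the class name are hypothetical, and nothing here is part of OpenGrok.

```java
import java.io.File;
import java.util.Map;

// Illustrative helper, not part of OpenGrok or Perforce.
final class P4PerProject {
    /**
     * Prepares (but does not start) "p4 sync" for one project directory,
     * selecting the client spec through the P4CLIENT environment variable.
     */
    static ProcessBuilder syncCommand(File projectDir, String clientName) {
        ProcessBuilder pb = new ProcessBuilder("p4", "sync");
        pb.directory(projectDir);
        pb.environment().put("P4CLIENT", clientName);
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) {
        // Hypothetical project-to-client mapping; adjust to the real client names.
        Map<String, String> clients = Map.of(
                "projectD", "opengrok-projectD",
                "projectL", "opengrok-projectL");
        for (Map.Entry<String, String> e : clients.entrySet()) {
            ProcessBuilder pb = syncCommand(new File("/disk/src", e.getKey()), e.getValue());
            System.out.println("P4CLIENT=" + e.getValue() + " "
                    + String.join(" ", pb.command()) + " (in " + pb.directory() + ")");
            // To actually run p4, call pb.start().waitFor() here.
        }
    }
}
```

This only covers syncing. The indexer itself runs p4 inside each project directory, so it would still need to pick up the right client there, for example from a per-directory config file.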
There has to be a better way. Maybe searching for old (closed even) OpenGrok issues might reveal something.
There are pieces of advice like this one: https://github.com/oracle/opengrok/issues/926#issuecomment-492853775
Thanks. I looked through #926 but it doesn't dive down into the exact problem I think I'm now seeing. Moving the perforce project directories into a perforce-specific subdirectory of the source root seems to make things happier, but not "right". I can now look at history and annotations from the git projectA, but all of the perforce "projects" are now subsumed into the perforce project from OpenGrok's point of view. (which also has history and annotations working, but) That won't work for me.
Web U/I top level screenshot - http://imgbox.com/AqJVAmyJ
Is there any other way you can think of, given the things we've analyzed above, that I could make this work the way I wanted? Any way to define projects at a level further under the source root?
The idea of projects is deeply embedded into OpenGrok. Maybe some of the guys who made Perforce work can comment ? @gtoph @nerakhon ?
Replied on the other thread as well. With that, here's how my projects look where each project is a different client spec. I mentioned it in other threads as well, but I have a main opengrok project directory and all I do is make a link (windows mklink /D) to the actual projects, which can be located wherever on my drive.
Blacked out the project names just so I don't get in trouble. I don't know why it's listing the actual path below the project either, but that is probably something I did in my customizations. Haven't had a chance to look into it yet though and it wasn't a deal killer for the moment.
Interesting. Thank you. Though, noting you have symbolic links for perforce, and the real sources elsewhere, it makes me think that might be the better solution for me. But, that would also allow me to use just one client spec for perforce, where I had started. Are your projects actually in different perforce servers, or are you using a different client spec per project for some other reason?
(I don't have paths like that, but I'd guess that's because they're links. Just a guess though.)
They are all on the same server, but each project is a separate branch off the mainline and hence each has its own client spec. We have some wrapper utilities around P4 to properly get a whole project set up (it's an ordeal :)) but more or less I sync whatever branches to something like D:\builds\project1 and then make a link to whatever projects for opengrok:
cd d:\opengrok\projectlinks
mklink /D project1 d:\builds\project1
With you having one client spec for everything, I would think the linking thing could work and 'trick' opengrok into showing multiple projects.
Yup. We just list all of the relevant projects and branches from our server via many Views lines in the client spec. Okay, I'll give that a shot. It would be easier to have one client spec, because then I can just do it with env variables, and not need to manage the .p4config files.
Thanks! More here after I give that a go, which will take hours or days to get things added and indexed. :-)
Ya, I have projects that take over an hour to compile, so there's a lot of files and indexing can take hours as well. I have a task scheduled every night to do it when I'm sleeping :)
Okay. I tried the one client spec into a perforce client root outside of the opengrok root, then symlinks into there from opengrok root. Unfortunately, that produced lots of the following when scanning for repositories at the start of indexing:
16:29:37 WARNING: Non-zero exit status 1 from command [p4, dirs, *] in directory /disk/opengroksources/projectD/src/stuff/libk32: Path '/disk/opengroksources/projectD/src/stuff/libk32/*/...' is not under client's root '/disk/perforce-sources'.
So, the root that is in the perforce client spec is still in effect, and it doesn't match the effective path opengrok is running in. I saw elsewhere that you mentioned using toRealPath() or the like, so maybe without that this trick won't work for perforce. Do you have that in your tree, @gtoph , that isn't in the main sources?
I'm not signed up to be a contributor so I don't have any official branch or anything but this is what I did:
public static History getRevisions(File file, String rev) throws IOException {
    ArrayList<String> cmd = new ArrayList<String>();
    // Resolve symlinks so that p4 sees the real path under the client root.
    Path realPath = file.toPath();
    try {
        realPath = realPath.toRealPath();
    } catch (IOException e) {
        // Could not resolve; fall back to the path as given.
    }
    file = realPath.toFile();
    cmd.add("p4");
    cmd.add("filelog");
    cmd.add("-slti");
    cmd.add(protectPerforceFilename(file.getName()) + PerforceRepository.getRevisionCmd(rev));
    Executor executor = new Executor(cmd, file.getCanonicalFile().getParentFile());
    executor.exec();
    return parseFileLog(file, executor.getOutputReader());
}
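The reason resolving the real path matters can be shown in isolation. The following is a standalone sketch, not code from the OpenGrok tree: it builds a throwaway layout like the one in the warning above (a client root plus a symlinked project in a separate source root) and compares the path as given with the path after toRealPath(). The class and method names are invented for the demo, and it needs a platform and user that are allowed to create symbolic links.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative demo, not part of OpenGrok.
final class ClientRootCheck {
    /** True if dir, taken literally (symlinks not resolved), is under clientRoot. */
    static boolean isUnderLiteral(Path dir, Path clientRoot) {
        return dir.toAbsolutePath().normalize()
                .startsWith(clientRoot.toAbsolutePath().normalize());
    }

    /** True if dir is under clientRoot once symlinks on both sides are resolved. */
    static boolean isUnderResolved(Path dir, Path clientRoot) throws IOException {
        return dir.toRealPath().startsWith(clientRoot.toRealPath());
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("p4demo");
        Path clientRoot = Files.createDirectories(base.resolve("perforce-sources"));
        Path target = Files.createDirectories(clientRoot.resolve("projectD"));
        Path sourceRoot = Files.createDirectories(base.resolve("opengroksources"));
        // The project in the source root is only a link into the client root.
        Path link = Files.createSymbolicLink(sourceRoot.resolve("projectD"), target);
        System.out.println("literal:  " + isUnderLiteral(link, clientRoot));
        System.out.println("resolved: " + isUnderResolved(link, clientRoot));
    }
}
```

The literal check fails because the link lives outside the client root, which matches what p4 complains about; the resolved check succeeds because the link target is inside it.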
Alright. Well, I'm going to close this for now. I have found that setting up with lots of different perforce client specs, I can get each repository checked out into a directory at the opengrok source root, and it works for me. I'd like to have a single or smaller number of perforce client specs, so I can perform fewer operations across larger sets of sources, but it seems not to work without effort like the code changes above, and I don't want to push for that now.
This is a question. I have a source tree, /disk/src, and under that are top-level directories for multiple (currently 12) projects from multiple source repositories. I am running the opengrok indexer with -s /disk/src -d /disk/data -H -P -S -G. What the output from the console shows, and the log file, includes:
It seems to be creating projects properly when I view results in the U/I later. But are the log messages just misleading, or am I doing something wrong?
(Side note: only some of the projects are Perforce, some are git, yet it seems to always say /disk/src (PerforceRepository), which is in a way the most concerning part of the incorrect log messages...)