klaussilveira / gitlist

An elegant and modern git repository viewer
https://gitlist.org/
BSD 3-Clause "New" or "Revised" License
2.92k stars 520 forks source link

Proposal: Specifying multiple repository directories #215

Closed wimrijnders closed 11 years ago

wimrijnders commented 11 years ago

I have a situation on my development machine, where there are many git repositories arranged in various locations in the file system. I would like to be able to specify the repositories I would like to see and would really prefer to use a single instance of gitlist to do this.

I hereby propose to allow multiple values for parameter repositories in config.ini. The syntax can follow the current syntax of php ini files:

repositories[]=/path/to/repo1
repositories[]=/path/to/another/repo
repositories[]=/path/to/yet/another/interesting/repo

The current single value assignment can still be supported, for simple cases and backwards compatibility.

The amount of extra code to make this work is zero. The syntax is already supported by PHP.

What does need to change, is the way that the paths are managed internally. In the new situation, the values will be stored as an array in $app['git.repos']. The most important calls in the application using this value are:

$repository = $app['git']->getRepository($app['git.repos'] . $repo);

$this->app['git']->getRepositories($this->app['git.repos'])

The second call is easy to adjust for arrays. The first call needs to change; however, the change can be minimal:

$repository = $app['git']->getRepository($app['git.repos'],  $repo);

(dot replaced by comma)

Special handling for array can then be hidden in method getRepository. A scalar string value will also be accepted, this is a matter of testing is the parameter is an array and running the current code if it is not.

Routing.php does some internal handling as well of $app['git.repos']. This should be not too hard to fix.

I appreciate feedback on this idea. I am entirely willing to make the changes myself. Adding this feature to gitlist would make it extremely more useful for me, and I presume also for others.

Regards,

Wim.

attiks commented 11 years ago

I think this is a brilliant idea and lots of people are looking for this, so I would say: "make it happen" ;-)

DannyvdSluijs commented 11 years ago

I'm all for aswell. Im having work related repositories in another directory, than my personal repositories.

On Nov 29, 2012, at 15:11 , attiks notifications@github.com wrote:

I think this is a brilliant idea and lots of people are looking for this, so I would say: "make it happen" ;-)

— Reply to this email directly or view it on GitHub.

wimrijnders commented 11 years ago

That's 2 out of 2. Thanks for the support, guys.

How many more people should respond positively before I can definitely start? I would say all developers of gitlist, how many would that be?

attiks commented 11 years ago

I think it's save to start, but you canwait for @klaussilveira's feedback if you want to be sure.

wimrijnders commented 11 years ago

Oh, Klaus was happy with it. Before posting the issue I mailed him privately and he agreed it was a good idea.

So I can go ahead, I guess? OK, good, first the paying work out of the way....

On Thu, Nov 29, 2012 at 6:54 PM, attiks notifications@github.com wrote:

I think it's save to start, but you canwait for @klaussilveirahttps://github.com/klaussilveira's feedback if you want to be sure.

— Reply to this email directly or view it on GitHubhttps://github.com/klaussilveira/gitlist/issues/215#issuecomment-10858694.

Met vriendelijke groet,

Wim Rijnders http://axizo.nl

wimrijnders commented 11 years ago

Hi all,

Just letting you know I started work on this issue. I can now display the index page just fine with multiple root directories, I'm now moving on to repo-specific views. This is where I'm boggin down now.

Can anyone help me with the following?

This is from index.twig:

<a href="{{ path('repository', {repo: repository.relativePath}) }}">{{ repository.name }}</a>

I'm not familiar at all with twig. Can someone tell me what the call to path() does, and what the first parameter 'repository' stands for. I'm guessing it's the base directory to which the relativePath is added, if so, where is it defined?

The problem here is that there may now be multiple root directories, so the whole construction is moot. My thought is to chuck this relative path construction overboard and switch to absolute paths.

I hope someone can answer the question and think along with me as how to handle multiple roots.

Thanks in advance,

Wim.

wimrijnders commented 11 years ago

I discovered during coding that there is a separate module under the vendor dir called 'gitter'. This has a separate repo.

This is to let you know, especially Klaus, that I forked that repo as well, because it is necessary to change the code there as well for multiple input roots.

wimrijnders commented 11 years ago

Wrt. previous: never mind, I figured it out. I got a working setup now, need to clean it up.

One thing that confused me is that $app['git']-> can actually access different objects, depending on which method is called.

I'm caching the repository list within the Client.php of the gitter subproject. I'm not sure if this is valid architecture-wise, but it is more efficient than recreating it on every call to getRepositories(), and it solves some access problems.

I also ditched the relative paths used, because they don't make sense any more. Repo's are now accessed by name in the URL. One thing I have to think about, is avoiding name collisions, eg. git projects in different paths but having the same name for the subdirectory. This can be overcome.

Going the right way. My fork is up to date if you care to have a look. Just keep in mind the work is not complete yet.

klaussilveira commented 11 years ago

I'll take a look at your forks asap and comment where necessary. Thanks for the hard work, @wimrijnders!

wimrijnders commented 11 years ago

Hello gitlist people,

This is to inform you I got the gitlist multiple repository directories working to satisfaction. To be honest, I had most of it down three days after I started, but sadly work got in the way. So now, to treat myself after recovering my blade server from a succesful webbot assimilation, I decided to sit down and go the 100%.

The final bit took about an hour and mostly involved testing all possible paths through the controller. Looking good!

Basically, what changed in the code at the Controller level, is that the lines:

        $repository = $app['git']->getRepository($app['git.repos'] . $repo);

Got replaced by:

        $repotmp = $app['git']->getRepositoryCached($app['git.repos'], $repo);
        $repository = $app['git']->getRepository($repotmp->getPath());

Some refactoring would be in order, but the functionality is now there.

Things outstanding:

That's it for now, regards,

Wim.

wimrijnders commented 11 years ago

Hello there,

Wrt multiple directories, I did an extensive test involving 3 repository directories in the config.ini, containing in total 31476 subdirectories, more than 11 levels deep, in which 33 git repositories resided at indeterminate locations.

I am happy to say it works beautifully as intended.....however, each page load took 25 seconds (admittedly, it was a stress test), because the repository search runs on every page fetch.

While most people will not configure such an extensive directory structure ( repositories= '/', anyone?), I think it is possible to make this a lot faster by caching the results of the search for git repositories.

Unfortunately, I'm not that well-versed with Symfony to see how I can make this work. Could I request anyone who's reading this and willing to help, to brainstorm along a bit?

Considerations:

I would very much appreciate to hear your ideas.

Thanks,

Wim.

wimrijnders commented 11 years ago

Would now be a good time to issue a pull request? Despite the previous comment about using caching to increase performance, I mean.

tobya commented 11 years ago

Do you have a zip with all the necessary changes? eg the change to gitter and to gitlist? I'd love to try it out, I've got a smaller setup so could give some feedback. Or you can point me to the branches on github I need to sync.

wimrijnders commented 11 years ago

Hey great, a response!

Grab it from: https://github.com/wimrijnders/gitlist

You will also need to get: https://github.com/wimrijnders/gitter. This goes under relative dir 'vendor/klaussilvera'. Just make sure you get the subdir's properly aligned before copying.

Would love to hear your feedback.

tobya commented 11 years ago

Hi

After a bit of figuring out how to pull down, I managed to merge your changes to my local repo as a branch and got it working.

The first thing to note, is that I'm on windows, so there are some problems with paths. I've fixed the first one replacing \ in paths with \\ in Routing->getRepositoryRegex.

I'll go through and fix issues as I find them for windows. Looks good so far though.

wimrijnders commented 11 years ago

Great that you're looking into the issues. Indeed, I am running linux and /-\ annoyances are to be expected...

How can I get the changes that you make into my fork, so I can test them as well?

tobya commented 11 years ago

Working well now, found another change I needed to make. You've hard coded / in a few places, I will change these to use DIRECTORY_SEPARATOR. I had previously submitted a pull request to update the config example to allow windows paths and updates to gitter and gitlist to allow running on windows.

I can push up my merged branch to github and submit a pull request to your branch when I'm done.

wimrijnders commented 11 years ago

Hey tobya,

You're really on the move! Great that you're putting in the effort.

Wrt. Directory separator, I'd much rather automate the detection of it instead of specifying a config option. Eg. check out this solution: http://stackoverflow.com/questions/6654157/how-to-get-a-platform-independent-directory-separator-in-php

I was really under the impression that the /-s would be handled fine in a platform-independent manner by PHP; alas, it is not to be......

Will await your pull request.

wimrijnders commented 11 years ago

I had a few questions about your implementation, is this the right place to discuss it or should we move it to your github repo?

By all means discuss it here in Klaus's gitlist, so that there will be more eyeballs regarding the question.

tobya commented 11 years ago

Hi Wim,

I made the pull requests. Basic changes to work on Windows.

Here are some ideas.

  1. Rather than listing all repos as a big long list, would it be good to list each set of repo's under a title of the base repository directory they are in. This was it might be easier to find a repository you want.
  2. It is quite slow :) As well as a cache do you think its worth letting people select if they want gitlist to check subdirectories, or just the top level?
wimrijnders commented 11 years ago

Hi Toby,

  1. The gitlist project itself, with its multiple repositories, is the prime example of this of course. I guess that would work, but it means changing the way the gui works and I'm a bit wary of doing that myself.
  2. Yes, the same idea has crossed my mind. You could add a flag per path to indicate that you shouldn't search further into subdirectories once a repository has been found. Taking it from the example config, something like:

    ; If recurse specified at end of path, keep on searching for further repositories in subdirectories after a repository has been found. ; If not specified, don't search further. repositories[] = '/var/www/projects/;recurse'

'Slow' is a matter of perspective; from a technical viewpoint, I find it amazing that the code manages to return within 30 seconds after searching a 100,000+ component directory structure (ref testcase my machine). I think that PHP is doing a pretty awesome job here. From a user viewpoint, of course, this delay sucks and I can well imagine them getting fed up despite the enhanced functionality.

Here's a performance improvement strategy: If you have the recursion blocking, you can set up your repository directories so that they point directly to the repositories you are interested in. eg:

repositories[] = '/direct/path/to/repo1/'
repositories[] = '/direct/path/to/repo2/'
repositories[] = '/path/to/gitlist/;recurse'        ; contains repo's lower down, so search further

This should improve performance to no end, because searching stops right at the specified path.

In other words, your idea of making the recursive search user-definable is a damn good one, only the user needs to be aware of this option.

Would like to hear your opinion on the 'recurse' option as specified above. If you have better ideas, please let me know.

tobya commented 11 years ago

Of course you are right performance is great for what it is doing, but I think it is worth investigating other options.

Rather than adding the word recurse maybe we should use subdir or /s to make it more consistent with command line params.

repositories[] = '/direct/path/to/repos/'
repositories[] = '/direct/path/to/repos2/'
repositories[] = '/path/to/gitlist/;/s'        ; contains repo's lower down, so search further

Also I would suggest that it always search 1 level down, so if you have a lot of repos at

repositories[] = '/direct/path/to/repos/'

you don't need to specify them all just the root dir. The line above would find

'/direct/path/to/repos/myrepo1/'
'/direct/path/to/repos/myrepo2/'
'/direct/path/to/repos/myrepo3/'

However

Perhaps we are making this all too complicated, and really what we need is to just wait until we can implement some sort of cache of the repos (which after all won't change very often) and a Refresh button that the user can click.!

wimrijnders commented 11 years ago

Hi Toby,

/s: the only thing I have against this syntax, is that it is less readable than 'recurse'. I would much rather prefer to use more verbose statements which make it more intuitive what is happening.

search 1 level down: If you define the recursion rule as follows:

'search all subdirectories until a git repository is encountered. If no recursion is specified, don't look in the subdirectories of found repository'.

... then all repo's in you example will be found. The search will continue, just not deeper in the branch of a found repo. So eg. the repo at '/direct/path/to/repos/myrepo1/' will be found, as will myrepo2 and myrepo3, but not the eventual repo at '/direct/path/to/repos/myrepo1/other/repo'.

caching and refresh button: I'm thinking hard about this right now. A 'Refresh' button is probably the easiest solution, should the directory structure change without the ini-file changing.

wimrijnders commented 11 years ago

Toby,

I got a basic caching of repo's working. Please pull my forks (geez, that sounds corny for some reason).

Apart from the initial load, pages now take fractions of seconds to load, major improvement.

But now, I see some error I'm pretty sure I didn't have previously:

I'm not sure at all if this is related to the caching; perhaps I missed them on the previous testing. I can't figure out the connection, and I will leave it for now.

Further points:

wimrijnders commented 11 years ago

Toby,

FYI, the previously named RuntimeExceptions check out and have nothing to do with the caching.

RuntimeException: fatal: Not a valid object name :(nobranch)/: the two repo's with this error indeed had '(no branch)' selected as branch.

RuntimeException: fatal: failed to read object aca09ef: Invalid argument: This was a corrupted repo which I had parked in a temp directory.

So all is well. I hope the caching works fine under windows. On my Ubuntu it basically rocks.

tobya commented 11 years ago

Works great!

No problems on windows. A nice simple solution.

Errors I think its worth keeping a list of types of repo's that cause errors as these could cause problems for other users. Perhaps anything that is found should be added as issues. I found others such as having a # or a space in a repo name causing errors (not allowed by github but allowed by git)

wimrijnders commented 11 years ago

:+1: Awesome! A good end of the year. Now I'm going to concentrate on the imminent year's end. Given a reasonable revalidation period, I expect to be functioning nominally by 3rd of january.

The errors you mention sound like something that should be added to the automated testing. It wouldn't hurt, of course, to document them. The gitlist wiki would be a good location.

Have a great and happy new year!

klaussilveira commented 11 years ago

Is your codebase stable and safe to merge? I'm eager to do it.

wimrijnders commented 11 years ago

Hi Klaus,

Nice of you to get in touch. My apologies that I neglected to communicate and that I left the work as is.

Yes, the code is stable. The last thing I added is caching of the repository list and now gitlist is blazing fast as well (except for the first time initialisation, of course).

Two things that need to be done, at least that I am aware of:

Neither of these options is a showstopper if not implemented. However, I think it is a good idea to implement at least the first option. The second one can wait.

If you don't mind, I'll add the refresh, when I have a free moment. As you may have guessed, I'm hard pressed for time, otherwise I would have finished it long ago.

Hope you don't mind waiting just a little longer. I'll do my best. I intend to put the final work in this coming weekend.

Kind Regards,

Wim.

On Thu, Jan 17, 2013 at 5:20 PM, Klaus Silveira notifications@github.comwrote:

Is your codebase stable and safe to merge? I'm eager to do it.

— Reply to this email directly or view it on GitHubhttps://github.com/klaussilveira/gitlist/issues/215#issuecomment-12375238.

Met vriendelijke groet,

Wim Rijnders http://axizo.nl

klaussilveira commented 11 years ago

I'm curious about the caching feature. Are you persisting on the filesystem? The repository contents as well?

I have added you as a collaborator so you can work freely on the features you have implemented. It would be nice if you could separate them into branches so we can easily maintain and evolve each one without any kind of mess.

wimrijnders commented 11 years ago

As previously, a scan is performed on the directory specified in the config file. Only, now there can be multiple directories specified, and all of these are scanned.

What is cached, is a list of all the git repo's found. This is written to the cache directory as present in gitlist. The entries of this file are the contents of the repo list structure as used in the program, saved in JSON format.

What happens afterwards, is that on every get to gitlist a check is performed on initialisation, to see if this cached list is older than the config file. If yes, then the contents of the cached file are loaded directly. So you don't need to perform the directory scan ever time you access a page on gitlist.

No, the repo contents are not cached, just the basic repo information (name, description and most importantly the location in file system).

Branching is a good idea, if anything to merge the code in a responsible manner. I hope that eventually you will agree that the changes for multiple root directories can be integrated into the main branch.

Regards,

Wim.

On Thu, Jan 17, 2013 at 8:38 PM, Klaus Silveira notifications@github.comwrote:

I'm curious about the caching feature. Are you persisting on the filesystem? The repository contents as well?

I have added you as a collaborator so you can work freely on the features you have implemented. It would be nice if you could separate them into branches so we can easily maintain and evolve each one without any kind of mess.

— Reply to this email directly or view it on GitHubhttps://github.com/klaussilveira/gitlist/issues/215#issuecomment-12385472.

Met vriendelijke groet,

Wim Rijnders http://axizo.nl

klaussilveira commented 11 years ago

There were various changes to the master branch since you forked, though. Maybe you should take a look on what has been changed and if it affects your code or not. Also, regarding cache, we can stat() the repositories directory, and if the modification date is newer than the cache file, we purge it.

wimrijnders commented 11 years ago

I'll do a pull to merge with main before committing the new multi branch.

stat: does that work if directories several levels deep are changed? If so, that may be an idea. But I just did a test, changes in deep directories are not registered.

In addition, I believe that the stat will register change on every file and dir change. I think it may trigger a bit too often to be useful. In my case, I use gitlist for repositories that I actually work on, not just central repositories on a server. So, by definition almost, files and dir's will change often.

I would think a refresh option is a safer bet.

Regards,

Wim.

On Thu, Jan 17, 2013 at 8:59 PM, Klaus Silveira notifications@github.comwrote:

There were various changes to the master branch since you forked, though. Maybe you should take a look on what has been changed and if it affects your code or not. Also, regarding cache, we can stat() the repositories directory, and if the modification date is newer than the cache file, we purge it.

— Reply to this email directly or view it on GitHubhttps://github.com/klaussilveira/gitlist/issues/215#issuecomment-12387912.

Met vriendelijke groet,

Wim Rijnders http://axizo.nl

wimrijnders commented 11 years ago

Klaus,

Something I just realized. To do it right, I will also need access to the gitter repository which is under the vendor folder, since there are code changes there as well. Would you mind arranging this?

Thanks,

Wim.

klaussilveira commented 11 years ago

Sure, i'll add you as a collaborator there as well.

wimrijnders commented 11 years ago

Thanks!

Wim.

On Fri, Jan 18, 2013 at 2:14 PM, Klaus Silveira notifications@github.comwrote:

Sure, i'll add you as a collaborator there as well.

— Reply to this email directly or view it on GitHubhttps://github.com/klaussilveira/gitlist/issues/215#issuecomment-12421188.

Met vriendelijke groet,

Wim Rijnders http://axizo.nl

wimrijnders commented 11 years ago

Klaus,

I added my changes in branch 'multidir'.

As the final change, I added a 'Refresh' option in the menu top-right. This works, except for one detail, perhaps you can help me with this.

In MainController, I added:

       $route->get('/refresh', function() use ($app ) {
            $app['git']->deleteCached();

            # TODO: Fix following to return to referring page
            #       Or get rid of hardcoded path
            return $app->redirect( "/gitlist/");
        })->bind('refresh');

What I would really like to do here, is redirect back to the referring page, instead of going to home. Only, I can't figure out how to do this in Symfony. Perhaps you know how to do this?

I hope this change is to satisfaction.

Regards,

Wim.

klaussilveira commented 11 years ago

In Symfony, it's simpler, since you can get the referrer from the session. In Silex, though, the only way i know is:

$app->redirect($request->headers->get('Referer'));
wimrijnders commented 11 years ago

That worked. Thank you, Klaus.

I see that I was confused about which component does the request handling. I'll have to read up on Silex a bit more.

So, now the multidir addition is done. Awaiting your feedback.

wimrijnders commented 11 years ago

I see that the unit test is failing. I will fix that today,

tobya commented 11 years ago

I'll see if it all works ok on windows

wimrijnders commented 11 years ago

Unit tests now fully passed for gitlist as well as Gitter! Phew!

Now is a good time to have a look, Klaus.

klaussilveira commented 11 years ago

After reviewing some files, it would be good if you could make the new code compliant with PSR-2. Also, have the changes been made to the latest master?

wimrijnders commented 11 years ago

Klaus,

Please direct me to a PSR-2 specification you use.

None of the code changes have been folded back to the master branch. This is intentional, I kept the code separate until everybody, especially you, is happy with it.

klaussilveira commented 11 years ago

We try to follow PSR-1 and PSR-2:

https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-1-basic-coding-standard.md https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-2-coding-style-guide.md

wimrijnders commented 11 years ago

OK, will get this in order. I'll go through the code and adjust for the PSR guidelines.

klaussilveira commented 11 years ago

Is it ready to be merged? :)

wimrijnders commented 11 years ago

Klaus,

I have reviewed gitlist and gitter and have made the php code conformant to PSR-1 and PSR-2 to the best of my abilities.

The files in which I have made changes, have been fully reviewed. In addition, I have made guideline changes in any other php files when I encountered guideline violations. This was a consequence of using recursive grep's to find the problem areas.

A small note; according to the PSR-0 guidelines:

So, to fully conform, your top-level namespace should be named 'klaussilvera' or suchlike. I have not changed this.

Hope this is of use. For the record, this was about 3 hours of work. I sincerely hope there is an automation tool for checking conformity to the PSR guidelines; I do not want to do this by hand again.

I'm aware that there are some issues within the code that could use some attention. The current code, however, should work just fine.

She's all yours, Klaus :-).

Kind Regards,

Wim.

wimrijnders commented 11 years ago

I just saw two mails from Travis-cl come in with Subject titles of the form:

[Errored] klaussilveira/gitlist#217 (multidir - a22760d)

For the record, I hereby state that both gitlist and gitter pass the locally run unittests perfectly.

klaussilveira commented 11 years ago

Travis sometimes bumps into a problem or two with composer or something, i'll check the tests manually. Thanks for the hard work, @wimrijnders! Regarding PSR tools, there's this:

http://pear.php.net/package/PHP_CodeSniffer

And this:

https://github.com/fabpot/PHP-CS-Fixer