Open Salamek opened 6 years ago
Hi @Salamek, no anger from me. Thanks for sharing. That’s a pretty cool project.
Though if you were concerned about upsetting someone (me or anybody), I recommend a more neutral approach as to the challenges of the existing project rather than saying it’s “wrong”.
In any case, it is probably worth mentioning your project in my README after I try out your instructions and validate your project works.
How do you handle projects in which you have no control to set up webhooks? For example, let’s say you want to mirror an upstream project like the Linux Kernel.
@samrocketman Hi, yeah, you are right about that; I modified it to be less aggressive. Mirrors where you have no control to set a webhook can be mirrored via celerybeat (something like cron integrated into the application). Currently it is disabled in the config and not mentioned in the docs, but it is there. I may add an option to set a cron-like syntax on a project so it is mirrored via celery beat and not by a hook.
Or you can click the "Trigger sync" button in the row of the project you want to sync in the mirror overview
Or you can call that webhook from cron via curl :smile:
I created an issue for it, https://github.com/Salamek/gitlab-tools/issues/4, and I will look into it ASAP
I created gitlab-mirrors to specifically mirror readonly projects which is why it runs on cron. I imagine much of the user base of gitlab-mirrors uses it for this purpose or has upgraded to GitLab Enterprise for this feature.
Note: I imagine because I did not survey anybody but that’s basically the only use case gitlab-mirrors is meant to solve.
If you ever implement a cron-like mirror capability in your software, then I highly recommend using some kind of task/worker queue with thread locking and parallelism. cron definitely has limits and imagine trying to mirror 10,000 repositories.
The eventlet and greenlet libraries are pretty good for parallelism.
@samrocketman I use celery for background tasks in Gitlab-tools; celery supports a scheduler (implementing it right now)
@samrocketman hi, issue https://github.com/Salamek/gitlab-tools/issues/4 (Periodical sync) is solved in version 1.0.13. PullMirror: new optional field where the user can specify a cron expression to run the mirror periodically:
@Salamek thanks for sharing. When @samrocketman created this, his intention was probably to solve a problem in the simplest way possible. Pretty sure he never intended this project to last this long and be as popular as it is. Cron and bash is simple though, and is way more reliable than having a daemon listening on a port. Getting all the python stuff working is not ideal. However, it just works once set up and has been working for the last 4 years for me.
I welcome anything that can replace this and make it better. A few years ago I added a mirror list functionality with another cron job to easily sync new mirrors.
I don't think adding a database and webserver is warranted in your project. That adds many new security risks, maintenance, dependencies and an entire webstack vs just having bash, python and cron. KISS.
I am for remaking this tool, as it is showing signs of age. It should be in all ruby (or all python) though.
Maybe some of the ex-Githubbers will create such a tool.
👍 to trying to make gitlab-mirrors as simple as possible. Honestly, I thought GitLab would eventually release mirroring support built-in. They did eventually, but only for the EE version unfortunately. So instead, community efforts are kind of split on this repository mirroring topic.
If only mirroring was a part of GitLab CE 🤔 .
Nowadays, I mostly use gitlab-mirrors to keep offline copies of my GitHub profile. ref: https://github.com/samrocketman/github-backups ; that one is in Ruby :). If I start using GitLab again I'll probably get an itch to overhaul this project.
Btw, I work in a secure environment and standing up new web services is frowned upon. So I have special needs. Thus bash and cron are easy to get going and don't require regulatory approval. Personally, I think Gitlab should build this into Gitlab and make the worker queue distributed so we can designate workers that have internet access and be able to mirror from public git repos.
M$ just bought Github. Expect more users to gitlab.
@logicminds apt and psql commands are far simpler than setting up this project, and having gitlab-tools installed as an apt package is a more maintainable and UPDATABLE solution. Personally, I don't like having 3rd-party non-packaged software running on my server; it makes ppl skip updates and finally forget they had that software installed in the first place. PS: It would be great if the GitLab team just added pull mirror functionality to GitLab CE; I was partially hoping that releasing gitlab-tools would "force" them to do that.
Not saying your stuff isn't awesome as it sounds pretty sweet already. I just have unique needs that most people don't care about.
@logicminds
&& 4. That is an interesting and working solution. But I would still go with a web UI (since I wanted gitlab-tools to have sync logs so ppl can see what went wrong with a mirror task, just like GitLab CI :smiley_cat: )
I only support python3 for whole applications and python2/3 for libraries (like https://github.com/Salamek/cron-descriptor), so no python2 support in gitlab-tools. If someone needs python2 support badly, they will need to create an issue and buy me a beer (or 10), because that will need a lot of "useless" work :smiling_imp: Getting an OS with python3 in its repos seems like the cheaper solution.
@samrocketman BTW, issue #85 is solved in gitlab-tools 1.0.15. And that's it for today...
@Salamek awesome. It's nice of you to account for needs from people who have opened issues with this project.
@logicminds
Multi-threaded syncing would be nice when syncing the 50+ repos we have setup.
Can be easily parallelized with xargs.
Example:
ls -1 "${repo_dir}/${gitlab_namespace}" | xargs -n1 -P0 -I '{}' ./update_mirror.sh '{}' >> "${git_mirrors_dir}/cron.log" 2>&1
The log will likely look pretty ugly due to the parallelism. Probably better to add a logging option to update_mirror.sh which would utilize mktemp for outputting the log and flock to coordinate writing to cron.log in order.
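A minimal sketch of that mktemp + flock idea, not something that exists in gitlab-mirrors today. The helper name run_logged and the sidecar ".lock" file are assumptions made for illustration; flock here is the util-linux command. Each job buffers its own output in a temp file, then appends the whole buffer to the shared log while holding an exclusive lock, so output from parallel jobs stays grouped per job.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND, captures its stdout/stderr privately, then appends the
# captured output to LOGFILE in one locked step.
run_logged() {
  log_file="$1"; shift
  tmp_log="$(mktemp)" || return 1

  # Buffer the job's output so parallel jobs cannot interleave lines.
  "$@" > "$tmp_log" 2>&1
  status=$?

  # Take an exclusive lock on a sidecar lock file (fd 9) while appending.
  (
    flock -x 9
    cat "$tmp_log" >> "$log_file"
  ) 9> "${log_file}.lock"

  rm -f "$tmp_log"
  return "$status"
}
```

Inside update_mirror.sh this could wrap the existing sync steps, for example run_logged "${git_mirrors_dir}/cron.log" some_sync_command, in which case the >> redirect on the xargs line would no longer be needed. Note that the lock keeps each job's output contiguous; jobs are still logged in the order they finish, not the order they started.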
@logicminds if you wanted to trigger a gitlab-mirror sync on merge you could make use of authorized_keys command to launch a script (instead of allowing a shell) when a specific SSH key connects. See the authorized_keys man page.
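A rough sketch of what such a forced-command setup could look like. The script path, key comment, function names, and the UPDATE_MIRROR override are all hypothetical; the only assumptions taken from this thread are that update_mirror.sh accepts a project name as its argument. sshd exposes whatever the client asked to run in SSH_ORIGINAL_COMMAND, which this sketch treats strictly as a project name.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. installed as /usr/local/bin/mirror-trigger.
# An authorized_keys entry along these lines would bind it to one key
# (key material and paths are placeholders):
#
#   command="/usr/local/bin/mirror-trigger",no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-ed25519 AAAA... merge-trigger

# valid_project NAME: accept only a conservative character set (an assumption;
# adjust to whatever project names are actually in use).
valid_project() {
  case "$1" in
    ''|.*|-*|*[!A-Za-z0-9._-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# trigger_sync NAME: validate the name, then run the mirror update for it.
trigger_sync() {
  if ! valid_project "$1"; then
    echo "rejected project name" >&2
    return 1
  fi
  "${UPDATE_MIRROR:-./update_mirror.sh}" "$1"
}

# Only act when sshd invoked us with a client-supplied command.
if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
  trigger_sync "$SSH_ORIGINAL_COMMAND"
fi
```

With something like this in place, a merge hook on another host could run ssh against the mirror user with the project name as the remote command, and the key would never be able to open a shell.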
Hi, I hope posting this here will not make someone angry. For a while I was using this project, but in the end it was inappropriate for the job:
So I created my ~much better project, doing the same stuff properly and more:
The only disadvantage is that it requires a working database server:
The URL to the project, with more info and some docs, is https://github.com/Salamek/gitlab-tools