src-d / borges

borges collects and stores Git repositories.
https://docs.sourced.tech/borges/
GNU General Public License v3.0

rovers + borges pipeline apparently does nothing for a while when started #326

Closed smola closed 6 years ago

smola commented 6 years ago

From @dennwc (more):

  • The whole pipeline was running, but after a few minutes nothing had really happened: the log was silent and nothing was pulled to the specified folder.
    • Tried restarting the services to see if they would catch up, but with no luck.
    • After a few more minutes of silence it suddenly started to pull repositories. Surprise! It should definitely write something to the console, like "no jobs, waiting N minutes before checking again" or "fetching repository names". I'm not even sure what exactly happened.

This is probably a known issue. Not sure if it is a rovers or a borges thing, but it would be worth taking a look and improving the documentation and/or logging.
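
For illustration, the kind of idle message suggested in the report could be emitted from a polling loop along these lines. This is only a minimal sketch, not the actual rovers or borges code: `Poll`, `next`, `logf`, and `handle` are hypothetical names, and the real pipeline uses its own queue and logger.

```go
package main

import "time"

// Poll checks for work up to maxPolls times. Whenever no job is available it
// logs an explicit "waiting" message, so a silent log is never ambiguous.
// next, logf and handle are hypothetical hooks standing in for the real
// queue, logger and job handler.
func Poll(
	next func() (job string, ok bool),
	logf func(format string, args ...interface{}),
	wait time.Duration,
	maxPolls int,
	handle func(job string),
) {
	for i := 0; i < maxPolls; i++ {
		job, ok := next()
		if !ok {
			// Tell the operator that the consumer is alive but idle.
			logf("no jobs, waiting %s before checking again", wait)
			time.Sleep(wait)
			continue
		}
		logf("processing job %s", job)
		handle(job)
	}
}
```

Passing the logger as a plain function keeps the sketch independent of any particular logging library; in practice `logf` would be bound to the logger's INFO level.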

erizocosmico commented 6 years ago

I'm gonna need more details about this. I can't reproduce it.

Steps I did:

dennwc commented 6 years ago

The scenario was the following:

erizocosmico commented 6 years ago

Aaaaaaaah. I know what happened to you then. The first N thousand repositories on Bitbucket are Mercurial-only, so it takes about an hour before it starts finding git repositories there. It does report this in the logs if you use the debug log level:

DBUG[08-17|11:05:06] non git repository found                 repository=ubernostrum/django-profiles scm=hg caller=provider.go:94
DBUG[08-17|11:05:06] non git repository found                 repository=ubernostrum/django-contact-form scm=hg caller=provider.go:94
DBUG[08-17|11:05:06] non git repository found                 repository=ubernostrum/django-template-utils scm=hg caller=provider.go:94
DBUG[08-17|11:05:06] non git repository found                 repository=undees/blog scm=hg caller=provider.go:94
DBUG[08-17|11:05:06] non git repository found                 repository=aafshar/glashammer-main scm=hg caller=provider.go:94
DBUG[08-17|11:05:06] non git repository found                 repository=itkach/.emacs.d scm=hg caller=provider.go:94
DBUG[08-17|11:05:06] non git repository found                 repository=IndigoJo/qtm scm=hg caller=provider.go:94
DBUG[08-17|11:05:06] non git repository found                 repository=ionelmc/juicer-pylons scm=hg caller=provider.go:94
DBUG[08-17|11:05:06] non git repository found                 repository=tals3k/s3k scm=hg caller=provider.go:94
DBUG[08-17|11:05:06] non git repository found                 repository=leonidas/tre scm=hg caller=provider.go:94
DBUG[08-17|11:05:06] non git repository found                 repository=knepley/petscextstokessolvers scm=hg caller=provider.go:94
DBUG[08-17|11:05:06] non git repository found                 repository=THC4k/supybot-channellogger scm=hg caller=provider.go:94

Perhaps we should make that a warning instead. We could also mention in the README that the Bitbucket provider takes about an hour to start finding repositories.

Or we could get the ID of the first git repository in bitbucket and skip all the mercurial ones to start right away.

WDYT @ajnavarro?

dennwc commented 6 years ago

I haven't seen these logs, but that may be because the rovers command was running with a higher log level.

smola commented 6 years ago

> Perhaps we should make that a warning instead

A warning seems too much. Since we only look for git repositories, I guess skipping Mercurial repositories is quite irrelevant? Especially if there are thousands of those? But INFO could do.

> We could also mention in the README that the Bitbucket provider takes about an hour to start finding repositories.

:+1:

> Or we could get the ID of the first git repository in bitbucket and skip all the mercurial ones to start right away.

Ugly, but it sounds reasonable, since we don't have any near-term plans to support Mercurial.
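
One way to keep INFO readable when thousands of repositories are skipped would be to log a running summary every so often instead of one line per repository. The sketch below assumes a simplified `Repo` type and a hypothetical `FilterGit` helper; it is not the code in `provider.go`.

```go
package main

// Repo is a hypothetical, simplified view of one entry in a provider listing.
type Repo struct {
	FullName string
	SCM      string // e.g. "git" or "hg"
}

// FilterGit keeps only git repositories. Instead of logging every skipped
// repository, it calls logf once per `every` skipped entries, so a long
// Mercurial-only stretch produces a handful of lines rather than thousands.
func FilterGit(repos []Repo, every int, logf func(format string, args ...interface{})) []Repo {
	var git []Repo
	skipped := 0
	for _, r := range repos {
		if r.SCM == "git" {
			git = append(git, r)
			continue
		}
		skipped++
		if every > 0 && skipped%every == 0 {
			logf("skipped %d non-git repositories so far (last: %s, scm=%s)",
				skipped, r.FullName, r.SCM)
		}
	}
	return git
}
```

With `every` set to something like 1000, the operator would still see progress during the Mercurial-only stretch without the log being flooded.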
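
A minimal sketch of that skip-ahead idea, assuming the provider keeps some kind of listing cursor: `firstGitCursor` is a placeholder, since the thread does not give the real position of the first git repository, and `StartCursor` is a hypothetical helper name.

```go
package main

// firstGitCursor is a HYPOTHETICAL placeholder for the listing position of
// the first git repository on Bitbucket. The real value is not given in
// this thread and would have to be looked up once and hard-coded.
const firstGitCursor = "FIRST_GIT_REPOSITORY_CURSOR"

// StartCursor decides where the provider begins listing repositories.
// A saved checkpoint always wins, so restarts resume where they left off;
// only a fresh start jumps past the Mercurial-only prefix.
func StartCursor(saved string) string {
	if saved != "" {
		return saved
	}
	return firstGitCursor
}
```

Keeping the constant in one place makes the shortcut easy to remove if Mercurial support is ever added.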