woodpecker-ci / plugin-git

Woodpecker plugin for cloning Git repositories
https://woodpecker-ci.org/docs/usage/workflow-syntax#clone
Apache License 2.0

Clone failed: could not read Username for '$url': No such device or address #66

Closed Rubonnek closed 7 months ago

Rubonnek commented 1 year ago

Describe the bug

Say you have two pipelines defined where the first clones fine but takes about an hour to finish. The default clone operation fails on the second pipeline with the message:

fatal: could not read Username for '$url': No such device or address

when running against Forgejo v1.19 even when WOODPECKER_AUTHENTICATE_PUBLIC_REPOS=true is set.

I believe this issue is related to some timeout value on Forgejo's side since in its logs I get a 404 Unauthorized access message.

System Info

The /version slug is not working for me against my server instance -- I'm getting a 404.

Version: next-3a475ce2
Image: docker.io/woodpeckerci/woodpecker-server:next
Hash: 09e1c0597a92

Compose file example:

version: "3.8"
services:
  app:
    image: docker.io/woodpeckerci/woodpecker-server:next
    ports:
      - "<REDACTED>"
    environment:
      - WOODPECKER_OPEN=true
      - WOODPECKER_ADMIN=<REDACTED>
      - WOODPECKER_HOST=https://<REDACTED>
      - WOODPECKER_AGENT_SECRET=<REDACTED>
      - WOODPECKER_GITEA=true
      - WOODPECKER_GITEA_URL=https://<REDACTED>
      - WOODPECKER_GITEA_CLIENT=<REDACTED>
      - WOODPECKER_GITEA_SECRET=<REDACTED>
      - WOODPECKER_AUTHENTICATE_PUBLIC_REPOS=true
    volumes:
      - woodpecker:/var/lib/woodpecker/
volumes:
    woodpecker:

Additional Context

As a workaround, I implemented my own clone step in an alpine container, using an access token as the password instead. For example, cloning https://$USER:$ACCESS_TOKEN@forgejo.example/$ORG/$REPO works fine for me.

This is what I'm using specifically:

skip_clone: true

pipeline:
  clone:
    image: docker.io/alpine/git:latest
    secrets:
      - source: access-token-secret
        target: ACCESS_TOKEN
    commands:
      - git init -b $$CI_COMMIT_BRANCH
      # NOTE: Replace GIT_USER on the next line
      - MODIFIED_REPO_URL=$$(printf "%s\n" "$$CI_REPO_REMOTE" | sed -e "s|https://|https://GIT_USER:$$ACCESS_TOKEN@|g")
      - git remote add origin $$MODIFIED_REPO_URL
      - git fetch --no-tags --depth=1 --filter=tree:0 origin "+$$CI_COMMIT_REF"
      - git reset --hard -q $$CI_COMMIT_SHA
      - git submodule update --init --recursive
      - git lfs fetch
      - git lfs checkout
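
The URL rewrite in the step above can also be sketched with shell parameter expansion instead of sed, so that characters such as | or & in the token do not need sed escaping. This is only a sketch: the function name with_credentials and the example host, user and token are hypothetical, and tokens containing URL-reserved characters would still need percent-encoding.

```shell
# Hypothetical helper: embed a user name and access token into an HTTPS remote URL.
with_credentials() {
  # $1 = remote URL, $2 = user name, $3 = access token
  case "$1" in
    https://*)
      # Strip the scheme, then re-emit it with "user:token@" in front of the host.
      printf 'https://%s:%s@%s\n' "$2" "$3" "${1#https://}"
      ;;
    *)
      # Leave non-HTTPS URLs (for example SSH remotes) untouched.
      printf '%s\n' "$1"
      ;;
  esac
}
```

Calling with_credentials "https://forgejo.example/org/repo.git" ci-user "$ACCESS_TOKEN" prints a URL of the same shape as the one used in the workaround.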

6543 commented 1 year ago

that has something to do with the crafted netrc config ...

6543 commented 1 year ago

... not sure what exactly goes wrong though ... based on the info you provided
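
For context: a netrc file is a small text file that git's HTTP transport (through curl) consults for a login and password when a server asks for authentication. If no matching entry exists and there is no terminal to prompt on, git fails with the "could not read Username" error reported above. A minimal sketch of writing such an entry follows; the helper name write_netrc, the host and the credentials are placeholders, and this is not a description of how plugin-git itself builds the file.

```shell
# Hypothetical helper (not part of plugin-git): write a one-entry netrc file.
write_netrc() {
  # $1 = target file, $2 = host, $3 = login, $4 = password or access token
  (
    umask 077  # keep the file readable by the current user only
    printf 'machine %s\nlogin %s\npassword %s\n' "$2" "$3" "$4" > "$1"
  )
}
```

For example, write_netrc "$HOME/.netrc" forgejo.example ci-user "$ACCESS_TOKEN" would let a later HTTPS fetch from that host authenticate without a prompt.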

Sebclem commented 1 year ago

I have the same issue and I think I can add some information:

.woodpecker.yml

clone:
  git:
    image: woodpeckerci/plugin-git:2.0.3
    settings:
      recursive: false

pipeline:
  Check docker-compose files:
    image: docker/compose
    pull: true
    commands:
      - apk add --no-cache bash
      - bash ./test-all
    when:
      - event: "push"
        branch: [main, master]
      - event: [pull_request, manual, deployment]

  Deploy dockers:
    image: appleboy/drone-ssh
    pull: true
    settings:
      host: xxx.xxxx.xxx
      username: root
      key:
        from_secret: ansible_private_key
      port: 22
      command_timeout: 2h
      script:
        - cd /opt/docker-compose
        - git pull
        - ./deploy-all
    when:
      environment: production
      event: deployment

when:
  - event: "push"
    branch: [main, master]
  - event: [pull_request, manual, deployment]

With this file I have the same issue:

+ git init -b master
Initialized empty Git repository in /woodpecker/src/git.xxxxx.xxxx/sebclem/docker-vps/.git/
+ git remote add origin https://git.xxx.xxx/sebclem/docker-vps.git
+ git fetch --no-tags --depth=1 --filter=tree:0 origin +refs/heads/master:
fatal: could not read Username for 'https://git.xxxxx.xxxx': No such device or address
exit status 128

BUT, if I remove the clone part, it works (I get another error caused by a submodule, which is why I use recursive: false):

.woodpecker.yml

pipeline:
  Check docker-compose files:
    image: docker/compose
    pull: true
    commands:
      - apk add --no-cache bash
      - bash ./test-all
    when:
      - event: "push"
        branch: [main, master]
      - event: [pull_request, manual, deployment]

  Deploy dockers:
    image: appleboy/drone-ssh
    pull: true
    settings:
      host: xxxxx.xxxxx.xxxxx
      username: root
      key:
        from_secret: ansible_private_key
      port: 22
      command_timeout: 2h
      script:
        - cd /opt/docker-compose
        - git pull
        - ./deploy-all
    when:
      environment: production
      event: deployment

when:
  - event: "push"
    branch: [main, master]
  - event: [pull_request, manual, deployment]

Output:

+ git init -b master
Initialized empty Git repository in /woodpecker/src/git.sebclem.fr/sebclem/docker-vps/.git/
+ git remote add origin https://git.xxxxx.xxxxx/sebclem/docker-vps.git
+ git fetch --no-tags --depth=1 --filter=tree:0 origin +refs/heads/master:
From https://git.xxxx.xxxx/sebclem/docker-vps
 * branch            master     -> FETCH_HEAD
 * [new branch]      master     -> origin/master
+ git reset --hard -q e0b90a4126347b6bbb0092588dc7ff56d91784d0
+ git submodule update --init --recursive
fatal: No url found for submodule path 'django_vache/django-vache' in .gitmodules
exit status 128

Edit: Forgot to add that before today, this was working great (last success: 2 days ago). I'm using the woodpecker:next docker image for runner and server.

Rubonnek commented 1 year ago

BUT, if I remove the clone

Now that you mention it, I recall trying to configure the clone plugin like that and I stumbled upon the same issue.

In fact, I was able to reproduce the issue with just this:

clone:
  git:
    image: woodpeckerci/plugin-git:2.0.3

anbraten commented 1 year ago

Could be related to the change from #1352

Wojnr commented 1 year ago

Could be related to the change from #1352

Probably it is. After rolling back to the Docker image from before this PR, almost everything works again.

Sebclem commented 1 year ago

As a workaround, you can disable this in the repository settings; it works for me: [image]

patrickuhlmann commented 1 year ago

I do have the same issue (message fatal: could not read Username for '***': No such device or address). One thing I noticed is that it seems to work fine if only one pipeline is run at a time (which means that they can start immediately). As soon as jobs are queuing (which means that they start delayed) they run into this problem. I disabled (unchecked) "Only inject netrc credentials into trusted containers" but it still doesn't work.

I am also using the next version of woodpecker with Gitea/Forgejo and my step is configured like this:

clone:
  git:
    image: woodpeckerci/plugin-git
    environment:
      - PLUGIN_LFS=false
      - PLUGIN_SKIP_VERIFY=true

Also, when I restart the pipeline later (without any change in configuration or any new commit) the build succeeds. I am pretty sure the relevant factor is that more jobs are started than agents are available, so some jobs are delayed.

patrickuhlmann commented 1 year ago

... not sure what exactly goes wrong though ... based on the info you provided

what info would be useful to provide?

pat-s commented 1 year ago

Unchecking "Only inject netrc credentials into trusted containers" worked for me.

patrickuhlmann commented 1 year ago

I found this issue today by coincidence: https://github.com/gitkraken/vscode-gitlens/issues/1027. They report having this problem when using HTTPS instead of SSH. I then saw in the log that woodpeckerci/plugin-git indeed uses the html_url: git remote add origin https://forgejo.***.ch/***/***.git.

I checked the content of the webhook in Forgejo. It contains both urls:

    "html_url": "https://forgejo.***.ch/***/***",
    "ssh_url": "git@forgejo.***.ch:***/***.git",

Maybe switching to SSH would make it more stable?

lafriks commented 1 year ago

Maybe switching to SSH would make it more stable?

You can use SSH only with a user's SSH key, and Woodpecker does not have it and should not have it either, so that's not really an option.

pat-s commented 1 year ago

Unchecking "Only inject netrc credentials into trusted containers" worked for me.

I wonder if this should be the default given how many people seem to face issues with it. And not all of them will arrive here and read through this issue?

lafriks commented 1 year ago

As this is only in the development version and 1.0 will be breaking anyway, it's better to be secure by default.

patrickuhlmann commented 1 year ago

Unchecking "Only inject netrc credentials into trusted containers" worked for me.

I wonder if this should be the default given how many people seem to face issues with it. And not all of them will arrive here and read through this issue?

I still have the issue even when I uncheck the option. Doesn't seem to be a "reliable workaround".

pat-s commented 8 months ago

@patrickuhlmann Do you still face this error with the latest plugin version and latest WP server? If so, could you post your setup in more detail and also what repo options are enabled/disabled?

patrickuhlmann commented 8 months ago

I updated all components and still face the same issue.

I run everything in docker containers on a Synology Diskstation. I have the following containers:

The error happens when I run renovate. This job runs for more than one hour and occupies one runner. It creates many pull requests, which in turn trigger builds on other repositories. These builds run on the second runner. In the beginning everything works fine. After a while (when lots of jobs are queued up) they start to fail.

The job output is always like this:

+ git config --global http.sslCAInfo /opt/MyLAN.crt
+ git init -b master
Initialized empty Git repository in /woodpecker/src/forgejo.me.ch/My/repo.git/
+ git remote add origin https://forgejo.me.ch/My/repo.git
+ git fetch --no-tags --depth=1 --filter=tree:0 origin +refs/pull/108/head:
fatal: could not read Username for 'https://forgejo.me.ch': No such device or address
exit status 128

The configuration of the job(s) is:

Project settings
* Allow Pull Requests checked
* Trusted checked

Timeout
* 5min

All pipelines are similar. For example

clone:
  git:
    image: woodpeckerci/plugin-git
    environment:
      - PLUGIN_LFS=false
      - PLUGIN_CUSTOM_SSL_PATH=/opt/MeLAN.crt
    volumes:
      - /volume1/docker/woodpecker/MeLAN.crt:/opt/MeLAN.crt

pipeline:
  verify:
    image: gradle:8.1.0-jdk17-focal
    commands:
    - gradle assemble
    - gradle check
    volumes:
    - /volume1/docker/woodpecker/gradle:/root/.gradle

In the forgejo log I see

..rvices/auth/basic.go:130:Verify() [E] UserSignIn: user's password is invalid [uid: 1, name: patrick]
...s/auth/middleware.go:23:func1() [E] Failed to verify user: user's password is invalid [uid: 1, name: patrick]
 ...eb/routing/logger.go:102:func1() [I] router: completed GET /My/repo.git/info/refs?service=git-upload-pack for 172.17.0.9:0, 401 Unauthorized in 80.4ms @ auth/middleware.go:20(auth.Auth)

In the runner I see

ERR grpc error: wait(): code: Unknown: rpc error: code = Unknown desc = Step finished with exit code 1,  | error=rpc error: code = Unknown desc = Step finished with exit code 1, 
WRN cancel signal received | repo=My/repo pipeline=104 id=3156 error=rpc error: code = Unknown desc = Step finished with exit code 1, 

In the woodpecker logs, I see

 ip=172.17.0.3 latency=12411.027311 method=POST path=/hook status=500 user-agent=Go-http-client/1.1
ERR failure to save pipeline for My/repo | error=database is locked
ERR error=Error #01: failure to save pipeline for My/repo

One thing I am wondering is this "database is locked". Is this normal? Might the problem be that I am using an sqlite3 database?

pat-s commented 8 months ago

One thing I am wondering is this "database is locked". Is this normal? Might the problem be that I am using an sqlite3 database?

Likely, I've seen this error with sqlite3 in the past when there was too much load on the DB. And as you're saying, during the renovate run more runs spin up from the PRs opened by renovate which then likely overload the sqlite3 DB.

sqlite3 is usually only suitable for dev purposes, it's better to use postgres or mysql even for semi-production home use.

thechubbypanda commented 7 months ago

Hi all, I'm running into this problem with Gitea latest and woodpecker latest as of today. Checking/unchecking "Only inject netrc credentials into trusted containers" has no effect on the outcome of the job. I've also tried a brand new repository (private and public), with and without the checkbox above enabled. All the same.

patrickuhlmann commented 7 months ago

I switched to postgres but still face the same problem. By the way, it would have surprised me if SQLite had been the cause, as it is very much underestimated. Years ago it was already able to handle thousands of selects and inserts in a very short time, even on databases several gigabytes in size (you will find plenty of benchmark information if you search for it).

pat-s commented 7 months ago

@thechubbypanda can you verify the issue only exists in WP latest and not in WP 2.0.0? If so, maybe you can track it down further to a specific commit? All main branch commits have associated images.

patrickuhlmann commented 7 months ago

I am now running WP 2.0.0 and still have this issue

thechubbypanda commented 7 months ago

Ok so:

WP Server: next-1ca549190b OR v2.0.0 OR v1.0.5
WP Agent: next-1ca549190b OR v2.0.0 OR v1.0.5
plugin-git: latest

Gives the same error unless I overwrite the clone step:

+ git fetch --no-tags origin +master:
fatal: no path specified; see 'git help pull' for valid url syntax
exit status 128

Conclusion: Something else is wrong

pat-s commented 7 months ago

@thechubbypanda Thanks. Strange though, as many people use 2.0+ by now and we haven't heard of more issues like this.

I also administer multiple instances and haven't come across the issue in months.

Are you running WP in docker or via a host install?

And since you wrote

Hi all, I'm running into this problem with Gitea latest and woodpecker latest as of today.

Did it work before with an older version/different setup?

thechubbypanda commented 7 months ago

By "today" I merely meant that I had just tested it, given that the issue is a few months old now.

I'm running dockerized at the moment.

I started spinning up a completely clean installation of both Gitea and woodpecker yesterday. Will report back if that works. Then it's a matter of narrowing down what setting or situation is causing the problem.

I will note that it's extremely tough to debug the docker images given they appear to not even have sh installed.

thechubbypanda commented 7 months ago

FOUND IT @pat-s: GITEA__service__REQUIRE_SIGNIN_VIEW: true

Set that to true and the error appears; set it to false and it works as expected.
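
One way to check whether a forge rejects anonymous fetches is to request the same smart-HTTP endpoint that shows up with a 401 in the Forgejo log earlier in this thread (/info/refs?service=git-upload-pack) without credentials and look at the status code. The sketch below assumes curl is available; the function names and the repository URL are hypothetical, and the hints are a rough interpretation rather than anything Woodpecker or Gitea reports.

```shell
# Hypothetical helper: turn the HTTP status of an anonymous request into a hint.
explain_status() {
  case "$1" in
    200)     echo "anonymous fetch allowed" ;;
    401|403) echo "forge requires authentication" ;;
    404)     echo "repository not found or hidden from anonymous users" ;;
    *)       echo "unexpected status: $1" ;;
  esac
}

# Hypothetical helper: probe the endpoint that git fetch contacts first over HTTPS.
probe_repo() {
  # $1 = repository URL, e.g. https://forgejo.example/org/repo.git (placeholder)
  status=$(curl -s -o /dev/null -w '%{http_code}' "$1/info/refs?service=git-upload-pack")
  explain_status "$status"
}
```

Running probe_repo against a public repository on an instance with GITEA__service__REQUIRE_SIGNIN_VIEW enabled should then report that authentication is required, which matches the symptom described here.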

qwerty287 commented 7 months ago

Can you try to set https://woodpecker-ci.org/docs/administration/server-config#woodpecker_authenticate_public_repos to true?

pat-s commented 7 months ago

I will note that it's extremely tough to debug the docker images given they appear to not even have sh installed.

Yes, this is (partly) known. My guess, though, would have been that an issue in the image is very unlikely, as otherwise we would have gotten many more reports here.

It is likely that WOODPECKER_AUTHENTICATE_PUBLIC_REPOS helps; it might do partly the same thing that GITEA__service__REQUIRE_SIGNIN_VIEW does.

@qwerty287 Maybe we should add a warning if Gitea is used as a forge and WOODPECKER_AUTHENTICATE_PUBLIC_REPOS is not true?

thechubbypanda commented 7 months ago

Is there a scenario where just having that enabled by default is a bad idea? Or at least inverting it?

qwerty287 commented 7 months ago

Maybe we should add a warning if Gitea is used as a forge and WOODPECKER_AUTHENTICATE_PUBLIC_REPOS is not true?

No, because that's only necessary if you require logins for everything. If GITEA__service__REQUIRE_SIGNIN_VIEW is false everything's working. Also, this can be the same for other forges too and is not gitea-specific

thechubbypanda commented 7 months ago

It is likely that WOODPECKER_AUTHENTICATE_PUBLIC_REPOS helps

Well regardless, that has fixed the issue for me at least. Thanks

pat-s commented 7 months ago

The error message is quite non-descriptive and it's not easy for users to find out how to solve the issue. Even when searching the main repo this is tricky, as the issue is being discussed/reported here, and not all users will even arrive here.

I wonder how this can be better communicated - in the end, such details/issues lead to a bad user experience.

pat-s commented 7 months ago

Closing here now as the issue seems to be resolved and other users arriving here should find the solution.

Yet I think we should somehow assert GITEA__service__REQUIRE_SIGNIN_VIEW if that's the real underlying cause of this error?

6543 commented 7 months ago

So it was #25 all along :/

patrickuhlmann commented 6 months ago

For me, I think the problem was weak hardware. As soon as I switched from my DiskStation to a dedicated computer, where Forgejo and Woodpecker run much faster, the problem was gone.