konveyor / tackle2-addon-analyzer

Apache License 2.0

No way to recover after git clone failure #110

Open rszwajko opened 1 month ago

rszwajko commented 1 month ago

All actions that require fetching a big repository fail for me from time to time when triggered on a slower or less stable internet connection. Moving closer to the Wi-Fi router solves the problem in my case, but for large-scale deployments network bandwidth will always be a limiting factor. We should make the process more robust or better optimized, and/or provide a high-level recovery path (e.g. by restarting an action).

The problem

The failure usually leaves logs similar to the following:

- '[CMD] Running: /usr/bin/git clone https://github.com/sonatype/nexus-public /addon/source/nexus-public'
    - '> Cloning into ''/addon/source/nexus-public''...'
    - '> error: 6783 bytes of body are still expected'
    - '> fetch-pack: unexpected disconnect while reading sideband packet'
    - '> fatal: early EOF'
    - '> fatal: fetch-pack: invalid index-pack output'
    - '> '
    - '[CMD] /usr/bin/git failed: exit status 128'

For comparison, full logs from the language discovery task on slow and fast internet (both for the Nexus app) are attached: normal_internet.yaml.txt, slow_internet.yml.txt

Improving the fetch process

A quick look at similar problems on Stack Overflow suggests a few approaches, e.g. cloning a single branch with no history:

git clone --single-branch --depth=1 git@my_server_url.com:my_repo_name

On the Nexus app this makes a big difference (13.99 MiB received instead of 154.08 MiB):

$ git clone https://github.com/sonatype/nexus-public
Cloning into 'nexus-public'...
remote: Enumerating objects: 263481, done.
remote: Counting objects: 100% (16689/16689), done.
remote: Compressing objects: 100% (6756/6756), done.
remote: Total 263481 (delta 7155), reused 16290 (delta 6850), pack-reused 246792
Receiving objects: 100% (263481/263481), 154.08 MiB | 2.62 MiB/s, done.
Resolving deltas: 100% (110895/110895), done.

# vs

$ git clone --single-branch --depth=1 https://github.com/sonatype/nexus-public
Cloning into 'nexus-public'...
remote: Enumerating objects: 10556, done.
remote: Counting objects: 100% (10556/10556), done.
remote: Compressing objects: 100% (7418/7418), done.
remote: Total 10556 (delta 4117), reused 6246 (delta 1782), pack-reused 0
Receiving objects: 100% (10556/10556), 13.99 MiB | 2.83 MiB/s, done.
Resolving deltas: 100% (4117/4117), done.
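
To make the idea concrete, here is a minimal sketch of combining the shallow clone with a simple retry loop. This is not how the addon currently clones; the `retry` and `clone_shallow` helper names, the attempt count and the delay are all hypothetical and only illustrate the approach. An interrupted `git clone` cannot be resumed, so each attempt starts from scratch and first removes anything left at the destination.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the addon.

# retry MAX DELAY CMD [ARGS...]
# Run CMD until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts. Returns 0 on success, 1 once all attempts are used up.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempt(s): $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# clone_shallow URL DEST
# Remove whatever a previous failed attempt may have left at DEST, then
# fetch only the tip commit of the default branch.
clone_shallow() {
    [ -n "$2" ] || return 2
    rm -rf -- "$2"
    git clone --single-branch --depth=1 "$1" "$2"
}

# Example call (assumed values: 3 attempts, 5 seconds apart):
# retry 3 5 clone_shallow https://github.com/sonatype/nexus-public /addon/source/nexus-public
```

A fixed delay keeps the sketch short; a real implementation would probably want a growing delay and a way to tell network errors apart from permanent ones such as a wrong URL or missing credentials, which are not worth retrying.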

High level recovery

Right now the user cannot directly restart an action. The existing workarounds are inconvenient:

  1. for language and tech recovery - re-creating or editing the app
  2. for analysis - going again through the wizard (no way to use existing config)

The ideal solution would be to allow restarting an action (or creating a new one with the same config) from the Task Manager.

konveyor-ci-bot[bot] commented 1 month ago

This issue is currently awaiting triage. If contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. The triage/accepted label can be added by org members.