pharo-project / pharo

Pharo is a dynamic reflective pure object-oriented language supporting live programming inspired by Smalltalk.
http://pharo.org

Pharo CI stopped working - libgit errors #11481

Closed JanBliznicenko closed 2 years ago

JanBliznicenko commented 2 years ago

I am not exactly sure what the source of the problem is, but all Pharo CI/CD builds on Linux and Mac stopped working in the last 24 hours. Windows builds seem unaffected. I have seen different errors: mostly IceGenericError: error reading from the zlib stream, but a few times IceGenericError: bad packet length.

MetacelloNotification: Loaded -> BaselineOfUMLProfiles-CompatibleUserName.1658909608 --- https://github.com/openponk/uml-profiles.git[master] --- https://github.com/openponk/uml-profiles.git[master]
I got an error while cloning: There was an authentication error while trying to execute the operation: . 
This happens usually because you didn't provide a valid set of credentials. 
You may fix this problem in different ways: 

1. adding your keys to ssh-agent, executing ssh-add ~/.ssh/id_rsa in your command line.
2. adding your keys in settings (open settings browser search for "Use custom SSH keys" and
add your public and private keys).
IceGenericError: error reading from the zlib stream
IceLibgitErrorVisitor>>visitGenericError:
IceLibgitErrorVisitor>>visitERROR:
LGit_GIT_ERROR>>acceptError:
[ :error |
        location exists ifTrue: [ location ensureDeleteAll ].
        error acceptError: (IceLibgitErrorVisitor onContext: self) ] in IceGitClone>>execute in Block: [ :error |...
FullBlockClosure(BlockClosure)>>cull:
Context>>evaluateSignal:
Context>>handleSignal:
LGit_GIT_ERROR(Exception)>>signal
LGit_GIT_ERROR class(LGitCallReturnHandler class)>>signalWith:
LGitReturnCodeEnum>>handleLGitReturnCode
LGitRepository(LGitExternalObject)>>withReturnHandlerDo:
LGitRepository>>clone:options:to:
LGitRepository>>clone:options:
[location ensureCreateDirectory.

    repo := LGitRepository on: location.
    cloneOptions := repo cloneOptionsStructureClass withCredentialsProvider: (IceCredentialsProvider defaultForRemoteUrl: url).

    "Keeping references, because if not the GC take them."
    checkoutOptions := cloneOptions checkoutOptions.
    callbacks := cloneOptions fetchOptions callbacks.
    callbacks transferProgress: IceGitTransferProgress new.

    checkoutOptions checkoutStrategy: LGitCheckoutStrategyEnum git_checkout_force.
    checkoutOptions progressCallback: IceGitCheckoutProgress new.

    repo clone: url options: cloneOptions.

    (LGitRemote of: repo named: 'origin')
        lookup;
        setUrl: url.

    ] in IceGitClone>>execute in Block: [location ensureCreateDirectory....
FullBlockClosure(BlockClosure)>>on:do:
IceGitClone>>execute
IceRepositoryCreator>>cloneRepository
[
        self validate.
        self isCloning
            ifTrue: [ self cloneRepository ]
            ifFalse: [ self addLocalRepository ] ] in IceRepositoryCreator>>createRepository in Block: [...
FullBlockClosure(BlockClosure)>>on:do:
IceRepositoryCreator>>createRepository
[ repository := builder createRepository ] in MCGitHubRepository(MCGitBasedNetworkRepository)>>createIcebergRepositoryFor: in Block: [ repository := builder createRepository ]
FullBlockClosure(BlockClosure)>>on:do:
MCGitHubRepository(MCGitBasedNetworkRepository)>>createIcebergRepositoryFor:
[ ^ self createIcebergRepositoryFor: urlToUse ] in MCGitHubRepository(MCGitBasedNetworkRepository)>>createIcebergRepositoryWithFallbackFor:url: in Block: [ ^ self createIcebergRepositoryFor: urlToUse ]
FullBlockClosure(BlockClosure)>>on:do:
MCGitHubRepository(MCGitBasedNetworkRepository)>>createIcebergRepositoryWithFallbackFor:url:
[ | remote |
            remote := IceGitRemote url: remoteUrl.
            self createIcebergRepositoryWithFallbackFor: remote url: remoteUrl ] in MCGitHubRepository(MCGitBasedNetworkRepository)>>getOrCreateIcebergRepository in Block: [ | remote |...
OrderedCollection(Collection)>>detect:ifFound:ifNone:
OrderedCollection(Collection)>>detect:ifNone:
MCGitHubRepository(MCGitBasedNetworkRepository)>>getOrCreateIcebergRepository
3. using HTTPS instead SSH (Just use an url in the form https://etc.git)./ I will try to clone the HTTPS variant.

Error with status code 1:
653 travis_wait /home/runner/.smalltalkCI/helpers.sh

Source of the log: https://github.com/OpenPonk/class-editor/runs/7543017828?check_suite_focus=true

Probably related to https://github.com/pharo-vcs/iceberg/issues/1600 and https://github.com/hpi-swa/smalltalkCI/issues/562

badetitou commented 2 years ago

It looks like it works now for the Moose build (we have a failing test, but that is our fault; the baseline is still executed correctly).

JanBliznicenko commented 2 years ago

It seems the original problem is fixed. Linux builds of all my repos started working. I have different errors on Mac (Pharo 10) builds now:

PrimitiveFailed: primitive #primLoadSymbol:module: in TFFIBackend failed
TFFIBackend(ProtoObject)>>primitiveFailed:
TFFIBackend(ProtoObject)>>primitiveFailed
TFFIBackend>>primLoadSymbol:module:
TFFIBackend>>loadSymbol:module:
ExternalAddress class>>loadSymbol:module:
TFExternalFunction>>validate
TFSameThreadCall>>executeOn:withArguments:
TFSameThreadRunner>>invokeFunction:withArguments:
LGitLibrary>>libgit2_version:minor:rev:
TFCalloutAPI(FFICalloutAPI)>>function:library:
TFCalloutAPI(FFICalloutAPI)>>function:module:
LGitLibrary(FFILibrary)>>ffiCall:
LGitLibrary>>libgit2_version:minor:rev:
LGitLibrary>>version
LGitLibrary>>isVersionLessThan:
LGitRepository>>cloneOptionsStructureClass
[location ensureCreateDirectory.

    repo := LGitRepository on: location.
    cloneOptions := repo cloneOptionsStructureClass withCredentialsProvider: (IceCredentialsProvider defaultForRemoteUrl: url).

    "Keeping references, because if not the GC take them."
    checkoutOptions := cloneOptions checkoutOptions.
    callbacks := cloneOptions fetchOptions callbacks.
    callbacks transferProgress: IceGitTransferProgress new.

    checkoutOptions checkoutStrategy: LGitCheckoutStrategyEnum git_checkout_force.
    checkoutOptions progressCallback: IceGitCheckoutProgress new.

    repo clone: url options: cloneOptions.

    (LGitRemote of: repo named: 'origin')
        lookup;
        setUrl: url.

    ] in IceGitClone>>execute in Block: [location ensureCreateDirectory....
FullBlockClosure(BlockClosure)>>on:do:
IceGitClone>>execute
...

guillep commented 2 years ago

Yes, the OSX problem is separate; see the issue here: https://github.com/pharo-project/pharo/issues/11561 PRs have been opened, and I'm waiting for the CI run to check that they are OK.

gcotelli commented 2 years ago

@guillep can we get a new Pharo 10 release once all the related issues are merged? I want to update our Docker images for Pharo but would prefer to base them on a tagged version (like v10.1.0) rather than a commit hash.

tesonep commented 2 years ago

@gcotelli The version v10.0.1 is ready

Rinzwind commented 2 years ago

@guillep @tesonep I just saw there was another ‘IceGenericError: bad packet length’ on our Jenkins server last night, so I was wondering what the status of this issue is with respect to Pharo 9? Is an update still coming?

Info from PharoDebug.log:

THERE_BE_DRAGONS_HERE
IceGenericError: bad packet length
31 August 2022 12:35:17.203046 am

VM: unix - x86_64 - linux-gnu - CoInterpreter * VMMaker-tonel.1 uuid: 365973b2-49a3-0d00-90e4-5907092bce84 Aug 23 2022
StackToRegisterMappingCogit * VMMaker-tonel.1 uuid: 365973b2-49a3-0d00-90e4-5907092bce84 Aug 23 2022
v9.0.17 - Commit: 9e4879f - Date: 2022-08-22 14:31:22 +0200

Image: Pharo9.0.0 [Build information: Pharo-9.0.0+build.1575.sha.9bb5f998e8a6d016ec7abde3ed09c4a60c0b4551 (64 Bit)]

JanBliznicenko commented 2 years ago

Actually, I still see random errors as well, even for Pharo 10 on Mac. At least it now happens only occasionally, not every time like before. For example, yesterday I got IceGenericError: SecureTransport error: connection closed via error, in https://github.com/OpenPonk/plugins/runs/8118540172?check_suite_focus=true

guillep commented 2 years ago

Hi @Rinzwind, the fix has not been backported to Pharo9 yet. There is PR https://github.com/pharo-project/pharo/pull/11596 on hold, but it requires some more work. There is probably a mismatch between Pharo9 and the version of NewTools that it is trying to install.

@JanBliznicenko Can you tell us if the problem persists? Having connection errors is something that happens...

JanBliznicenko commented 2 years ago

@guillep Unfortunately it seems so. It is random, so MUCH less often than before, but it used to be completely ok before all these GitHub-related problems started. This is another one, from today: https://github.com/OpenPonk/fsm-editor/runs/8152818612?check_suite_focus=true

tesonep commented 2 years ago

Hi @JanBliznicenko, to minimize the noise, a good alternative may be to add the following as a preInstall script:

Iceberg remoteTypeSelector: #httpsUrl

This way it will only try HTTPS and will not try SSH at all. Currently, it tries SSH first and, if that fails, retries with HTTPS. I am not sure if that will work better, but it will reduce the noise in the error output.
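For readers who set remotes by hand, the difference between the two URL forms can be sketched in shell. This is only an illustration of the SSH-to-HTTPS rewrite, not Iceberg code; the helper name `to_https_url` is hypothetical, and it assumes the scp-style `git@host:owner/repo.git` form.

```shell
#!/bin/sh
# Hypothetical helper: rewrite an scp-style SSH remote (git@host:owner/repo.git)
# as the equivalent HTTPS URL. Any other input is printed unchanged.
to_https_url() {
  printf '%s\n' "$1" | sed -E 's#^git@([^:]+):(.+)$#https://\1/\2#'
}

ssh_url='git@github.com:OpenPonk/class-editor.git'
https_url=$(to_https_url "$ssh_url")

# An input that is already HTTPS passes through untouched.
unchanged_url=$(to_https_url 'https://github.com/OpenPonk/class-editor.git')

echo "$https_url"
```

With an HTTPS URL no SSH key is involved, so the "authentication error" hints in the log above would not be triggered by a missing key.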

Rinzwind commented 2 years ago

This might be useful to someone else: to avoid the problem on our Jenkins server, which uses Debian, I have now extended our build scripts to install the package ‘libgit2-1.1’ and to apply a patch to smalltalkCI like the one given below. The output then shows LGitLibrary uniqueInstance version = #(1 1 0).

diff --git a/pharo/run.sh b/pharo/run.sh
index c35c456..6dde247 100644
--- a/pharo/run.sh
+++ b/pharo/run.sh
@@ -334,6 +334,25 @@ pharo::run_script() {
 # Load project into Pharo image.
 ################################################################################
 pharo::load_project() {
+  pharo::run_script "
+    LGitLibrary compile: 'unix64LibraryName
+
+      \"Patched to try libgit2.so.1.1 first, see: https://github.com/pharo-project/pharo/issues/11481\"
+
+      ^ FFIUnix64LibraryFinder findAnyLibrary: #(
+        ''libgit2.so.1.1''
+        \"This name is wrong, but some versions of the VM has this library shipped with the bad name\"
+        ''libgit2.1.0.0.so''
+        ''libgit2.so.1.0.0''
+        ''libgit2.so.1.0''
+        ''libgit2.so.1.2''
+        ''libgit2.so.0.25.1'')'.
+    Smalltalk snapshot: true andQuit: true
+  "
+  pharo::run_script "
+    Transcript show: 'LGitLibrary uniqueInstance version = ' , LGitLibrary uniqueInstance version asString; cr.
+    Smalltalk snapshot: true andQuit: true
+  "
   pharo::run_script "
     | smalltalkCI |
     $(conditional_debug_halt)

JanBliznicenko commented 2 years ago

> Hi @JanBliznicenko, to minimize the noise maybe a good alternative is to put as preInstall script:
>
> Iceberg remoteTypeSelector: #httpsUrl
>
> In this way it will just try to use HTTPS and don't try to use SSH. Now, it tries with SSH and if it fails retries with HTTPS. I am not sure if that will work better, but it will reduce the noise in the error.

Yes, that looks much better now, I have been postponing doing something with those warnings for years and it is actually simpler than I thought :)

Unfortunately, it does not solve the problem I have. It seems really Mac-only now. It happens for me about 50 % of the time. https://github.com/OpenPonk/class-editor/runs/8175042543?check_suite_focus=true https://github.com/OpenPonk/fsm-editor/runs/8175046734?check_suite_focus=true https://github.com/OpenPonk/OpenPonk-BPMN/runs/8175061785?check_suite_focus=true

guillep commented 2 years ago

@Pablo is working on issue #1612, to retry cloning automatically if there is a connection problem. That should be (hopefully) the last issue required here, at least for some time :).

Rinzwind commented 2 years ago

@guillep I’m not sure you linked to the right issue (1612 in this repository is a pull request, ‘add window tiling shortcuts’). Edit: I hadn’t noticed the right issue is actually mentioned right above your message (so: https://github.com/pharo-vcs/iceberg/issues/1612).

One question I still had here: in LGitLibrary>>#unix64LibraryName, shouldn’t the library versions be ordered from highest to lowest? Otherwise, if say only libgit2.so.1.0 and libgit2.so.1.1 can be found, v1.0 is used while it would be better to use v1.1? In LGitLibrary>>#macLibraryName, fewer versions are given, but they are ordered from highest to lowest.
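The ordering concern can be illustrated with a small shell sketch. It assumes, as the question implies, that the lookup returns the first candidate name that exists; `find_first_library` is a hypothetical stand-in and not Pharo's actual library finder, and the scratch directory merely plays the role of a system library path.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate file name that exists in a
# directory, mimicking a "first match wins" lookup.
find_first_library() {
  dir=$1
  shift
  for name in "$@"; do
    if [ -e "$dir/$name" ]; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 1
}

# Scratch directory standing in for a library path with two versions installed.
demo_dir=$(mktemp -d)
touch "$demo_dir/libgit2.so.1.0" "$demo_dir/libgit2.so.1.1"

# Candidates listed lowest first: the older library is picked.
lowest_first=$(find_first_library "$demo_dir" libgit2.so.1.0 libgit2.so.1.1)
# Candidates listed highest first: the newer library is picked.
highest_first=$(find_first_library "$demo_dir" libgit2.so.1.1 libgit2.so.1.0)

echo "lowest first:  $lowest_first"
echo "highest first: $highest_first"
rm -rf "$demo_dir"
```

Under that first-match assumption, the candidate order alone decides which installed version gets loaded.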

guillep commented 2 years ago

> @guillep I’m not sure you linked to the right issue (1612 in this repository is a pull request, ‘add window tiling shortcuts’). Edit: I hadn’t noticed the right issue is actually mentioned right above your message (so: pharo-vcs/iceberg#1612).

Oups, yes, different repositories :) I'll fix the link

> One question I had here still: in LGitLibrary>>#unix64LibraryName, shouldn’t the library versions be ordered from highest to lowest? Otherwise, if say only libgit2.so.1.0 and libgit2.so.1.1 can be found, v1.0 is used while it would be better to use v1.1? In LGitLibrary>>#macLibraryName, less versions are given, but they are ordered from highest to lowest.

Yes, I think so!

tesonep commented 2 years ago

I have integrated a fix for the OSX connection problem. It is in P11; later we are going to release a new version of Iceberg and integrate it into P10. This version should mitigate the OSX problem, as it will retry if there is a network issue.
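Until such a fix reaches a given image, the same retry idea can be approximated at the CI script level. The following is a minimal sketch, not the Iceberg implementation: `retry` and `flaky_clone` are hypothetical names, and `flaky_clone` only simulates a clone that fails twice before succeeding.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to max_attempts times, pausing
# delay_seconds between failed attempts.
retry() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay_seconds}s" >&2
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
}

# Stand-in for a flaky clone: fails twice, then succeeds.
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky_clone() {
  count=$(($(cat "$counter_file") + 1))
  echo "$count" > "$counter_file"
  [ "$count" -ge 3 ]
}

retry 5 0 flaky_clone && echo "succeeded after $(cat "$counter_file") attempts"
attempts_used=$(cat "$counter_file")
rm -f "$counter_file"
```

In a real build script the wrapped command would be the step that loads the project, with a non-zero delay between attempts.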

guillep commented 2 years ago

Hi all, is this problem finally fixed?

badetitou commented 2 years ago

I still have a problem with Pharo 9, but everything is OK for Pharo 10.

badetitou commented 2 years ago

Hmmm... I'll check again and send you the trace if there is one.

JanBliznicenko commented 2 years ago

@guillep All my builds use Pharo 10 and are on Win, Linux and Mac and all are completely fine lately, thank you.

Rinzwind commented 2 years ago

I have not seen either of the two errors anymore (and the workaround for our build scripts has been removed from the scripts).

guillep commented 2 years ago

Thank you all!

@badetitou yes, a more concrete case would help, because

I'll close it. We can open a new issue if needed.

JanBliznicenko commented 1 year ago

So, it seems sometimes I still get the error for my biggest projects. Like this: https://github.com/OpenPonk/class-editor/actions/runs/3321581366/jobs/5489424282 It is quite rare though, like once in 30 runs and it seems to happen only on Mac now.
