ml4ai / skema

SKEMA: Scientific Knowledge Extraction and Model Analysis
https://ml4ai.github.io/skema/

[artifactory] timeout retrieving glove-840b-300d-10f-kryo-1.0.0.jar #230

Closed by myedibleenso 1 year ago

myedibleenso commented 1 year ago

Occasionally, builds of the text reading project fail because we encounter a timeout when retrieving a dependency from the CLU Lab's artifactory:

> [22/28] RUN sbt assembly:
FetchError$DownloadingArtifacts: Error fetching artifacts:
#27 272.1 [error] https://artifactory.clulab.org/artifactory/sbt-release/org/clulab/glove-840b-300d-10f-kryo/1.0.0/glove-840b-300d-10f-kryo-1.0.0.jar: download error: Caught java.net.ConnectException (Connection timed out (Connection timed out)) while downloading https://artifactory.clulab.org/artifactory/sbt-release/org/clulab/glove-840b-300d-10f-kryo/1.0.0/glove-840b-300d-10f-kryo-1.0.0.jar
#27 272.1 [error] Total time: 249 s (04:09), completed Jun 8, 2023 4:32:41 AM
------
Dockerfile:56
--------------------
  54 |     WORKDIR /skema/text_reading
  55 |     # Compile the fat jar
  56 | >>> RUN sbt assembly
  57 |     
  58 |     # =============================================================================
--------------------
ERROR: failed to solve: process "/bin/sh -c sbt assembly" did not complete successfully: exit code: 1
Error: buildx failed with: ERROR: failed to solve: process "/bin/sh -c sbt assembly" did not complete successfully: exit code: 1

This is a very big JAR. Perhaps this particular problem will go away when we move to the transformers-based version of processors (@MihaiSurdeanu).

enoriega commented 1 year ago

The issues with artifactory are known. I believe they are caused by problems with river's hardware, which are not easy to fix.

myedibleenso commented 1 year ago

We should explore alternatives (e.g., migrating artifactory to a different machine with newer hardware).
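
As a stopgap while the hosting question is open, the build step could be retried when a download times out. A minimal sketch, assuming a POSIX shell is available in the build image; the `retry` helper, its arguments, and any attempt/delay values are hypothetical and not part of the current Dockerfile:

```sh
# retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
# Hypothetical helper: runs the command until it succeeds or
# MAX_ATTEMPTS is reached, sleeping DELAY_SECONDS between failures.
# Returns 0 on success, otherwise the exit status of the last attempt.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    status=0
    "$@" || status=$?          # capture the status without tripping `set -e`
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts (exit $status)" >&2
      return "$status"
    fi
    echo "retry: attempt $attempt failed (exit $status); sleeping ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

In a Dockerfile, the helper would have to be defined in the same `RUN` step (or sourced from a script copied into the image) before calling something like `retry 3 30 sbt assembly`. This only papers over transient timeouts; it does not help if artifactory stays unreachable.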

kwalcock commented 1 year ago

I'm not sure why glove is the usual problem maker. These files are even larger:

    "org.clulab"                 %% "epidemiology-embeddings-model-ser"   % "1.0.0",
    "org.clulab"                  % "spaceweather-model-unigram-ser"      % "1.0.0",

This additional copy of glove seems unnecessary. It's at least taking up bandwidth and disk space. I should check it out.

    "org.clulab"                  % "glove-840b-300d"                     % "0.1.0" % Test,

kwalcock commented 1 year ago

These other storage options seem defunct:

https://index.scala-lang.org/ohnosequences/sbt-s3-resolver
https://github.com/tpunder/fm-sbt-s3-resolver

https://github.com/lightbend/sbt-google-cloud-storage

myedibleenso commented 1 year ago

We haven't run into this issue recently. Closing for now...