pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.32k stars 636 forks source link

Key error from lookup in symlink_map in ivy_resolve.py #1747

Closed ericzundel closed 9 years ago

ericzundel commented 9 years ago

Periodically we see a KeyError raised from pants (although the exception type isn't printed). We see this in our build of the world script that crawls the repo and runs pants commands on every directory.

We've see this happen with different artifacts. The solution is to remove the directory from under ~/.ivy2/pants/

The first problem is that there is very little information reported.

--------- idea foo:: -----------
Running Pants version square-20150616-01
INFO] Detected git repository at /Users/square/Jenkins/workspace/pants-check-idea on branch None

14:42:44 00:00 [main]
               (To run a reporting server: ./pants server)
14:42:44 00:00   [bootstrap]
14:42:48 00:04   [setup]
14:42:48 00:04     [parse]
               Executing tasks in goals: bootstrap -> imports -> unpack-jars -> deferred-sources -> gen -> resolve -> idea
14:42:50 00:06   [bootstrap]
14:42:50 00:06     [bootstrap-jvm-tools]
14:42:50 00:06   [imports]
14:42:50 00:06     [ivy-imports]
14:42:54 00:10   [unpack-jars]
14:42:54 00:10     [unpack-jars]
14:42:54 00:10   [deferred-sources]
14:42:54 00:10     [deferred-sources]
14:42:54 00:10   [gen]
14:42:54 00:10     [thrift]
14:42:54 00:10     [protoc]
14:42:54 00:10     [antlr]
14:42:54 00:10     [ragel]
14:42:54 00:10     [jaxb]
14:42:54 00:10     [wire]
14:42:54 00:10   [resolve]
14:42:54 00:10     [ivy]
14:42:55 00:11   [complete]
               FAILURE
Exception message: '/Users/square/.ivy2/pants/joda-time/joda-time/jars/joda-time-2.8.jar'

The exception stacktrace is:

Exception caught:
  File "/Users/square/.cache/pants/bin/pants/pants-square-20150616-01.pex/.bootstrap/_pex/pex.py", line 271, in execute
    self.execute_entry(entry_point, args)
  File "/Users/square/.cache/pants/bin/pants/pants-square-20150616-01.pex/.bootstrap/_pex/pex.py", line 320, in execute_entry
    runner(entry_point)
  File "/Users/square/.cache/pants/bin/pants/pants-square-20150616-01.pex/.bootstrap/_pex/pex.py", line 343, in execute_pkg_resources
    runner()
  File "/Users/square/.cache/pants/bin/pants/pants-square-20150616-01.pex/pants/bin/pants_exe.py", line 81, in main
  File "/Users/square/.cache/pants/bin/pants/pants-square-20150616-01.pex/pants/bin/pants_exe.py", line 74, in _run
  File "/Users/square/.cache/pants/bin/pants/pants-square-20150616-01.pex/pants/bin/goal_runner.py", line 218, in run
  File "/Users/square/.cache/pants/bin/pants/pants-square-20150616-01.pex/pants/bin/goal_runner.py", line 275, in _do_run
  File "/Users/square/.cache/pants/bin/pants/pants-square-20150616-01.pex/pants/engine/engine.py", line 26, in execute
  File "/Users/square/.cache/pants/bin/pants/pants-square-20150616-01.pex/pants/engine/round_engine.py", line 212, in attempt
  File "/Users/square/.cache/pants/bin/pants/pants-square-20150616-01.pex/pants/engine/round_engine.py", line 45, in attempt
  File "/Users/square/.cache/pants/bin/pants/pants-square-20150616-01.pex/pants/backend/jvm/tasks/ivy_resolve.py", line 153, in execute

Here is the code from that rev:

147:        artifact_paths = []
148:        for artifact in ivy_info.get_artifacts_for_jar_library(target, memo=ivy_jar_memo):
149:          if artifact.path in symlink_map:
150:            key = artifact.path
151:          else:
152:            key = os.path.realpath(artifact.path)
153:          artifact_paths.append(symlink_map[key])

In this case, the version of joda-time was updated from 2.8 to 2.8.1 the day before. Our repo only references revision 2.8.1

./3rdparty/BUILD.gen:    sjar(org='joda-time', name='joda-time', rev='2.8.1',)

The ivy cache contains:

ls -l /Users/square/.ivy2/pants/joda-time//joda-time/jars/joda-time-2.*
-rw-r--r--  1 square  staff  589289 Jan 11 16:51 /Users/square/.ivy2/pants/joda-time//joda-time/jars/joda-time-2.7.jar
-rw-r--r--  1 square  staff  621931 Jun 15 07:17 /Users/square/.ivy2/pants/joda-time//joda-time/jars/joda-time-2.8.1.jar
-rw-r--r--  1 square  staff  621926 May 29 08:56 /Users/square/.ivy2/pants/joda-time//joda-time/jars/joda-time-2.8.jar

The symlink cache contains:

ls -l .pants.d/ivy/jars/joda-time/joda-time/jars/
total 8
lrwxr-xr-x  1 square  staff  70 Jun 26 13:11 joda-time-2.8.1.jar -> /Users/square/.ivy2/pants/joda-time/joda-time/jars/joda-time-2.8.1.jar

I turned on -ldebug and didn't see much (I've saved a copy in ~/pants/idea-ivy-resolve-key-error.log on my laptop)

Of note, the cache here has a lot of old internal-.xml files and a lot of 'resolved-.xml' files.

~/.ivy2/pants find . -name 'resolved*xml' | xargs grep -i joda-time | wc -l
    1391

The reports are showing version 2.8.1 properly resolved. That would explain why it isn't in the symlink map. I'll follow up after modifying pants to get more debugging information.

ericzundel commented 9 years ago

I added an explicit exception in this case

FAILURE: Jar IvyArtifact(path='/Users/square/.ivy2/pants/joda-time/joda-time/jars/joda-time-2.8.jar', classifier=None) in 3rdparty:joda-time.joda-time not resolved to the ivy symlink map in conf default.

and debugging to print out the name of the the ivy report:

DEBUG] Parsing ivy report /Users/square/.ivy2/pants/internal-da71ce64e94a8ba548d1ba3b68c0103b301e0206-default.xml
DEBUG] Parsing ivy report /Users/square/.ivy2/pants/internal-da71ce64e94a8ba548d1ba3b68c0103b301e0206-sources.xml
DEBUG] Parsing ivy report /Users/square/.ivy2/pants/internal-da71ce64e94a8ba548d1ba3b68c0103b301e0206-javadoc.xml

This report has joda-time at 2.8:

<caller organisation="internal" name="da71ce64e94a8ba548d1ba3b68c0103b301e0206" conf="default" rev="2.8" rev-constraint-default="2.8" rev-constraint-dynamic="2.8" callerrev="latest.integration"/>
                                <artifacts>
                                        <artifact name="joda-time" type="jar" ext="jar" status="no" details="" size="621926" time="0" location="/Users/square/.ivy2/pants/joda-time/joda-time/jars/joda-time-2.8.jar">
                                                <origin-location is-local="false" location="https://nexus.corp.squareup.com/content/groups/public/joda-time/joda-time/2.8/joda-time-2.8.jar"/>
                                        </artifact>
                                </artifacts>

But our 3rdparty/BUILD file has the right version:

jar_library(name='joda-time.joda-time',
  jars = [
    sjar(org='joda-time', name='joda-time', rev='2.8.1',)
  ],
)

Check the date on this report:

$ ls -l /Users/square/.ivy2/pants/internal-da71ce64e94a8ba548d1ba3b68c0103b301e0206-default.xml
-rw-r--r--  1 square  staff  219161 Jun 23 15:43 /Users/square/.ivy2/pants/internal-da71ce64e94a8ba548d1ba3b68c0103b301e0206-default.xml

compared to the date on when the change was made to the source of the dependency version (we generate our BUILD files from pom.xml files):

3b3b002ee9 parents/base/pom.xml (xxxx xxxxx                     2013-04-22 16:10:49 -0700  457)       <dependency>
bc0fff6715 base-pom.xml         (xxxxx xxxxxxx                 2012-07-11 22:19:35 -0700  458)         <groupId>joda-time</groupId>
bc0fff6715 base-pom.xml         (xxxxx xxxxxxx                 2012-07-11 22:19:35 -0700  459)         <artifactId>joda-time</artifactId>
c582b660f8 parents/base/pom.xml (xxxxx xxxxxxxxxx              2015-06-24 14:33:43 -0700  460)         <version>2.8.1</version>
bc0fff6715 base-pom.xml         (xxxxxx xxxxxxx                 2012-07-11 22:19:35 -0700  461)       </dependency>
ericzundel commented 9 years ago

We wrap JarDependency with our own plugin 'sjar()' that provides global excludes.

https://github.com/ericzundel/mvn2pants/blob/master/src/python/squarepants/plugins/sjar/exclude_globally.py

It is a subclass of JarDependency, so I thought it would be OK, but I wonder if that is messing up the ivy caching?

I changed the sjar() to just jar() and it made no difference - pants is still trying to use the old ivy report.

mateor commented 9 years ago

I don't think your plugin is the problem. I have seen this error many times when moving android off codegen and then adding JarDependency to the new unpack_libraries code and have even done some (mostly fruitless) digging myself.

ericzundel commented 9 years ago

I think I know what the problem is based on the evidence above and reading the invalidation code in IvyTaskMixin. These ivy files are named with the sha of all the target names. It is depending on the invalidation logic to determine when to re-resolve and overwrite the file with updates. But invoking another target could cause ivy to fetch a new version of a jar_library. If this is done after a clean-all, the old symlink will be gone . Then going back to the original target there is an old report file that matches the name and all of the targets are valid, but the report was generated before one of the revs was modified on one of the jars.

That is my theory, but I haven't been able to reproduce it yet. I think the worst part of this bug is that it may NOT fail, but leave some targets silently linking against the wrong version of a library.

On Sat, Jun 27, 2015, 5:18 PM Mateo Rodriguez notifications@github.com wrote:

I don't think your plugin is the problem. I have seen this error many times when moving android off codegen and then adding JarDependency to the new unpack_libraries code and have even done some (mostly fruitless) digging myself.

— Reply to this email directly or view it on GitHub https://github.com/pantsbuild/pants/issues/1747#issuecomment-116144315.

ericzundel commented 9 years ago

Note that I've visited a similar problem before, see https://rbcommons.com/s/twitter/r/2015/, github issue #1350

I'm still working to reproduce this issue outside of our environment. But based on what I'm seeing, I think the solution to it is to change the way the sha is computed for the name of the internal-<sha>-<conf>.xml files which would cause us to re-run ivy because resolved-internal-<sha>-latest-integration.xml file would be missing in this case:

https://github.com/pantsbuild/pants/blob/ec9dcc0799180bc9c8a99fae2e53f5c678cf8176/src/python/pants/backend/jvm/tasks/ivy_task_mixin.py#L142

ericzundel commented 9 years ago

In our tree we have 3 targets that are closely related that exhibit the problem. I have a script that will reproduce the problem in our repo.

The logic of the script is essentially:

rm all internal files under ~/.ivy2/pants
./pants clean-all
# edit 3rdparty/BUILD and set joda-time version to 2.8
./pants resolve.ivy target-a::
./pants clean-all
# edit 3rdparty/BUILD and set joda-time version to 2.8.1
./pants resolve.ivy target-b::
./pants resolve.ivy target-c::
./pants resolve.ivy target-a:: # Reproduces the problem

I have tried creating similar logic using the pants repo and thus far not succeeded in reproducing it there.

Here is a patch that will change the way the ivy sha is computed:

diff --git a/src/python/pants/backend/jvm/ivy_utils.py b/src/python/pants/backend/jvm/ivy_utils.py
index c6e19f5..a958e53 100644
--- a/src/python/pants/backend/jvm/ivy_utils.py
+++ b/src/python/pants/backend/jvm/ivy_utils.py
@@ -247,11 +247,7 @@ class IvyUtils(object):

   @staticmethod
   def identify(targets):
-    targets = list(targets)
-    if len(targets) == 1 and targets[0].is_jvm and getattr(targets[0], 'provides', None):
-      return targets[0].provides.org, targets[0].provides.name
-    else:
-      return 'internal', Target.maybe_readable_identify(targets)
+    return 'internal', Target.invalidation_hashes(list(targets))

   @classmethod
   def xml_report_path(cls, targets, conf):
diff --git a/src/python/pants/base/target.py b/src/python/pants/base/target.py
index 297cedc..d66c726 100644
--- a/src/python/pants/base/target.py
+++ b/src/python/pants/base/target.py
@@ -166,6 +166,11 @@ class Target(AbstractTarget):
     return cls.maybe_readable_combine_ids([target.id for target in targets])

   @staticmethod
+  def invalidation_hashes(targets):
+    """Generates an invalidation hash for a collection of targets."""
+    return hash_all([target.invalidation_hash() for target in sorted(targets)])
+
+  @staticmethod
   def combine_ids(ids):
     """Generates a combined id for a set of ids."""
     return hash_all(sorted(ids))  # We sort so that the id isn't sensitive to order.

The idea behind the cache is that this will incorporate the version number requested by the JarDependency since the names are computed from the invalidation hash, not just the target names.

This fixes the problem in my repro case, but its going to lead to a proliferation of internal*.xml and resolved-internal-*(properties|xml) files in the ivy cache dir. To solve that we might move these files out of the ivy cache and into .pants.d after running ivy resolve. This adds complexity and would lead to more work during a clean-all, so I'm not sure how good of an idea that is.

I'm still looking for a repro case outside of our repo. The 'target-a' I used is very complex:

./pants depmap samsa/bridge | wc -l
 1564169
ericzundel commented 9 years ago

I committed the debugging in issue #1748 (commit aedb181).

So, in order to trigger this bug, we have to have 3 things 1:

      if report_missing or invalidation_check.invalid_vts or not os.path.exists(raw_target_classpath_file):
        args = ['-cachepath', raw_target_classpath_file_tmp] + (custom_args if custom_args else [])
        self.exec_ivy(
            ...)

1) There must be a resolved-internal*.xml report in ~/.ivy2/pants/ for the same set of target ids (run before a change of jar revision and before ./pants clean-all)

2) The same jars must have already been resolved in ivy resolve so that global_vts.invalid_vts is an empty list

3) There must be an existing raw_target_classpath file in the .pants.d directory. The sha for this is derived from the cache_key created from the cache_keys for all the VersionTargetSets in global_vts.

I can understand how we might trick 1) by resolving the jars before the version change. I understand how we can trick 2) by re-resolving a superset of the targets for target-a in the example above.

I'm working on how to trick 3)

[1] https://github.com/pantsbuild/pants/blob/aedb18147d85b85d2b4cc535ae61ee0bf587b9b3/src/python/pants/backend/jvm/tasks/ivy_task_mixin.py#L145

ericzundel commented 9 years ago

See https://rbcommons.com/s/twitter/r/2435/

Fixed in commit Commits dd78698ca and 7b07716