Closed wisechengyi closed 5 years ago
Summarizing the high-level idea on the bug: we have two processes with the same mem contents for uuids, tmp dirs and times to the millis - this implies a fork after those mem contents have been set.
One theory is that because #6748 switched multiprocessing pool to thread pool, the issue became more apparent, whereas the forked process just died silently before without affecting the parent process.
Added below to help debug:
diff --git a/src/python/pants/stats/statsdb.py b/src/python/pants/stats/statsdb.py
index e5ec166cc..62908e2da 100644
--- a/src/python/pants/stats/statsdb.py
+++ b/src/python/pants/stats/statsdb.py
@@ -77,6 +77,10 @@ class StatsDB(object):
with self._cursor() as c:
ri = stats['run_info']
try:
+ import os
+ import time
+ with open('/home/workspae/pants_local/abc', 'a') as f:
+ f.write('{} {} {}\n'.format(ri['id'], os.getpid(), time.time()))
c.execute("""INSERT INTO run_info VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
[ri['id'], int(float(ri['timestamp'])), ri['machine'], ri['user'],
ri['version'], ri['buildroot'], ri['outcome'], ri['cmd_line']])
Got
pants_run_2018_11_17_00_05_08_488_340963ceac4741c2ad7a8d8527486a53 38285 1542413120.62
> pants_run_2018_11_17_00_05_27_170_229832ad0095448a9d6a8d451921bd02 38743 1542413139.09
> pants_run_2018_11_17_00_05_27_170_229832ad0095448a9d6a8d451921bd02 38543 1542413140.41
pants_run_2018_11_17_00_05_46_661_43718262bcc4481fb1c43181358069c9 38801 1542413162.5
pants_run_2018_11_17_00_06_08_418_634c85b2de534c70a7d9aa65f3803034 39078 1542413184.38
pants_run_2018_11_17_00_06_30_278_aa633c18948d42bb9ca033ce90c89129 39355 1542413203.51
pants_run_2018_11_17_00_06_48_651_e4b041f0b50f4b299eaaa6520f71c781 39632 1542413219.53
pants_run_2018_11_17_00_07_05_71_4f9994b898074650b87bcb3b61e94b93 39892 1542413235.79
pants_run_2018_11_17_00_07_20_996_f4bdc02970c945c680e203f31eea313a 40152 1542413251.89
pants_run_2018_11_17_00_07_37_175_0589be6722d94dadb939dc47e8513d3a 40412 1542413267.38
pants_run_2018_11_17_00_07_52_675_9d6ffd5d631a4c56b947d5fd489e0a04 40659 1542413284.27
pants_run_2018_11_17_00_08_09_730_c8cee87d5611404280f13db900df2905 40919 1542413301.12
pants_run_2018_11_17_00_08_26_270_f580b7705c32417995f9019884b5fb19 41179 1542413317.04
pants_run_2018_11_17_00_08_43_104_f64aa32af3f94b3daf93733235e8ec1c 41439 1542413334.93
pants_run_2018_11_17_00_09_00_617_d24dea438a2a4bd6a3165aee76bca5d5 41697 1542413359.18
pants_run_2018_11_17_00_09_25_954_ead4f5e041624e89962209ad471214ae 42009 1542413382.21
pants_run_2018_11_17_00_04_40_615_8331facfc1064cf0999863b840062a43 38188 1542413382.74
Note the lines with >
, the run ids are identical. Since it was printing the current pid right before writing to db, it was quite obvious that the process was forked.
was able to repro the issue locally as below once every few runs. The special flake is [scaladoc]
which has funky behavior with multiprocessing pool.
root@46e71dd0c530:/home/workspae/pants_local# rm -rf /tmp/123 .pants.d/build_invalidator/ && ./pants publish.jar --local=/tmp/123 --no-dryrun --no-prompt --force testprojects/src/scala/org/pantsbuild/testproject/publish/classifiers
06:59:54 00:00 [main]
(To run a reporting server: ./pants server)
06:59:55 00:01 [setup]
06:59:55 00:01 [parse]
Executing tasks in goals: bootstrap -> imports -> unpack-jars -> deferred-sources -> gen -> jvm-platform-validate -> resolve -> resources -> compile -> doc -> jar -> publish
06:59:56 00:02 [bootstrap]
06:59:56 00:02 [substitute-aliased-targets]
06:59:56 00:02 [jar-dependency-management]
06:59:56 00:02 [bootstrap-jvm-tools]
06:59:56 00:02 [provide-tools-jar]
06:59:56 00:02 [imports]
06:59:56 00:02 [ivy-imports]
06:59:56 00:02 [unpack-jars]
06:59:56 00:02 [unpack-jars]
06:59:56 00:02 [deferred-sources]
06:59:56 00:02 [deferred-sources]
06:59:56 00:02 [gen]
06:59:56 00:02 [antlr-java]
06:59:56 00:02 [antlr-py]
06:59:56 00:02 [jaxb]
06:59:56 00:02 [protoc]
06:59:56 00:02 [ragel]
06:59:56 00:02 [thrift-java]
06:59:56 00:02 [thrift-py]
06:59:56 00:02 [wire]
06:59:56 00:02 [avro-java]
06:59:56 00:02 [go-thrift]
06:59:56 00:02 [go-protobuf]
06:59:56 00:02 [jax-ws]
06:59:56 00:02 [scrooge]
06:59:56 00:02 [thrifty]
06:59:56 00:02 [jvm-platform-validate]
06:59:56 00:02 [jvm-platform-validate]
Invalidated 1 target.
06:59:56 00:02 [resolve]
06:59:56 00:02 [ivy]
06:59:56 00:02 [cache].
Invalidated 1 target.
06:59:56 00:02 [ivy-resolve]
06:59:56 00:02 [coursier]
06:59:56 00:02 [go]
06:59:56 00:02 [scala-js-compile]
06:59:56 00:02 [scala-js-link]
06:59:57 00:03 [node]
06:59:57 00:03 [resources]
06:59:57 00:03 [prepare]
06:59:57 00:03 [services]
06:59:57 00:03 [compile]
06:59:57 00:03 [node]
06:59:57 00:03 [compile-jvm-prep-command]
06:59:57 00:03 [jvm_prep_command]
06:59:57 00:03 [compile-prep-command]
06:59:57 00:03 [compile]
06:59:57 00:03 [rsc]
06:59:57 00:03 [zinc]
Invalidated 1 target.
06:59:57 00:03 [isolation-zinc-pool-bootstrap]
[1/1] Compiling 1 zinc source in 1 target (testprojects/src/scala/org/pantsbuild/testproject/publish/classifiers:classifiers).
06:59:57 00:03 [compile]
06:59:57 00:03 [cache].
06:59:59 00:05 [cache].
Using cached artifacts for 1 target.
07:00:00 00:06 [cache].
Using cached artifacts for 1 target.
07:00:00 00:06 [cache].
07:00:00 00:06 [zinc]
[info] Compiling 1 Scala source to /home/workspae/pants_local/.pants.d/compile/zinc/a53b6931478d/testprojects.src.scala.org.pantsbuild.testproject.publish.classifiers.classifiers/current/classes ...
[info] Done compiling.
[info] Compile success at Nov 17, 2018 7:00:01 AM [0.847s]
07:00:01 00:07 [javac]
07:00:01 00:07 [cpp]
07:00:01 00:07 [errorprone]
07:00:01 00:07 [findbugs]
07:00:01 00:07 [go]
07:00:01 00:07 [doc]
07:00:01 00:07 [javadoc]
07:00:01 00:07 [scaladoc]
Invalidated 1 target.model contains 7 documentable templates
timestamp: 2018-11-17T07:00:03.837912
Signal 15 was raised. Exiting with failure. (backtrace omitted)
Waiting for background workers to finish.
07:00:03 00:09 [complete]
FAILURE
07:00:04 00:10 [jar]
07:00:04 00:10 [create]
Invalidated 1 target....
07:00:04 00:10 [publish]
07:00:04 00:10 [jar]
07:00:04 00:10 [ivy-publish]
Waiting for background workers to finish.
07:00:04 00:10 [complete]
SUCCESS
timestamp: 2018-11-17T07:00:04.994876
Exception caught: (sqlite3.IntegrityError)
File "/home/workspae/pants_local/src/python/pants/bin/pants_loader.py", line 75, in <module>
main()
File "/home/workspae/pants_local/src/python/pants/bin/pants_loader.py", line 71, in main
PantsLoader.run()
File "/home/workspae/pants_local/src/python/pants/bin/pants_loader.py", line 67, in run
cls.load_and_execute(entrypoint)
File "/home/workspae/pants_local/src/python/pants/bin/pants_loader.py", line 60, in load_and_execute
entrypoint_main()
File "/home/workspae/pants_local/src/python/pants/bin/pants_exe.py", line 39, in main
PantsRunner(exiter, start_time=start_time).run()
File "/home/workspae/pants_local/src/python/pants/bin/pants_runner.py", line 62, in run
return runner.run()
File "/home/workspae/pants_local/src/python/pants/bin/local_pants_runner.py", line 159, in run
self._run()
File "/home/workspae/pants_local/src/python/pants/bin/local_pants_runner.py", line 233, in _run
run_tracker_result = run_tracker.end()
File "/home/workspae/pants_local/src/python/pants/goal/run_tracker.py", line 457, in end
self.store_stats()
File "/home/workspae/pants_local/src/python/pants/goal/run_tracker.py", line 400, in store_stats
StatsDBFactory.global_instance().get_db().insert_stats(stats)
File "/home/workspae/pants_local/src/python/pants/stats/statsdb.py", line 86, in insert_stats
ri['version'], ri['buildroot'], ri['outcome'], ri['cmd_line']])
Exception message: UNIQUE constraint failed: run_info.id
Converting to ThreadPool seems to do the trick.
diff --git a/src/python/pants/backend/jvm/tasks/jvmdoc_gen.py b/src/python/pants/backend/jvm/tasks/jvmdoc_gen.py
index 1bb9458..76096b1 100644
--- a/src/python/pants/backend/jvm/tasks/jvmdoc_gen.py
+++ b/src/python/pants/backend/jvm/tasks/jvmdoc_gen.py
@@ -9,6 +9,7 @@ import contextlib
import multiprocessing
import os
import re
+from multiprocessing.pool import ThreadPool
from pants.backend.jvm.tasks.jvm_task import JvmTask
from pants.base.exceptions import TaskError
@@ -157,8 +158,10 @@ class JvmdocGen(SkipAndTransitiveOptionsRegistrar, HasSkipAndTransitiveOptionsMi
jobs[gendir] = (target, command)
if jobs:
+ # Use ThreadPool as there may be dangling processes that cause identical run id and
+ # then buildstats error downstream. https://github.com/pantsbuild/pants/issues/6785
with contextlib.closing(
- multiprocessing.Pool(processes=min(len(jobs), multiprocessing.cpu_count()))) as pool:
+ ThreadPool(processes=min(len(jobs), multiprocessing.cpu_count()))) as pool:
# map would be a preferable api here but fails after the 1st batch with an internal:
# ...
# File "...src/python/pants/backend/jvm/tasks/jar_create.py", line 170, in javadocjar
The CI for #6748 inconsistently fails at
tests/python/pants_test/backend/jvm/tasks:jar_publish_integration
with run id conflicts for buildstats.source: https://api.travis-ci.org/v3/job/455670120/log.txt
Previous conversation: