connectedComponents() raises lots of warnings that say "block locks were not released by TID = ..."

nchammas commented 7 years ago

Trying to run a simple connectedComponents() analysis on an example dataset, even the one from the quick start, yields a flurry of warnings (several dozen?) like this:

16/10/17 22:45:40 WARN Executor: 1 block locks were not released by TID = 358:
[rdd_95_5]
16/10/17 22:45:40 WARN Executor: 1 block locks were not released by TID = 353:
[rdd_95_0]
16/10/17 22:45:40 WARN Executor: 1 block locks were not released by TID = 359:
[rdd_95_6]
...

And this is for a graph with literally 3-4 vertices and edges.

Is this an issue? Would it cause performance issues at scale? (Here's a related question on Stack Overflow.)

I'm running Python 2.7, Spark 2.0.1, and GraphFrames 0.2.

jkbradley commented 7 years ago

I'm seeing this now too. It looks like it was a 2.0.0 change in Spark: [https://issues.apache.org/jira/browse/SPARK-12757]

I'm pinging Josh about it and will report back.

JoshRosen commented 7 years ago

I wonder whether there's a limit() or take() or something which is otherwise causing reads to not release their locks until the end of the task. That's the most likely explanation, but to know more I'd have to download this and try to reproduce the problem to see where the locks are being acquired.

nchammas commented 7 years ago

Here's a minimal repro, taken mostly as-is from the User Guide:

graphframes-test.py:

import pyspark
from graphframes.examples import Graphs

if __name__ == '__main__':
    spark = pyspark.sql.SparkSession.builder.getOrCreate()
    g = Graphs(spark).friends()
    result = g.connectedComponents()
    result.select("id", "component").orderBy("component").show()

$ PYSPARK_PYTHON=python2 spark-submit --packages graphframes:graphframes:0.2.0-spark2.0-s_2.11 graphframes-test.py 
[startup log messages snipped]
16/10/24 17:42:35 WARN Executor: 3 block locks were not released by TID = 1106:
[rdd_100_2, rdd_117_2, rdd_114_2]
16/10/24 17:42:35 WARN Executor: 3 block locks were not released by TID = 1108:
[rdd_114_4, rdd_100_4, rdd_117_4]
16/10/24 17:42:35 WARN Executor: 3 block locks were not released by TID = 1107:
[rdd_117_3, rdd_100_3, rdd_114_3]
[several dozen such lines snipped]
+---+---------+
| id|component|
+---+---------+
|  b|        1|
|  c|        1|
|  d|        1|
|  a|        1|
|  f|        1|
|  e|        1|
+---+---------+

Commenting out the last line with .show() doesn't eliminate these warnings.

jkbradley commented 7 years ago

I talked with Josh. Synopsis:

It's likely from GraphX grouping rows in each partition into a single "row" per partition.
When GraphX uses data from the partition, it knows that there is only 1 row, so it just looks at that first row. The RDD API thinks that GraphX does not finish iterating through the data, and that may cause the block lock warning.
It is likely not causing any bugs.

My only worry is about annoying log messages, or even log sizes blowing up for certain data sizes and operations. If we are able to migrate more implementations to DataFrames and avoid GraphX, that should solve this problem. Or we could spend the time to fix the issue within GraphX, but that's low priority for me right now.

JoshRosen commented 7 years ago

If my hunch is right, we should be able to fix this in GraphX by registering a TaskCompletionCallback to just perform an extra hasNext() call on the single-element iterator.

felixcheung commented 7 years ago

perhaps it's a different cause, I'm seeing the same messages with GBTClassifier too - was added to SPARK-12757

jackalsin commented 7 years ago

I came across the same thing.

scala : 2.11.8 Spark:2.0.2

data

command:

scala> import org.apache.spark.graphx.GraphLoader
import org.apache.spark.graphx.GraphLoader

scala> val graph = GraphLoader.edgeListFile(sc, "dis.txt")
graph: org.apache.spark.graphx.Graph[Int,Int] = org.apache.spark.graphx.impl.GraphImpl@5960f17

scala>     val ranks = graph.pageRank(0.0001).vertices
16/11/18 08:04:12 WARN Executor: 1 block locks were not released by TID = 8:
[rdd_8_0]
16/11/18 08:04:12 WARN Executor: 1 block locks were not released by TID = 9:
[rdd_8_1]
16/11/18 08:04:12 WARN Executor: 2 block locks were not released by TID = 10:
[rdd_29_0, rdd_25_0]
16/11/18 08:04:12 WARN Executor: 2 block locks were not released by TID = 11:
[rdd_29_1, rdd_25_1]
16/11/18 08:04:12 WARN Executor: 2 block locks were not released by TID = 13:
[rdd_17_1, rdd_2_1]
16/11/18 08:04:12 WARN Executor: 2 block locks were not released by TID = 12:
[rdd_2_0, rdd_17_0]
16/11/18 08:04:12 WARN Executor: 2 block locks were not released by TID = 17:
[rdd_29_1, rdd_46_1]
16/11/18 08:04:12 WARN Executor: 2 block locks were not released by TID = 16:
[rdd_29_0, rdd_46_0]
16/11/18 08:04:13 WARN Executor: 3 block locks were not released by TID = 24:
[rdd_60_0, rdd_63_0, rdd_46_0]
16/11/18 08:04:13 WARN Executor: 3 block locks were not released by TID = 25:
[rdd_63_1, rdd_60_1, rdd_46_1]
16/11/18 08:04:13 WARN Executor: 3 block locks were not released by TID = 32:
[rdd_77_0, rdd_63_0, rdd_80_0]
16/11/18 08:04:13 WARN Executor: 3 block locks were not released by TID = 33:
[rdd_63_1, rdd_77_1, rdd_80_1]
16/11/18 08:04:13 WARN Executor: 3 block locks were not released by TID = 41:
[rdd_97_1, rdd_94_1, rdd_80_1]
16/11/18 08:04:13 WARN Executor: 3 block locks were not released by TID = 40:
[rdd_97_0, rdd_94_0, rdd_80_0]
16/11/18 08:04:13 WARN Executor: 3 block locks were not released by TID = 49:
[rdd_97_1, rdd_111_1, rdd_114_1]
16/11/18 08:04:13 WARN Executor: 3 block locks were not released by TID = 48:
[rdd_111_0, rdd_97_0, rdd_114_0]
16/11/18 08:04:14 WARN Executor: 3 block locks were not released by TID = 56:
[rdd_131_0, rdd_114_0, rdd_128_0]
16/11/18 08:04:14 WARN Executor: 3 block locks were not released by TID = 57:
[rdd_131_1, rdd_128_1, rdd_114_1]
16/11/18 08:04:16 WARN Executor: 3 block locks were not released by TID = 64:
[rdd_145_0, rdd_131_0, rdd_148_0]
16/11/18 08:04:16 WARN Executor: 3 block locks were not released by TID = 65:
[rdd_131_1, rdd_145_1, rdd_148_1]
16/11/18 08:04:17 WARN Executor: 3 block locks were not released by TID = 73:   
[rdd_165_1, rdd_162_1, rdd_148_1]
16/11/18 08:04:17 WARN Executor: 3 block locks were not released by TID = 72:
[rdd_165_0, rdd_162_0, rdd_148_0]
16/11/18 08:04:18 WARN Executor: 3 block locks were not released by TID = 80:
[rdd_179_0, rdd_165_0, rdd_182_0]
16/11/18 08:04:18 WARN Executor: 3 block locks were not released by TID = 81:
[rdd_165_1, rdd_182_1, rdd_179_1]
ranks: org.apache.spark.graphx.VertexRDD[Double] = VertexRDDImpl[200] at RDD at VertexRDD.scala:57

I can reproduce it even reinstall the spark.

I am not sure if this helps

Feel free to contact me.

nchammas commented 7 years ago

I am not seeing this anymore in GraphFrames 0.3 with the new default implementation. If I force the old GraphX algorithm in GraphFrames 0.3 I see the the warnings again.

Can anyone else confirm?

felixcheung commented 7 years ago

I still see a lot of these messages when running GF on Spark 2.1 (but no, no such message with connectedComponent)

I think this is a Spark issue?

[rdd_98_0, rdd_84_0]
16/12/29 18:57:52 WARN Executor: 2 block locks were not released by TID = 57:
[rdd_98_1, rdd_84_1]
16/12/29 18:57:52 WARN Executor: 1 block locks were not released by TID = 61:
[rdd_104_1]
16/12/29 18:57:52 WARN Executor: 1 block locks were not released by TID = 60:
[rdd_104_0]
16/12/29 18:57:52 WARN Executor: 2 block locks were not released by TID = 62:
[rdd_98_0, rdd_116_0]
16/12/29 18:57:52 WARN Executor: 2 block locks were not released by TID = 63:
[rdd_98_1, rdd_116_1]

toddpi314 commented 7 years ago

Looks like spark; seeing this on 2.0 and 2.1 the last few weeks.

jkbradley commented 7 years ago

I just tested this with master, and it's been fixed by https://github.com/apache/spark/pull/16513

graphframes / graphframes

connectedComponents() raises lots of warnings that say "block locks were not released by TID = ..." #116