openzipkin / zipkin-dependencies

Spark job that aggregates zipkin spans for use in the UI
Apache License 2.0

zipkin-dependencies (storage: ES) exception -> java.lang.OutOfMemoryError: Java heap space #143

Open ldcsaa opened 5 years ago

ldcsaa commented 5 years ago

Starting one day, my zipkin-dependencies job (storage: ES) began failing with logs like the attached. How can I resolve it? My zipkin-server version is 2.12.9; both zipkin-dependencies 2.1.0 and 2.3.1 throw these exceptions.

(heap memory config: -Xmx6g -Xms6g, which I think is enough)

exception.log
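For context, a minimal sketch of the standalone launch with the heap settings described above, assuming the standard env-var configuration from the zipkin-dependencies README; the Elasticsearch host is a placeholder:

```shell
# Run the daily aggregation once, standalone, against Elasticsearch.
# STORAGE_TYPE and ES_HOSTS are the documented zipkin-dependencies env vars;
# the host below is a placeholder for your cluster.
STORAGE_TYPE=elasticsearch \
ES_HOSTS=http://elasticsearch:9200 \
java -Xms6g -Xmx6g -jar zipkin-dependencies.jar
```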

codefromthecrypt commented 5 years ago

Someone with more Spark experience could say what memory is likely usable by Spark for jobs and the best ways to profile; for example, the data is copied a couple of times. Without knowing the size of your data it is hard to tell. You can check the elasticsearch-hadoop forum for tips, as this is a straightforward job using their library. I suspect someone there will suggest not using a single JVM when processing a lot of data. In that case it is probably wise to come prepared with how much data is in the daily index and which daily index still works. For example, you can always reprocess past days to find out which one was the breaking point.
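One way to bisect for the breaking day, as suggested above: the job accepts an optional YYYY-MM-dd argument to process a single day's index. A sketch, with placeholder dates and ES host:

```shell
# Reprocess individual days to find which daily index started failing.
# The trailing date argument selects which day's index to aggregate.
for day in 2019-08-01 2019-08-02 2019-08-03; do
  echo "processing $day"
  STORAGE_TYPE=elasticsearch ES_HOSTS=http://elasticsearch:9200 \
    java -Xmx6g -jar zipkin-dependencies.jar "$day" || echo "FAILED: $day"
done
```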


aaf1 commented 4 years ago

Hello, I have the same problem; my zipkin index size is ~9-11 GB. Must I set the heap larger than the index size?

shakuzen commented 4 years ago

Must I set the heap larger than the index size?

Not in my experience with the zipkin-dependencies job, but I'm not a Spark expert either.

jorgheymans commented 4 years ago

FWIW we're running zipkin-dependencies with JDK 8 and the default heap; the biggest index size we've seen so far is 2.5 GB and it passed fine. Perhaps the complexity or size of the trace or span data plays a role?

jorgheymans commented 4 years ago

Coming back to this: we recently ingested about 8.5 GB of span data for a day, and even with a heap of 12G I cannot get it processed; it always OOMs. Obviously (right?) the heap dump mostly contains the trace data, so analysing it is pointless.

I started digging into the depths of Spark tuning and discovered there's a whole world of possible optimizations: https://spark.apache.org/docs/latest/tuning.html#determining-memory-consumption . I will try to get to the bottom of this and see what options there are to make it go through.
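A sketch of passing tuning properties from that guide to a standalone run: Spark loads `spark.*` JVM system properties into its SparkConf by default, so these should reach the job without code changes. The specific values are illustrative, and whether each property helps this particular job is an assumption worth testing:

```shell
# Kryo serialization and a larger unified-memory fraction, per the Spark tuning guide.
# Property names are standard Spark configuration keys; values are examples only.
java -Xmx12g \
  -Dspark.serializer=org.apache.spark.serializer.KryoSerializer \
  -Dspark.kryoserializer.buffer.max=256m \
  -Dspark.memory.fraction=0.7 \
  -jar zipkin-dependencies.jar
```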

codefromthecrypt commented 4 years ago

Yeah, I am surprised that it needs to buffer in memory.. doesn't sound very streaming to me..

I forget the status of the Kafka alternative. It would be nice to have something that can work in standalone mode and do aggregation without buffering so much; ideally the only thing it needs to buffer is one trace at a time.

cc @jeqo


jorgheymans commented 4 years ago

Spark Streaming is a different thing ( https://spark.apache.org/docs/latest/streaming-programming-guide.html ); that is not what this job is doing (though maybe it should or could).

Hooking up jconsole shows that analyzing about 5.5 GB of trace data needs up to 10 GB of memory:

[screenshot: zipkin-deps-default-5.5G (https://user-images.githubusercontent.com/193792/83441073-3a3d8e00-a446-11ea-8d18-8a9531e97be6.png)]

Toying around with the Kryo serializer as recommended here ( https://spark.apache.org/docs/latest/tuning.html#data-serialization ) did not improve things greatly:

[screenshot: zipkin-deps-kryo-5.5G (https://user-images.githubusercontent.com/193792/83441221-74a72b00-a446-11ea-81a8-ce19b8b20110.png)]

I am going to try different index sizes this week and see if the 2x rule for heap holds. We could then document it as a recommendation. Still, I can imagine that 10 GB of trace data is not all that big; many sites will have a lot more...
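If the 2x observation holds, a rough heap-sizing rule might look like the following. This is purely illustrative arithmetic derived from the single measurement above, not a documented recommendation:

```shell
# Estimate heap as roughly 2x the daily index size, rounded up to whole GB.
estimate_heap() {
  awk -v idx="$1" 'BEGIN { h = idx * 2; printf "%d", (h == int(h)) ? h : int(h) + 1 }'
}
echo "$(estimate_heap 5.5)g"   # the 5.5 GB day measured above -> 11g
```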

codefromthecrypt commented 4 years ago

It is crazy to me that so much memory is needed. It hints that manually scrolling the data could be far better when there is no cluster.
