OutOfMemoryError: GC overhead limit exceeded

Hi, I'm trying to build a k-NN graph from a subset of the EMNIST dataset (10K instances), and I'm getting an OutOfMemoryError:

java.lang.OutOfMemoryError: GC overhead limit exceeded

I tried to make an MWE to reproduce this. You can find it here: https://github.com/fvictorio/spark-knn-graphs-outofmemory

I tried both locally and on Google Cloud (four workers with 15 GB of RAM each).

It's very likely that I'm doing something wrong, since I read in another issue that you tested the library on a dataset with millions of rows. But maybe the large number of dimensions is causing trouble?
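
For scale, here is a rough back-of-the-envelope estimate of the raw data size. It assumes each instance is a dense vector with one double per pixel of a 28x28 image (784 dimensions), and it ignores JVM object overhead and anything Spark or the library allocates on top. The class and method names are just for illustration.

```java
public class RawDataSizeEstimate {

    // Raw bytes needed to hold `rows` dense vectors of `dims` values each.
    static long rawBytes(long rows, long dims, long bytesPerValue) {
        return rows * dims * bytesPerValue;
    }

    public static void main(String[] args) {
        long rows = 10_000;   // subset size mentioned above
        long dims = 28 * 28;  // assumption: one feature per pixel of a 28x28 image
        long bytes = rawBytes(rows, dims, Double.BYTES); // assumption: values stored as doubles
        System.out.println(bytes + " bytes of raw feature data");
    }
}
```

Under those assumptions the raw features come to about 63 MB, which is tiny compared to 15 GB per worker, so if the dimensionality matters it is probably through intermediate structures rather than the input itself.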

Thanks.
