Using too much RAM on large dataset.

I am running document clustering with SOMPY, following the example that ships with the project. I have a list of documents, where each element holds the text of one document. I took the following steps:

Used TF-IDF to vectorize the documents.
Got a sparse matrix as the result.
Converted the sparse matrix to a dense matrix, and then to a square matrix.

Then I run the following command:

som = sompy.SOMFactory.build(document_list, mapsize, mask=None, mapshape='planar', lattice='rect', normalization='var', initialization='pca', neighborhood='gaussian', training='batch', name='sompy')

mapsize is 20x20 and the size of document_list is 92520x92520. I read online that people suggest using batch training and reducing the features with PCA. I have done both, but RAM usage still reaches 100% (I have 126 GB of RAM and a 12-core processor), and I have to interrupt the program.
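For reference, here is a rough estimate of how much memory the dense matrix alone takes. This is only a sketch: it assumes 8-byte (float64) entries, which I have not checked, and `dense_matrix_bytes` is a helper name made up for this post, not part of SOMPY.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_element=8):
    """Memory needed for a dense n_rows x n_cols matrix.

    bytes_per_element=8 assumes float64 entries (an assumption).
    """
    return n_rows * n_cols * bytes_per_element


n = 92520  # rows and columns of document_list, as stated above
total = dense_matrix_bytes(n, n)
gib = total / 2**30  # convert bytes to GiB

print(f"{total} bytes = {gib:.1f} GiB")
# prints "68479603200 bytes = 63.8 GiB"
```

Under that assumption, a single copy of the matrix already takes about half of my 126 GB, so one more full-size copy made anywhere during normalization or initialization (I have not verified whether that happens) would be enough to fill the RAM.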

Any help at this time will be appreciated.
