sevamoo / SOMPY

A Python Library for Self Organizing Map (SOM)
Apache License 2.0
536 stars, 242 forks

Using too much RAM on large dataset. #82

Open: vishvanath45 opened this issue 6 years ago

vishvanath45 commented 6 years ago

I am running document clustering with SOMPY, following the example shipped with this project. I have a list of documents; each element of the list holds the text of the corresponding document. I followed these steps:

I then run the following command:

som = sompy.SOMFactory.build(document_list, mapsize, mask=None, mapshape='planar', lattice='rect', normalization='var', initialization='pca', neighborhood='gaussian', training='batch', name='sompy')

mapsize is 20x20 and the size of document_list is 92520x92520. I read online that people suggest using batch training and reducing the features with PCA. I have done both, but my RAM still reaches 100% utilisation (I have 126 GB of RAM and a 12-core processor) and I have to interrupt the program.
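For a sense of scale, here is a rough back-of-the-envelope estimate of the memory these shapes imply. It is only a sketch: it assumes the data is stored as a dense float64 array (8 bytes per value), which the post does not state, and it says nothing about how SOMPY allocates memory internally. The reduced feature count of 300 is a hypothetical value chosen purely for comparison.

```python
# Rough memory estimate for dense arrays, using only the standard library.
# Assumption: float64 storage (8 bytes per value) and no sparsity.

def dense_bytes(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold an n_rows x n_cols dense array."""
    return n_rows * n_cols * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


n_samples, n_features = 92520, 92520  # shape reported in the post
n_nodes = 20 * 20                     # 20x20 map

data_gib = to_gib(dense_bytes(n_samples, n_features))      # input matrix
codebook_gib = to_gib(dense_bytes(n_nodes, n_features))    # one weight vector per node
distance_gib = to_gib(dense_bytes(n_samples, n_nodes))     # sample-to-node distances

# Hypothetical: the same samples after reducing to 300 features.
reduced_features = 300
reduced_gib = to_gib(dense_bytes(n_samples, reduced_features))

print(f"input matrix:          {data_gib:.1f} GiB")
print(f"two copies of input:   {2 * data_gib:.1f} GiB")
print(f"codebook:              {codebook_gib:.2f} GiB")
print(f"distance matrix:       {distance_gib:.2f} GiB")
print(f"input at 300 features: {reduced_gib:.2f} GiB")
```

Under these assumptions the input matrix alone takes roughly 64 GiB, so any step that holds a second full copy of it (for example a normalized version alongside the original) would already exceed 126 GB. The codebook and the distance matrix are small by comparison, which suggests the number of feature columns, not the map size, dominates the footprint.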

Any help at this time will be appreciated.

jazoza commented 6 years ago

Hi @vishvanath45, I am working on document and term clustering with SOM, and I was actually using word2vec to vectorize the documents. Would you care to share your code? I could have a look at what you are describing here.

Best regards, Selena
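As an illustration of why vectorizing with word embeddings keeps the feature count small: one common way to turn word vectors into a fixed-length document vector is to average them. The sketch below is illustrative only. The toy embedding table and the function name `average_vector` are hypothetical, no real word2vec model is loaded, and it is not stated in this thread that averaging is the method actually used.

```python
from typing import Dict, List


def average_vector(tokens: List[str],
                   embeddings: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of all tokens found in the embedding table.

    Tokens missing from the table are skipped. If no token is found,
    a zero vector of length `dim` is returned.
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue  # out-of-vocabulary token
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [t / count for t in total]


# Hypothetical 3-dimensional embeddings, made up for this example.
toy_embeddings = {
    "map": [1.0, 0.0, 2.0],
    "cluster": [3.0, 2.0, 0.0],
}

doc_tokens = "map cluster unknown".split()
doc_vec = average_vector(doc_tokens, toy_embeddings, dim=3)
print(doc_vec)
```

With a representation like this, each document has as many features as the embedding has dimensions, regardless of how many documents or distinct terms there are.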