Hi Will,

Back to you with some memory issues. My experience so far is that SocialSent runs into memory problems once you reach a threshold of roughly 7,000 words to score. So I ran it on a distributed architecture (SHARCNET) with 38,000 words to score and asked for 16 GB of memory, yet it very soon runs out of memory again:
...
Using Theano backend.
/opt/sharcnet/python/2.7.8/intel/lib/python2.7/site-packages/scipy/lib/_util.py:35: DeprecationWarning: Module scipy.linalg.blas.fblas is deprecated, use scipy.linalg.blas instead
DeprecationWarning)
Evaluating SentProp with 100 dimensional GloVe embeddings
Evaluating binary and continuous classification performance
LEXICON
SEEDS
EMBEDDINGS
EVAL_WORDS
Traceback (most recent call last):
File "concreteness.py", line 95, in
sym=True, arccos=True)
File "/home/genereum/socialsent-master/polarity_induction_methods.py", line 99, in random_walk
M = transition_matrix(embeddings, **kwargs)
File "/home/genereum/socialsent-master/graph_construction.py", line 62, in transition_matrix
return Dinv.dot(L).dot(Dinv)
MemoryError
--- SharcNET Job Epilogue ---
job id: 12138822
exit status: 1
cpu time: 313s / 12.0h (0 %)
elapsed time: 479s / 12.0h (1 %)
virtual memory: 11.9G / 16.0G (74 %)
Job returned with status 1.
WARNING: Job only used 1 % of its requested walltime.
WARNING: Job only used 0 % of its requested cpu time.
WARNING: Job only used 65 % of allocated cpu time.
WARNING: Job only used 74% of its requested memory.
...
A solution would be to run it 7,000 words at a time. But maybe you know a way to increase the amount of memory the program can use?

Thanks, Michel
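For reference, a minimal sketch of the "run it 7,000 words at a time" workaround Michel describes. It assumes the flat-module layout visible in the traceback (polarity_induction_methods importable directly), that random_walk takes the seed lists positionally and returns a {word: score} dict, and a hypothetical load_embeddings_for(words) helper standing in for however concreteness.py builds an embedding object restricted to a given word set:

# Sketch of the batching workaround: score the 38,000 target words in
# chunks of roughly 7,000 (the size that still fits in memory) and merge
# the per-batch results. load_embeddings_for(words) is a hypothetical
# placeholder, not part of socialsent.
import polarity_induction_methods

BATCH_SIZE = 7000

def score_in_batches(eval_words, positive_seeds, negative_seeds, load_embeddings_for):
    polarities = {}
    for start in range(0, len(eval_words), BATCH_SIZE):
        batch = eval_words[start:start + BATCH_SIZE]
        # The seed words must be present in every batch so the random walk
        # has labelled nodes to propagate from.
        needed = set(batch) | set(positive_seeds) | set(negative_seeds)
        embeddings = load_embeddings_for(needed)
        batch_scores = polarity_induction_methods.random_walk(
            embeddings, positive_seeds, negative_seeds,
            sym=True, arccos=True)
        for word in batch:
            if word in batch_scores:
                polarities[word] = batch_scores[word]
    return polarities

Note that SentProp builds its word graph only over the vocabulary it is handed, so per-batch scores will not be numerically identical to a single 38,000-word run.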
NumPy can't natively handle or distribute the large matrix computations that are needed here. I think the solution is to write some Cython/C code to handle the Dinv.dot(L).dot(Dinv) computation.
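For comparison with the Cython/C route, here is a rough sketch (not the socialsent implementation) of keeping the whole computation in scipy.sparse. A dense 38,000 x 38,000 float64 matrix alone takes about 11.6 GB, which is in the ballpark of the 11.9G the job epilogue reports, whereas a k-nearest-neighbour affinity matrix needs only O(n*k) entries and the Dinv.dot(L).dot(Dinv) normalisation stays sparse. The sketch assumes Dinv holds the inverse square roots of the node degrees, as the symmetric form with sym=True suggests:

# Sketch: build a sparse k-nearest-neighbour affinity matrix one row at a
# time (never materialising the full n x n similarity matrix) and apply the
# symmetric normalisation D^{-1/2} L D^{-1/2} with scipy.sparse.
import numpy as np
from scipy import sparse

def knn_affinity(vectors, k=25):
    # Cosine similarities, keeping only each word's k nearest neighbours.
    X = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    n = X.shape[0]
    rows, cols, vals = [], [], []
    for i in range(n):
        sims = X[i].dot(X.T)              # one dense row of length n
        sims[i] = 0.0                     # no self-loops
        nbrs = np.argpartition(-sims, k)[:k]
        rows.extend([i] * k)
        cols.extend(nbrs.tolist())
        vals.extend(sims[nbrs].tolist())
    A = sparse.csr_matrix((vals, (rows, cols)), shape=(n, n))
    return (A + A.T) / 2.0                # symmetrise the graph

def sym_normalise(L):
    # D^{-1/2} L D^{-1/2}, the sparse analogue of Dinv.dot(L).dot(Dinv).
    degrees = np.asarray(L.sum(axis=1)).ravel()
    with np.errstate(divide="ignore"):
        inv_sqrt = 1.0 / np.sqrt(degrees)
    inv_sqrt[~np.isfinite(inv_sqrt)] = 0.0
    Dinv = sparse.diags(inv_sqrt)
    return Dinv.dot(L).dot(Dinv)

With, say, 25 neighbours per word, a 38,000-word matrix holds a couple of million nonzeros instead of roughly 1.4 billion dense entries, so 16 GB is more than enough memory.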