rokroskar / imnet

imNet: a Sequence Network Construction Toolkit

Issue with generate_spark_graph() #5

Open eirikhoye opened 4 years ago

eirikhoye commented 4 years ago

Hi, I managed to get sparkhpc and imnet running on our institute's HPC cluster. However, when I run the following code to generate a distributed graph:

import findspark; findspark.init()
import sparkhpc
template_path = '/cluster/home/eirikhoy/sparkhpc/build/lib/sparkhpc/templates/sparkjob.slurm.template'
sj = sparkhpc.sparkjob.SLURMSparkJob(ncores=4, template=template_path)
from pyspark import SparkContext
sc = SparkContext(master=sj.master_url())
import imnet
import numpy as np
from scipy.sparse import csr_matrix 
import pyspark
strings = imnet.random_strings.generate_random_sequences(5000)

g_rdd = imnet.process_strings.generate_spark_graph(strings, sc, max_ld=2).cache()

I get the error:

UnboundLocalError                         Traceback (most recent call last)
<ipython-input-15-af167cc949f4> in <module>()
----> 1 g_rdd = imnet.process_strings.generate_spark_graph(strings, sc, max_ld=2).cache()

/cluster/home/eirikhoy/.conda/envs/imnet_v0.2/lib/python2.7/site-packages/imnet/process_strings.pyc in generate_spark_graph(strings, sc, mat, min_ld, max_ld)
    189         warn("Problem importing pyspark -- are you sure your SPARK_HOME is set?")
--> 191     sqc = SQLContext(sc)
    193     strings_b = sc.broadcast(strings)

UnboundLocalError: local variable 'SQLContext' referenced before assignment

Note, I tested it on a local VM and got the same error, so maybe the issue is not with incorrect dependencies?

Both the SPARK_HOME and JAVA_HOME environment variables are set:

>>> os.environ['SPARK_HOME']
>>> os.environ['JAVA_HOME']

The rest of the code examples ran fine.
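
For context on the error itself: the traceback suggests that generate_spark_graph() imports SQLContext inside the function, in a try/except that only warns on failure. If that import fails, the name is never bound, and the later call raises UnboundLocalError instead of the original import error. Below is a minimal sketch of that pattern. It is a reconstruction from the traceback, not the actual imnet source; the function build_context and the module name module_that_is_missing are hypothetical.

```python
import warnings


def build_context(sc):
    # Hypothetical reconstruction of the pattern hinted at by the traceback:
    # a function-local import whose failure is downgraded to a warning.
    try:
        from module_that_is_missing import SQLContext  # hypothetical; always fails
    except ImportError:
        warnings.warn("Problem importing pyspark -- are you sure your SPARK_HOME is set?")
    # The import above never bound the local name, so this line raises
    # UnboundLocalError and hides the original ImportError.
    return SQLContext(sc)


def demo():
    """Return the message of the UnboundLocalError, or None if none was raised."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        try:
            build_context(None)
        except UnboundLocalError as exc:
            return str(exc)
    return None


if __name__ == "__main__":
    print(demo())
```

If this is what happens, the UnboundLocalError is only a symptom, and the underlying import failure is the thing to track down.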

rokroskar commented 3 years ago

Hi @eirikhoye, apologies, I missed this issue. Have you managed to resolve it? Can you try running these lines one by one in your session to pinpoint where the problem lies?
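
One possible way to run such a check is sketched below. It imports each module on its own and returns the full traceback on failure, so the real import error is visible rather than the warning. The helper try_import is hypothetical, and the list of modules is an assumption (SQLContext lives in pyspark.sql); it is not taken from the imnet source.

```python
import importlib
import traceback


def try_import(name):
    """Import a module by name; return None on success or the traceback text on failure."""
    try:
        importlib.import_module(name)
        return None
    except Exception:
        return traceback.format_exc()


if __name__ == "__main__":
    # Assumed candidates: the modules that SQLContext(sc) would depend on.
    for module_name in ("pyspark", "pyspark.sql"):
        error = try_import(module_name)
        if error is None:
            print(module_name + ": OK")
        else:
            print(module_name + ": FAILED")
            print(error)
```

Running this in the same session (after findspark.init()) should show whether pyspark.sql itself fails to import, and with which exception.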