twosigma / flint

A Time Series Library for Apache Spark
Apache License 2.0
995 stars, 184 forks

Add python tests and travis integration #14

Closed mforsyth closed 7 years ago

mforsyth commented 7 years ago

Fixes #6

leifwalsh commented 7 years ago

I don't want these to depend on pytest, I want them to use unittest.

heimir-sverrisson commented 7 years ago

@leifwalsh I already implemented the unittest version but was directed by @icexelloss to use pytest instead. Could you two discuss and let me know what the direction is? I'm fine either way.

icexelloss commented 7 years ago

@heimir-sverrisson Sorry for getting back to you late.

Per discussion with @leifwalsh, we are going with unittest instead of py.test.

Here is an example:


import pyspark
import pyspark.sql
def setUp(obj, options=None):
    '''Starts spark and sets attributes ``sc``, and ``sqlContext``.
    '''
    setattr(obj, '_env', dict(os.environ))
    setattr(obj, '_path', list(sys.path))
    options = collections.ChainMap(options, _DEFAULT_OPTIONS)
    sc = pyspark.SparkContext(pyspark.SparkConf(options))
    sqlContext = pyspark.sql.SQLContext(sc)
    setattr(obj, 'sc', sc)
    setattr(obj, 'sqlContext', sqlContext)
def tearDown(obj):
    '''Shuts down spark and removes attributes ``sc``, and ``sqlContext``.
    '''
    obj.sc.stop()
    import pyspark
    pyspark.SparkContext._gateway.shutdown()
    pyspark.SparkContext._gateway = None
    delattr(obj, 'sqlContext')
    delattr(obj, 'sc')
    os.environ = obj._env
    sys.path = obj._path
    delattr(obj, '_env')
    delattr(obj, '_path')
class SparkTestCase(unittest.TestCase):
    @classmethod
    def options(cls):
        return {}
    @classmethod
    def setUpClass(cls):
        setUp(cls, options=cls.options())
    @classmethod
    def tearDownClass(cls):
        tearDown(cls)
``

Flint test case should extend this class. (And we will have a different class for setting sc, sqlContext internally)
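
To illustrate the intended usage with a hypothetical test (not part of this PR): a concrete test case subclasses `SparkTestCase`, optionally overrides `options()`, and uses the class-level `sc`/`sqlContext` attributes. The same setUpClass/tearDownClass lifecycle can be sketched without a Spark installation by substituting a stand-in resource for the SparkContext:

```python
import unittest

class _FakeContext:
    """Stand-in for pyspark.SparkContext so the pattern runs without Spark."""
    def __init__(self, options):
        self.options = options
    def stop(self):
        pass

class ResourceTestCase(unittest.TestCase):
    """Same shape as SparkTestCase: one shared resource per test class."""
    @classmethod
    def options(cls):
        return {}

    @classmethod
    def setUpClass(cls):
        cls.sc = _FakeContext(cls.options())

    @classmethod
    def tearDownClass(cls):
        cls.sc.stop()
        del cls.sc

class ExampleTest(ResourceTestCase):
    @classmethod
    def options(cls):
        # Subclasses override options() to adjust the context configuration.
        return {'spark.app.name': 'flint-test'}

    def test_options_applied(self):
        # The class-level resource is available to every test method.
        self.assertEqual(self.sc.options['spark.app.name'], 'flint-test')

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest))
```

`result.wasSuccessful()` is `True` when the class-level resource was set up before the test method ran and torn down afterwards, which is exactly the contract `SparkTestCase` provides for `sc` and `sqlContext`.
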
heimir-sverrisson commented 7 years ago

@icexelloss and @leifwalsh, thanks for the clarifications. I will start working on this.

heimir-sverrisson commented 7 years ago

With the last commit I've replaced the pytest code with a minimal unittest.

icexelloss commented 7 years ago

@heimir-sverrisson,

Can you fix the license headers to be this?

#
#  Copyright 2017 TWO SIGMA OPEN SOURCE, LLC
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
#
icexelloss commented 7 years ago

Please also add license headers to new files.

icexelloss commented 7 years ago

I have some minor comments but this looks good. @leifwalsh , do you have anything to add?

icexelloss commented 7 years ago

@heimir-sverrisson,

I followed the readme to run the test but got an exception:

ljin@ire-ljin:~/ws/github/flint/python$ ls $SPARK_HOME
bin  conf  data  examples  jars  LICENSE  licenses  NOTICE  python  R  README.md  RELEASE  sbin  yarn
ljin@ire-ljin:~/ws/github/flint/python$ echo $SPARK_HOME
/home/ljin/ws/github/flint/spark-2.1.1-bin-hadoop2.7
ljin@ire-ljin:~/ws/github/flint/python$ python -W ignore -m unittest discover tests
E
======================================================================
ERROR: test_dataframe (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_dataframe
Traceback (most recent call last):
  File "/nas/dft/ire/ljin/.conda/envs/flint-dev/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/nas/dft/ire/ljin/.conda/envs/flint-dev/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/home/ljin/ws/github/flint/python/tests/test_dataframe.py", line 19, in <module>
    import ts.flint.dataframe
  File "/home/ljin/ws/flint/python/ts/flint/__init__.py", line 19, in <module>
    from .context import FlintContext
  File "/home/ljin/ws/flint/python/ts/flint/context.py", line 25, in <module>
    import py4j
ImportError: No module named 'py4j'

----------------------------------------------------------------------
Ran 1 test in 0.000s

Am I doing something wrong?

heimir-sverrisson commented 7 years ago

@icexelloss I suspect that this may be because the spark-defaults.conf is missing. See the sed step, next to last, in python/travis/prepare_python_tests.sh for reference.
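
For readers following along: that sed step rewrites a `_FLINT_ROOT_` placeholder in `python/travis/spark-defaults.conf` into the absolute checkout path before installing it as `${SPARK_HOME}/conf/spark-defaults.conf`. A minimal sketch of the substitution (the property and jar path below are illustrative, not copied from the repo):

```shell
# Replace the _FLINT_ROOT_ placeholder with the current directory,
# mirroring the sed call in python/travis/prepare_python_tests.sh.
echo 'spark.jars _FLINT_ROOT_/target/flint-assembly.jar' \
    | sed "s,_FLINT_ROOT_,$(pwd),"
```

Without the generated spark-defaults.conf, the test Spark instance never sees the Flint assembly.
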

icexelloss commented 7 years ago

I tried to run the branch but hit a few issues:

heimir-sverrisson commented 7 years ago

@icexelloss the following issues have been addressed:

heimir-sverrisson commented 7 years ago

@icexelloss I've addressed all the issues you pointed out.

icexelloss commented 7 years ago

@heimir-sverrisson This looks good. A few minor comments and I think we can merge.

@leifwalsh , can you review/approve?

icexelloss commented 7 years ago

I will do a test run before merging to see if it goes through.

heimir-sverrisson commented 7 years ago

@icexelloss suggested minor changes done.

icexelloss commented 7 years ago

@heimir-sverrisson

I am able to run python test locally with one change:

--- a/python/travis/prepare_python_tests.sh
+++ b/python/travis/prepare_python_tests.sh
@@ -28,8 +28,6 @@ export SPARK_HOME=$(pwd)/spark
 curl http://apache.cs.utah.edu/spark/spark-2.1.1/spark-2.1.1-bin-hadoop2.7.tgz -o spark.tgz
 tar -xzf spark.tgz
 mv spark-2.1.1-bin-hadoop2.7 spark
-# Create the Flint environment 
-conda create -n flint python=3.5 pandas notebook

 # Run the common part of the test setup
 cd ..
diff --git a/scripts/prepare_python_tests.sh b/scripts/prepare_python_tests.sh
index 7f97038..ec7170b 100755
--- a/scripts/prepare_python_tests.sh
+++ b/scripts/prepare_python_tests.sh
@@ -31,3 +31,5 @@ cp python/travis/spark_log4j.properties ${SPARK_HOME}/conf/log4j.properties
 # Add the flint assembly to spark test instance
 sed "s,_FLINT_ROOT_,$(pwd)," <python/travis/spark-defaults.conf >${SPARK_HOME}/conf/spark-defaults.conf

+# Create the Flint environment
+conda create -n flint python=3.5 --file python/requirements.txt

Can you update this?

icexelloss commented 7 years ago

Otherwise LGTM. @leifwalsh, can you approve as well?

heimir-sverrisson commented 7 years ago

@icexelloss I've made the update to the scripts.

icexelloss commented 7 years ago

Merged. Thanks!