intel-analytics / analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
https://analytics-zoo.readthedocs.io/
Apache License 2.0

Run the latest version 'NYC taxi dataset.ipynb' error #732

Closed 2017wxyzwxyz closed 4 years ago

2017wxyzwxyz commented 4 years ago

When running the latest version of 'NYC taxi dataset.ipynb', the following error occurred:

from zoo.automl.common.util import train_val_test_split
train_df, val_df, test_df = train_val_test_split(df, val_ratio=0.1, test_ratio=0.1)

Prepending /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/bigdl/share/conf/spark-bigdl.conf to sys.path
Adding /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/share/lib/analytics-zoo-bigdl_0.10.0-spark_2.4.3-0.8.1-jar-with-dependencies.jar to BIGDL_JARS
Prepending /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/share/conf/spark-analytics-zoo.conf to sys.path

ImportError                               Traceback (most recent call last)
in
----> 1 from zoo.automl.common.util import train_val_test_split
      2 train_df, val_df, test_df = train_val_test_split(df, val_ratio=0.1, test_ratio=0.1)

ImportError: cannot import name 'train_val_test_split'

shanyu-sys commented 4 years ago

You should run the notebook of the same version as your analytics-zoo. If you installed analytics-zoo with pip install, currently the latest released version is 0.8.1, so you should run the notebook on branch-0.8. If you want to try the latest version on master, since it hasn't been released yet, you need to build a whl from the master code and install the whl manually (you can follow the steps here).
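If you are not sure which version you have installed, a quick way to check is the sketch below. It uses the standard pkg_resources API from setuptools and assumes the package was installed via pip under the name analytics-zoo:

# Sketch: print the installed analytics-zoo version so you know which notebook branch to use.
# Assumes the package was installed with pip under the name "analytics-zoo".
import pkg_resources

print(pkg_resources.get_distribution("analytics-zoo").version)  # e.g. "0.8.1"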

2017wxyzwxyz commented 4 years ago

But when I ran the old version of 'NYC taxi dataset.ipynb' on master, the following error occurred too:

from zoo.automl.common.util import split_input_df
train_df, val_df, test_df = split_input_df(df, val_split_ratio=0.1, test_split_ratio=0.1)

Prepending /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/bigdl/share/conf/spark-bigdl.conf to sys.path
Adding /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/share/lib/analytics-zoo-bigdl_0.10.0-spark_2.4.3-0.8.1-jar-with-dependencies.jar to BIGDL_JARS
Prepending /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/share/conf/spark-analytics-zoo.conf to sys.path

NameError                                 Traceback (most recent call last)
in
      1 from zoo.automl.common.util import split_input_df
----> 2 train_df, val_df, test_df = split_input_df(df, val_split_ratio=0.1, test_split_ratio=0.1)

NameError: name 'df' is not defined

Let me ask you two questions:

1. I did install 'analytics-zoo' with 'pip install' (it seems to have been downloaded on March 6th), so I should run the notebook on branch-0.8, but where is the 'NYC taxi dataset.ipynb' on branch-0.8?

2. In the future, I would like to write code on your platform to complete my work, but I am not familiar with data science or the Python language. So may I ask whether the development language of this unified platform can only be Python, or do I need to learn other languages like Scala? I usually work with C/C++ and MATLAB.
shane-huang commented 4 years ago

But when I ran the old version of 'NYC taxi dataset.ipynb' on master, the following error occurred too:

from zoo.automl.common.util import split_input_df
train_df, val_df, test_df = split_input_df(df, val_split_ratio=0.1, test_split_ratio=0.1)

Prepending /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/bigdl/share/conf/spark-bigdl.conf to sys.path

Adding /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/share/lib/analytics-zoo-bigdl_0.10.0-spark_2.4.3-0.8.1-jar-with-dependencies.jar to BIGDL_JARS
Prepending /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/share/conf/spark-analytics-zoo.conf to sys.path

NameError                                 Traceback (most recent call last)
in
      1 from zoo.automl.common.util import split_input_df
----> 2 train_df, val_df, test_df = split_input_df(df, val_split_ratio=0.1, test_split_ratio=0.1)

NameError: name 'df' is not defined

Let me ask you two questions: 1. I did install 'analytics-zoo' with 'pip install' (it seems to have been downloaded on March 6th), so I should run the notebook on branch-0.8, but where is the 'NYC taxi dataset.ipynb' on branch-0.8?

2. In the future, I would like to write code on your platform to complete my work, but I am not familiar with data science or the Python language. So may I ask whether the development language of this unified platform can only be Python, or do I need to learn other languages like Scala? I usually work with C/C++ and MATLAB.

As for the error, again, the notebook and the installed library should be consistent. If you installed from master, you need to use the notebook from master. If you installed with "pip install", find out the version of zoo you installed and download the notebook from the same branch, e.g. if you installed 0.8.1, you can download the notebook from https://github.com/intel-analytics/analytics-zoo/tree/branch-0.8.

The notebook is still in the same place. Since you didn't say which one you're using now, I assume you're using the notebook in the automl folder. You can still find it in the 0.8 branch at https://github.com/intel-analytics/analytics-zoo/blob/branch-0.8/apps/automl/nyc_taxi_dataset.ipynb

C/C++ and MATLAB are not supported. Most deep learning frameworks use Python interfaces, so we support Python. Some of the models have both Scala and Python interfaces.

shane-huang commented 4 years ago

To use master, you can also pip install our nightly build packages. Those packages contain the latest code updates, and you can use them together with the master version of the notebook. Refer to https://analytics-zoo.github.io/master/#PythonUserGuide/install/#install-the-latest-nightly-build-wheels-for-pip for details on how to install the latest nightly build packages.

shanyu-sys commented 4 years ago

But when I ran the old version of 'NYC taxi dataset.ipynb' on master, the following error occurred too:

from zoo.automl.common.util import split_input_df
train_df, val_df, test_df = split_input_df(df, val_split_ratio=0.1, test_split_ratio=0.1)

Prepending /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/bigdl/share/conf/spark-bigdl.conf to sys.path

Adding /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/share/lib/analytics-zoo-bigdl_0.10.0-spark_2.4.3-0.8.1-jar-with-dependencies.jar to BIGDL_JARS
Prepending /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/share/conf/spark-analytics-zoo.conf to sys.path

NameError                                 Traceback (most recent call last)
in
      1 from zoo.automl.common.util import split_input_df
----> 2 train_df, val_df, test_df = split_input_df(df, val_split_ratio=0.1, test_split_ratio=0.1)

NameError: name 'df' is not defined

Have you downloaded the nyc_taxi dataset as described in the "Run Jupyter" part of the nyc taxi readme? Were there any other error messages before?

2017wxyzwxyz commented 4 years ago

Yes, but I downloaded the nyc_taxi dataset directly through the web address rather than by running the script file, and I put the 'nyc_taxi.csv' file in the current directory. The code is changed as follows:

try:
    dataset_path = os.getenv("ANALYTICS_ZOO_HOME") + "/bin/data/NAB/nyc_taxi/nyc_taxi.csv"
    # raw_df = pd.read_csv(dataset_path)
    raw_df = pd.read_csv("nyc_taxi.csv")
except Exception as e:
    print("nyc_taxi.csv doesn't exist")
    print("you can run $ANALYTICS_ZOO_HOME/bin/data/NAB/nyc_taxi/get_nyc_taxi.sh to download nyc_taxi.csv")

There were no other error messages.

shanyu-sys commented 4 years ago

Yes, but I downloaded the nyc_taxi dataset directly through the web address rather than by running the script file, and I put the 'nyc_taxi.csv' file in the current directory. The code is changed as follows:

try:
    dataset_path = os.getenv("ANALYTICS_ZOO_HOME") + "/bin/data/NAB/nyc_taxi/nyc_taxi.csv"
    # raw_df = pd.read_csv(dataset_path)
    raw_df = pd.read_csv("nyc_taxi.csv")
except Exception as e:
    print("nyc_taxi.csv doesn't exist")
    print("you can run $ANALYTICS_ZOO_HOME/bin/data/NAB/nyc_taxi/get_nyc_taxi.sh to download nyc_taxi.csv")

There were no other error messages.

This seems to be our master version. The error message said that "df" is not defined. Could you please check whether you executed the cell that assigns a value to "df" before using "df"?
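For reference, here is a minimal sketch of the cells that have to run, in that order, before the split cell. The column names are assumptions based on the NAB nyc_taxi data, not necessarily the notebook's exact code:

import pandas as pd

# Read the raw dataset first (this sketch assumes nyc_taxi.csv is in the working directory).
raw_df = pd.read_csv("nyc_taxi.csv")

# Build the dataframe that the notebook calls `df`; if this cell is skipped,
# the later split cell fails with: NameError: name 'df' is not defined.
df = pd.DataFrame({"datetime": pd.to_datetime(raw_df["timestamp"]),
                   "value": raw_df["value"]})

# Only then call the split helper from the 0.8.1 release.
from zoo.automl.common.util import split_input_df
train_df, val_df, test_df = split_input_df(df, val_split_ratio=0.1, test_split_ratio=0.1)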

2017wxyzwxyz commented 4 years ago

I did install 'analytics-zoo' with 'pip install', so I should run the notebook on branch-0.8. According to your instructions, I downloaded the two files 'https://github.com/intel-analytics/analytics-zoo/tree/branch-0.8' and 'https://github.com/intel-analytics/analytics-zoo/blob/branch-0.8/apps/automl/nyc_taxi_dataset.ipynb' and opened nyc_taxi_dataset.ipynb, but both fail to open with the same error:

Error loading notebook
Unreadable Notebook: /mnt/f/zooAutoml/jupyterCode/22/nyc_taxi_dataset-7-21-2.ipynb
NotJSONError('Notebook does not appear to be JSON: \'\n\n\n\n\n<!DOCTYPE html>\n<html lang="...',)

2017wxyzwxyz commented 4 years ago

This seems to be our master version. The error message said that "df" is not defined. Could you please check whether you executed the cell that assigns a value to "df" before using "df"?

That was the run result from the old version.

Today I opened the new nyc_taxi_dataset.ipynb. Its code contains the line 'df = pd.DataFrame(pd.to_datetime(raw_df.timestamp))', and df.head() prints:

0 2014-07-01 00:00:00 10844
1 2014-07-01 00:30:00  8127
2 2014-07-01 01:00:00  6210
3 2014-07-01 01:30:00  4656
4 2014-07-01 02:00:00  3820

The next cell,

from zoo.automl.common.util import train_val_test_split
train_df, val_df, test_df = train_val_test_split(df, val_ratio=0.1, test_ratio=0.1)

fails with the following error:

Prepending /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/bigdl/share/conf/spark-bigdl.conf to sys.path
Adding /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/share/lib/analytics-zoo-bigdl_0.10.0-spark_2.4.3-0.8.1-jar-with-dependencies.jar to BIGDL_JARS
Prepending /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/share/conf/spark-analytics-zoo.conf to sys.path


ImportError                               Traceback (most recent call last)
in
----> 1 from zoo.automl.common.util import train_val_test_split
      2 train_df, val_df, test_df = train_val_test_split(df, val_ratio=0.1, test_ratio=0.1)

ImportError: cannot import name 'train_val_test_split'
shanyu-sys commented 4 years ago

Again, you should not run the notebook from analytics-zoo master with another version of analytics-zoo installed (e.g. 0.8.1). If you do want to run the new notebook on master, you could try installing our nightly build packages as suggested before.

To use master, you can also pip install our nightly build packages. Those packages contain the latest code updates, and you can use them together with the master version of the notebook. Refer to https://analytics-zoo.github.io/master/#PythonUserGuide/install/#install-the-latest-nightly-build-wheels-for-pip for details on how to install the latest nightly build packages.

As you can see, in version 0.8.1 the util function is named split_input_df, while on master the function name has changed to train_val_test_split. Therefore you need to use the same version of zoo to run the corresponding notebook.
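If you want one cell that works with either version, a sketch like the following would do; it relies only on the two function names and parameter names quoted in this thread:

# Sketch: import whichever split helper the installed analytics-zoo provides.
try:
    # master / newer builds
    from zoo.automl.common.util import train_val_test_split as split_fn
    split_kwargs = dict(val_ratio=0.1, test_ratio=0.1)
except ImportError:
    # 0.8.1 release
    from zoo.automl.common.util import split_input_df as split_fn
    split_kwargs = dict(val_split_ratio=0.1, test_split_ratio=0.1)

train_df, val_df, test_df = split_fn(df, **split_kwargs)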

NameError                                 Traceback (most recent call last)
in
      1 from zoo.automl.common.util import split_input_df
----> 2 train_df, val_df, test_df = split_input_df(df, val_split_ratio=0.1, test_split_ratio=0.1)

NameError: name 'df' is not defined

The error message above is from version 0.8.1. But when I asked whether you had downloaded the dataset correctly, you gave me the code from the new notebook, as below.

Yes, but I downloaded the nyc_taxi dataset directly through the web address rather than by running the script file, and I put the 'nyc_taxi.csv' file in the current directory. The code is changed as follows:

try:
    dataset_path = os.getenv("ANALYTICS_ZOO_HOME") + "/bin/data/NAB/nyc_taxi/nyc_taxi.csv"
    # raw_df = pd.read_csv(dataset_path)
    raw_df = pd.read_csv("nyc_taxi.csv")
except Exception as e:
    print("nyc_taxi.csv doesn't exist")
    print("you can run $ANALYTICS_ZOO_HOME/bin/data/NAB/nyc_taxi/get_nyc_taxi.sh to download nyc_taxi.csv")

There were no other error messages.

So I am a little confused when you told me that

That was the run result from the old version.

So have you downloaded the dataset correctly for the version 0.8.1 notebook? Is there still a NameError? Please make sure you are running the same version of the notebook as your analytics-zoo version.

shanyu-sys commented 4 years ago

I did install 'analytics-zoo' with 'pip install', so I should run the notebook on branch-0.8. According to your instructions, I downloaded the two files 'https://github.com/intel-analytics/analytics-zoo/tree/branch-0.8' and 'https://github.com/intel-analytics/analytics-zoo/blob/branch-0.8/apps/automl/nyc_taxi_dataset.ipynb' and opened nyc_taxi_dataset.ipynb, but both fail to open with the same error:

Error loading notebook
Unreadable Notebook: /mnt/f/zooAutoml/jupyterCode/22/nyc_taxi_dataset-7-21-2.ipynb
NotJSONError('Notebook does not appear to be JSON: '\n\n\n\n\n\n<html lang="...',)

How did you download the notebook? You could download the notebook via wget https://raw.githubusercontent.com/intel-analytics/analytics-zoo/branch-0.8/apps/automl/nyc_taxi_dataset.ipynb

2017wxyzwxyz commented 4 years ago

Oh, I downloaded it with a browser, not with the wget command.

But when I download 'nyc_taxi_dataset.ipynb' according to the instructions, there is an error:

(base) wxy@SC-202007131929:~$ wget https://raw.githubusercontent.com/intel-analytics/analytics-zoo/branch-0.8/apps/automl/nyc_taxi_dataset.ipynb
--2020-07-21 13:50:49--  https://raw.githubusercontent.com/intel-analytics/analytics-zoo/branch-0.8/apps/automl/nyc_taxi_dataset.ipynb
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 0.0.0.0, ::
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|0.0.0.0|:443... failed: Connection refused.
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|::|:443... failed: Connection refused.
(base) wxy@SC-202007131929:~$

shanyu-sys commented 4 years ago

You could download it from a browser, but you need to download the raw file. First go to our notebook on branch-0.8, then click "Raw", then save it as a file.

2017wxyzwxyz commented 4 years ago

When I open analytics-zoo/apps/automl/nyc_taxi_dataset.ipynb on GitHub, the page shows:

Sorry, something went wrong. Reload?

After clicking "Raw":

The page cannot be displayed (ERR_NAME_NOT_RESOLVED). Press F5 to refresh the page, or try: open the browser doctor to check the proxy server settings.

2017wxyzwxyz commented 4 years ago

Please send the 'notebook on branch-0.8' to my email testmywebsite@163.com

2017wxyzwxyz commented 4 years ago

Please send the 'notebook on branch-0.8-nyc_taxi_dataset.ipynb' to my email testmywebsite@163.com

2017wxyzwxyz commented 4 years ago

I tried using the following address to open the 'notebook on branch-0.8' nyc_taxi_dataset.ipynb: https://nbviewer.jupyter.org/github/intel-analytics/analytics-zoo/blob/branch-0.8/apps/automl/nyc_taxi_dataset.ipynb

Then I created a new *.ipynb file, copied the content into it bit by bit, and ran it; the following error occurred:

........................ code .................................

from zoo import init_spark_on_local
from zoo.ray import RayContext

sc = init_spark_on_local(cores=4)
ray_ctx = RayContext(sc=sc, object_store_memory="1g")
ray_ctx.init()

........................ run error .................................

Current pyspark location is : /home/wxy/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/pyspark/__init__.py
Start to getOrCreate SparkContext


Exception                                 Traceback (most recent call last)
in
      1 from zoo import init_spark_on_local
      2 from zoo.ray import RayContext
----> 3 sc = init_spark_on_local(cores=4)
      4 ray_ctx = RayContext(sc=sc, object_store_memory="1g")
      5 ray_ctx.init()

~/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/common/nncontext.py in init_spark_on_local(cores, conf, python_location, spark_log_level, redirect_spark_log)
     37                                redirect_spark_log=redirect_spark_log)
     38     return sparkrunner.init_spark_on_local(cores=cores, conf=conf,
---> 39                                            python_location=python_location)
     40
     41

~/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/util/spark.py in init_spark_on_local(self, cores, conf, python_location)
    140         master = "local[{}]".format(cores)
    141         zoo_conf = init_spark_conf(conf).setMaster(master)
--> 142         sc = init_nncontext(conf=zoo_conf, redirect_spark_log=self.redirect_spark_log)
    143         sc.setLogLevel(self.spark_log_level)
    144         print("Successfully got a SparkContext")

~/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/common/nncontext.py in init_nncontext(conf, redirect_spark_log)
    116         sc = getOrCreateSparkContext(conf=None, appName=conf)
    117     else:
--> 118         sc = getOrCreateSparkContext(conf=conf)
    119     check_version()
    120     if redirect_spark_log:

~/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/zoo/common/nncontext.py in getOrCreateSparkContext(conf, appName)
    136         if appName:
    137             spark_conf.setAppName(appName)
--> 138         return SparkContext.getOrCreate(spark_conf)
    139     else:
    140         return SparkContext.getOrCreate()

~/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/pyspark/context.py in getOrCreate(cls, conf)
    365         with SparkContext._lock:
    366             if SparkContext._active_spark_context is None:
--> 367                 SparkContext(conf=conf or SparkConf())
    368             return SparkContext._active_spark_context
    369

~/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/pyspark/context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
    131                     " note this option will be removed in Spark 3.0")
    132
--> 133         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    134         try:
    135             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

~/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/pyspark/context.py in _ensure_initialized(cls, instance, gateway, conf)
    314         with SparkContext._lock:
    315             if not SparkContext._gateway:
--> 316                 SparkContext._gateway = gateway or launch_gateway(conf)
    317                 SparkContext._jvm = SparkContext._gateway.jvm
    318

~/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/pyspark/java_gateway.py in launch_gateway(conf)
     44     :return: a JVM gateway
     45     """
---> 46     return _launch_gateway(conf)
     47
     48

~/anaconda3/envs/ZooAutoml/lib/python3.6/site-packages/pyspark/java_gateway.py in _launch_gateway(conf, insecure)
    106
    107     if not os.path.isfile(conn_info_file):
--> 108         raise Exception("Java gateway process exited before sending its port number")
    109
    110     with open(conn_info_file, "rb") as info:

Exception: Java gateway process exited before sending its port number
shanyu-sys commented 4 years ago

Could you please open another issue, since this is a different error? Keeping one specific error per issue makes it easier for other users to refer to.

2017wxyzwxyz commented 4 years ago

Certainly! I'm not familiar with GitHub usage.