thinline72 / nsl-kdd

PySpark solution to the NSL-KDD dataset: https://www.unb.ca/cic/datasets/nsl.html
Apache License 2.0

Use of * in standardizer function results in error #1

Closed: ruze00 closed this issue 7 years ago

ruze00 commented 7 years ago

In the standardizer section, the following code results in a syntax error:

train_scaler = [*binary_cols, *list(map(standardizer, numeric_cols)), *['id', 'labels2_index', 'labels2', 'labels5_index', 'labels5']]
test_scaler = [*test_binary_cols, *list(map(standardizer, numeric_cols)), *['id', 'labels2_index', 'labels2', 'labels5_index', 'labels5']]

It doesn't like the * syntax. Is that supposed to be there? I'm using the jupyter/all-spark-notebook Docker image.

Removing the *s results in a different error.

thinline72 commented 7 years ago

Hi @ruze00,

* is used for unpacking Python lists (https://docs.python.org/3/tutorial/controlflow.html#unpacking-argument-lists). So, for example,

train_scaler = [*binary_cols, *list(map(standardizer, numeric_cols)), *['id', 'labels2_index', 'labels2', 'labels5_index', 'labels5']]

just produces a flattened list of columns.
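A minimal sketch of the difference between the starred and unstarred versions. The column names are illustrative, and `standardizer` here is a hypothetical stand-in that returns a string so the snippet runs without Spark; the notebook's real function builds a scaled Spark column instead.

```python
# Illustrative column lists (not the notebook's exact values).
binary_cols = ['land', 'logged_in']
numeric_cols = ['duration', 'src_bytes']
label_cols = ['id', 'labels2_index', 'labels2', 'labels5_index', 'labels5']

def standardizer(col):
    # Hypothetical placeholder: the real notebook returns a scaled Spark column.
    return col + '_scaled'

# With * (Python 3.5+): every sub-list is unpacked into one flat list.
flat = [*binary_cols, *list(map(standardizer, numeric_cols)), *label_cols]

# Without *: a list containing three lists, not a flat list of columns.
nested = [binary_cols, list(map(standardizer, numeric_cols)), label_cols]

print(flat)
print(len(nested))
```

The nested form is probably why simply deleting the stars leads to a different error: whatever consumes the list downstream receives lists where it expects individual columns.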

Are you using Python 3? Python 2 doesn't support this syntax: unpacking with * inside a list literal was only added in Python 3.5 (PEP 448). The notebook is written in Python 3.
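If the notebook has to run on an older interpreter, plain list concatenation builds the same flat list on both Python 2 and Python 3. This is a sketch with illustrative column names and a hypothetical `standardizer` placeholder, not the notebook's actual code.

```python
import sys

# Illustrative column lists (not the notebook's exact values).
binary_cols = ['land', 'logged_in']
numeric_cols = ['duration', 'src_bytes']
label_cols = ['id', 'labels2_index', 'labels2', 'labels5_index', 'labels5']

def standardizer(col):
    # Hypothetical placeholder for the notebook's scaling function.
    return col + '_scaled'

# `[*a, *b]` is rejected when the code is compiled on Python < 3.5,
# so it cannot be guarded by an if-statement; checking the version
# up front at least makes the requirement explicit.
print(sys.version_info >= (3, 5))

# Concatenation with + gives the same flat list on Python 2 and 3.
train_scaler = binary_cols + list(map(standardizer, numeric_cols)) + label_cols
print(train_scaler)
```

On Python 2, `map` already returns a list, so the extra `list(...)` call is harmless there and needed on Python 3.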

ruze00 commented 7 years ago

@thinline72, thanks so much for your response. I didn't check the version, sorry. It must have been Python 2. I started again from scratch with Python 3 and had no issues.