microsoft / SynapseML

Simple and Distributed Machine Learning
http://aka.ms/spark
MIT License

VowpalWabbit not found in the latest released version #797

Closed: arijeetm1 closed this issue 4 years ago

arijeetm1 commented 4 years ago

Describe the bug: Cannot find the VowpalWabbit module in the mmlspark 0.17 release (https://github.com/Azure/mmlspark/archive/bba5c10ff774a7541be4cde7438ba710bd51f5e6.zip).

It can be found on master, though: https://github.com/arijeetm1/mmlspark/blob/master/src/main/python/mmlspark/vw/VowpalWabbitRegressor.py

To Reproduce

  1. pyspark --packages Azure:mmlspark:0.17
  2. from mmlspark.vw import VowpalWabbitFeaturizer, VowpalWabbitClassifier (following the sample: https://github.com/arijeetm1/mmlspark/blob/master/notebooks/samples/Classification%20-%20Adult%20Census%20with%20Vowpal%20Wabbit.ipynb)

Expected behavior: Should be able to use the VowpalWabbit module.
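To diagnose, a quick check inside the same pyspark session can confirm which artifact the package was loaded from and whether any VowpalWabbit binding is importable. This is a minimal sketch; the two import paths it tries (the namespaced mmlspark.vw layout from the master samples and a flat fallback) are assumptions and may not both exist in a given release.

```python
# Minimal diagnostic sketch: run inside the pyspark session started above.
import mmlspark

# Shows which jar/zip the Python package was actually loaded from.
print(mmlspark.__file__)

try:
    # Namespaced layout used by the master-branch samples.
    from mmlspark.vw import VowpalWabbitClassifier  # noqa: F401
    print("found mmlspark.vw bindings")
except ImportError:
    try:
        # Flat layout that older releases may use instead.
        from mmlspark import VowpalWabbitClassifier  # noqa: F401
        print("found flat VowpalWabbitClassifier")
    except ImportError as err:
        print("VowpalWabbit bindings not found:", err)
```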


Stacktrace

mmlspark.__dict__


{'__name__': 'mmlspark',
'__doc__': '\nMicrosoftML is a library of Python classes to interface with the\nMicrosoft scala APIs to utilize Apache Spark to create distibuted\nmachine learning models.\n\nMicrosoftML simplifies training and scoring classifiers and\nregressors, as well as facilitating the creation of models using the\nCNTK library, images, and text.\n',
'__package__': 'mmlspark',
'__loader__': <zipimporter object "/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar">,
'__spec__': ModuleSpec(name='mmlspark', loader=<zipimporter object "/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar">, origin='/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/__init__.py', submodule_search_locations=['/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark']),
'__path__': ['/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark'],
'__builtins__': {'__name__': 'builtins',
'__doc__': "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices.",
'__package__': '',
'__loader__': _frozen_importlib.BuiltinImporter,
'__spec__': ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>),
'__build_class__': <function __build_class__>,
'__import__': <function __import__>,
'abs': <function abs(x, /)>,
'all': <function all(iterable, /)>,
'any': <function any(iterable, /)>,
'ascii': <function ascii(obj, /)>,
'bin': <function bin(number, /)>,
'callable': <function callable(obj, /)>,
'chr': <function chr(i, /)>,
'compile': <function compile(source, filename, mode, flags=0, dont_inherit=False, optimize=-1)>,
'delattr': <function delattr(obj, name, /)>,
'dir': <function dir>,
'divmod': <function divmod(x, y, /)>,
'eval': <function eval(source, globals=None, locals=None, /)>,
'exec': <function exec(source, globals=None, locals=None, /)>,
'format': <function format(value, format_spec='', /)>,
'getattr': <function getattr>,
'globals': <function globals()>,
'hasattr': <function hasattr(obj, name, /)>,
'hash': <function hash(obj, /)>,
'hex': <function hex(number, /)>,
'id': <function id(obj, /)>,
'input': <bound method Kernel.raw_input of <ipykernel.ipkernel.IPythonKernel object at 0x7f4d1e1c4978>>,
'isinstance': <function isinstance(obj, class_or_tuple, /)>,
'issubclass': <function issubclass(cls, class_or_tuple, /)>,
'iter': <function iter>,
'len': <function len(obj, /)>,
'locals': <function locals()>,
'max': <function max>,
'min': <function min>,
'next': <function next>,
'oct': <function oct(number, /)>,
'ord': <function ord(c, /)>,
'pow': <function pow(x, y, z=None, /)>,
'print': <function print>,
'repr': <function repr(obj, /)>,
'round': <function round>,
'setattr': <function setattr(obj, name, value, /)>,
'sorted': <function sorted(iterable, /, *, key=None, reverse=False)>,
'sum': <function sum(iterable, start=0, /)>,
'vars': <function vars>,
'None': None,
'Ellipsis': Ellipsis,
'NotImplemented': NotImplemented,
'False': False,
'True': True,
'bool': bool,
'memoryview': memoryview,
'bytearray': bytearray,
'bytes': bytes,
'classmethod': classmethod,
'complex': complex,
'dict': dict,
'enumerate': enumerate,
'filter': filter,
'float': float,
'frozenset': frozenset,
'property': property,
'int': int,
'list': list,
'map': map,
'object': object,
'range': range,
'reversed': reversed,
'set': set,
'slice': slice,
'staticmethod': staticmethod,
'str': str,
'super': super,
'tuple': tuple,
'type': type,
'zip': zip,
'__debug__': True,
'BaseException': BaseException,
'Exception': Exception,
'TypeError': TypeError,
'StopAsyncIteration': StopAsyncIteration,
'StopIteration': StopIteration,
'GeneratorExit': GeneratorExit,
'SystemExit': SystemExit,
'KeyboardInterrupt': KeyboardInterrupt,
'ImportError': ImportError,
'ModuleNotFoundError': ModuleNotFoundError,
'OSError': OSError,
'EnvironmentError': OSError,
'IOError': OSError,
'EOFError': EOFError,
'RuntimeError': RuntimeError,
'RecursionError': RecursionError,
'NotImplementedError': NotImplementedError,
'NameError': NameError,
'UnboundLocalError': UnboundLocalError,
'AttributeError': AttributeError,
'SyntaxError': SyntaxError,
'IndentationError': IndentationError,
'TabError': TabError,
'LookupError': LookupError,
'IndexError': IndexError,
'KeyError': KeyError,
'ValueError': ValueError,
'UnicodeError': UnicodeError,
'UnicodeEncodeError': UnicodeEncodeError,
'UnicodeDecodeError': UnicodeDecodeError,
'UnicodeTranslateError': UnicodeTranslateError,
'AssertionError': AssertionError,
'ArithmeticError': ArithmeticError,
'FloatingPointError': FloatingPointError,
'OverflowError': OverflowError,
'ZeroDivisionError': ZeroDivisionError,
'SystemError': SystemError,
'ReferenceError': ReferenceError,
'BufferError': BufferError,
'MemoryError': MemoryError,
'Warning': Warning,
'UserWarning': UserWarning,
'DeprecationWarning': DeprecationWarning,
'PendingDeprecationWarning': PendingDeprecationWarning,
'SyntaxWarning': SyntaxWarning,
'RuntimeWarning': RuntimeWarning,
'FutureWarning': FutureWarning,
'ImportWarning': ImportWarning,
'UnicodeWarning': UnicodeWarning,
'BytesWarning': BytesWarning,
'ResourceWarning': ResourceWarning,
'ConnectionError': ConnectionError,
'BlockingIOError': BlockingIOError,
'BrokenPipeError': BrokenPipeError,
'ChildProcessError': ChildProcessError,
'ConnectionAbortedError': ConnectionAbortedError,
'ConnectionRefusedError': ConnectionRefusedError,
'ConnectionResetError': ConnectionResetError,
'FileExistsError': FileExistsError,
'FileNotFoundError': FileNotFoundError,
'IsADirectoryError': IsADirectoryError,
'NotADirectoryError': NotADirectoryError,
'InterruptedError': InterruptedError,
'PermissionError': PermissionError,
'ProcessLookupError': ProcessLookupError,
'TimeoutError': TimeoutError,
'open': <function io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)>,
'copyright': Copyright (c) 2001-2019 Python Software Foundation.
All Rights Reserved.

Copyright (c) 2000 BeOpen.com. All Rights Reserved.

Copyright (c) 1995-2001 Corporation for National Research Initiatives. All Rights Reserved.

Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam. All Rights Reserved., 'credits': Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands for supporting Python development. See www.python.org for more information., 'license': Type license() to see the full license text, 'help': Type help() for interactive help, or help(object) for help about object., 'IPYTHON': True, 'display': <function IPython.core.display.display(*objs, include=None, exclude=None, metadata=None, transient=None, display_id=None, **kwargs)>, 'pybind11_internals_v3_gcc_libstdcpp_cxxabi1002': <capsule object NULL at 0x7f4cfa045060>, 'get_ipython': <bound method InteractiveShell.get_ipython of <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f4d1e1c4ef0>>}, 'file__': '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/init.py', 'cached': None, 'Utils': <module 'mmlspark.Utils' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/Utils.py'>, 'TypeConversionUtils': <module 'mmlspark.TypeConversionUtils' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/TypeConversionUtils.py'>, 'AddDocuments': mmlspark.AddDocuments.AddDocuments, 'sys': <module 'sys' (built-in)>, 'basestring': str, 'Param': pyspark.ml.param.Param, 'Params': pyspark.ml.param.Params, 'TypeConverters': pyspark.ml.param.TypeConverters, 'HasMaxIter': pyspark.ml.param.shared.HasMaxIter, 'HasRegParam': pyspark.ml.param.shared.HasRegParam, 'HasFeaturesCol': pyspark.ml.param.shared.HasFeaturesCol, 'HasLabelCol': pyspark.ml.param.shared.HasLabelCol, 'HasPredictionCol': pyspark.ml.param.shared.HasPredictionCol, 'HasProbabilityCol': pyspark.ml.param.shared.HasProbabilityCol, 'HasRawPredictionCol': pyspark.ml.param.shared.HasRawPredictionCol, 'HasInputCol': pyspark.ml.param.shared.HasInputCol, 'HasInputCols': pyspark.ml.param.shared.HasInputCols, 'HasOutputCol': pyspark.ml.param.shared.HasOutputCol, 'HasOutputCols': pyspark.ml.param.shared.HasOutputCols, 'HasNumFeatures': pyspark.ml.param.shared.HasNumFeatures, 'HasCheckpointInterval': pyspark.ml.param.shared.HasCheckpointInterval, 'HasSeed': pyspark.ml.param.shared.HasSeed, 'HasTol': pyspark.ml.param.shared.HasTol, 'HasStepSize': pyspark.ml.param.shared.HasStepSize, 'HasHandleInvalid': pyspark.ml.param.shared.HasHandleInvalid, 'HasElasticNetParam': pyspark.ml.param.shared.HasElasticNetParam, 'HasFitIntercept': pyspark.ml.param.shared.HasFitIntercept, 'HasStandardization': pyspark.ml.param.shared.HasStandardization, 'HasThresholds': pyspark.ml.param.shared.HasThresholds, 'HasThreshold': pyspark.ml.param.shared.HasThreshold, 'HasWeightCol': pyspark.ml.param.shared.HasWeightCol, 'HasSolver': pyspark.ml.param.shared.HasSolver, 'HasVarianceCol': pyspark.ml.param.shared.HasVarianceCol, 'HasAggregationDepth': pyspark.ml.param.shared.HasAggregationDepth, 'HasParallelism': pyspark.ml.param.shared.HasParallelism, 'HasCollectSubModels': pyspark.ml.param.shared.HasCollectSubModels, 'HasLoss': pyspark.ml.param.shared.HasLoss, 'DecisionTreeParams': pyspark.ml.param.shared.DecisionTreeParams, 'HasDistanceMeasure': pyspark.ml.param.shared.HasDistanceMeasure, 'keyword_only': <function pyspark.keyword_only(func)>, 'JavaMLReadable': pyspark.ml.util.JavaMLReadable, 'JavaMLWritable': pyspark.ml.util.JavaMLWritable, 
'JavaTransformer': pyspark.ml.wrapper.JavaTransformer, 'JavaEstimator': pyspark.ml.wrapper.JavaEstimator, 'JavaModel': pyspark.ml.wrapper.JavaModel, 'inherit_doc': <function pyspark.ml.common.inherit_doc(cls)>, 'JavaMLReader': pyspark.ml.util.JavaMLReader, 'MLReadable': pyspark.ml.util.MLReadable, 'JavaParams': pyspark.ml.wrapper.JavaParams, 'SparkContext': pyspark.context.SparkContext, 'from_java': <function mmlspark.Utils.from_java(java_stage, stage_name)>, 'JavaMMLReadable': mmlspark.Utils.JavaMMLReadable, 'ComplexParamsMixin': mmlspark.Utils.ComplexParamsMixin, 'JavaMMLReader': mmlspark.Utils.JavaMMLReader, 'generateTypeConverter': <function mmlspark.TypeConversionUtils.generateTypeConverter(name, cache, typeConverter)>, 'complexTypeConverter': <function mmlspark.TypeConversionUtils.complexTypeConverter(name, value, cache)>, 'AnalyzeImage': mmlspark.AnalyzeImage.AnalyzeImage, 'AssembleFeatures': mmlspark.AssembleFeatures.AssembleFeatures, 'AssembleFeaturesModel': mmlspark.AssembleFeatures.AssembleFeaturesModel, 'AzureSearchWriter': <module 'mmlspark.AzureSearchWriter' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/AzureSearchWriter.py'>, 'pyspark': <module 'pyspark' from '/usr/lib/spark/python/pyspark/init.py'>, 'sql': <module 'pyspark.sql' from '/usr/lib/spark/python/pyspark/sql/init.py'>, 'DataFrame': pyspark.sql.dataframe.DataFrame, 'streamToAzureSearch': <function mmlspark.AzureSearchWriter.streamToAzureSearch(df, options={})>, 'writeToAzureSearch': <function mmlspark.AzureSearchWriter.writeToAzureSearch(df, options={})>, 'BinaryFileReader': <module 'mmlspark.BinaryFileReader' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/BinaryFileReader.py'>, 'DataType': pyspark.sql.types.DataType, 'NullType': pyspark.sql.types.NullType, 'StringType': pyspark.sql.types.StringType, 'BinaryType': pyspark.sql.types.BinaryType, 'BooleanType': pyspark.sql.types.BooleanType, 'DateType': pyspark.sql.types.DateType, 'TimestampType': pyspark.sql.types.TimestampType, 'DecimalType': pyspark.sql.types.DecimalType, 'DoubleType': pyspark.sql.types.DoubleType, 'FloatType': pyspark.sql.types.FloatType, 'ByteType': pyspark.sql.types.ByteType, 'IntegerType': pyspark.sql.types.IntegerType, 'LongType': pyspark.sql.types.LongType, 'ShortType': pyspark.sql.types.ShortType, 'ArrayType': pyspark.sql.types.ArrayType, 'MapType': pyspark.sql.types.MapType, 'StructField': pyspark.sql.types.StructField, 'StructType': pyspark.sql.types.StructType, 'BinaryFileFields': ['path', 'bytes'], 'BinaryFileSchema': StructType(List(StructField(path,StringType,true),StructField(bytes,BinaryType,true))), 'readBinaryFiles': <function mmlspark.BinaryFileReader.readBinaryFiles(self, path, recursive=False, sampleRatio=1.0, inspectZip=True, seed=0)>, 'streamBinaryFiles': <function mmlspark.BinaryFileReader.streamBinaryFiles(self, path, sampleRatio=1.0, inspectZip=True, seed=0)>, 'isBinaryFile': <function mmlspark.BinaryFileReader.isBinaryFile(df, column)>, 'BingImageReader': <module 'mmlspark.BingImageReader' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/BingImageReader.py'>, 'streamBingImages': <function mmlspark.BingImageReader.streamBingImages(self, searchTerms, key, url, batchSize=10, imgsPerBatch=10)>, '_BingImageSearch': <module 
'mmlspark._BingImageSearch' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_BingImageSearch.py'>, 'Lambda': mmlspark.Lambda.Lambda, 'BingImageSearch': mmlspark.BingImageSearch.BingImageSearch, '_CNTKLearner': <module 'mmlspark._CNTKLearner' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_CNTKLearner.py'>, '_CNTKModel': <module 'mmlspark._CNTKModel' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_CNTKModel.py'>, 'CNTKModel': mmlspark.CNTKModel.CNTKModel, 'CNTKLearner': mmlspark.CNTKLearner.CNTKLearner, 'CNTKmod': mmlspark.CNTKModel.CNTKModel, 'Cacher': mmlspark.Cacher.Cacher, 'CheckpointData': mmlspark.CheckpointData.CheckpointData, 'ClassBalancer': mmlspark.ClassBalancer.ClassBalancer, 'ClassBalancerModel': mmlspark.ClassBalancer.ClassBalancerModel, 'CleanMissingData': mmlspark.CleanMissingData.CleanMissingData, 'CleanMissingDataModel': mmlspark.CleanMissingData.CleanMissingDataModel, 'ComputeModelStatistics': mmlspark.ComputeModelStatistics.ComputeModelStatistics, 'ComputePerInstanceStatistics': mmlspark.ComputePerInstanceStatistics.ComputePerInstanceStatistics, 'CustomInputParser': mmlspark.CustomInputParser.CustomInputParser, 'CustomOutputParser': mmlspark.CustomOutputParser.CustomOutputParser, 'DataConversion': mmlspark.DataConversion.DataConversion, 'DescribeImage': mmlspark.DescribeImage.DescribeImage, 'DetectFace': mmlspark.DetectFace.DetectFace, 'DropColumns': mmlspark.DropColumns.DropColumns, 'DynamicMiniBatchTransformer': mmlspark.DynamicMiniBatchTransformer.DynamicMiniBatchTransformer, 'EnsembleByKey': mmlspark.EnsembleByKey.EnsembleByKey, 'EntityDetector': mmlspark.EntityDetector.EntityDetector, 'Explode': mmlspark.Explode.Explode, 'FastVectorAssembler': mmlspark.FastVectorAssembler.FastVectorAssembler, 'Featurize': mmlspark.Featurize.Featurize, 'PipelineModel': mmlspark.MultiColumnAdapter.PipelineModel, '_FindBestModel': <module 'mmlspark._FindBestModel' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_FindBestModel.py'>, 'FindBestModel': mmlspark.FindBestModel.FindBestModel, 'SQLContext': pyspark.sql.context.SQLContext, 'BestModel': mmlspark.FindBestModel.BestModel, 'FindSimilarFace': mmlspark.FindSimilarFace.FindSimilarFace, 'FixedMiniBatchTransformer': mmlspark.FixedMiniBatchTransformer.FixedMiniBatchTransformer, 'FlattenBatch': mmlspark.FlattenBatch.FlattenBatch, 'FluentAPI': <module 'mmlspark.FluentAPI' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/FluentAPI.py'>, 'GenerateThumbnails': mmlspark.GenerateThumbnails.GenerateThumbnails, 'GroupFaces': mmlspark.GroupFaces.GroupFaces, 'HTTPTransformer': mmlspark.HTTPTransformer.HTTPTransformer, 'HyperparamBuilder': mmlspark.HyperparamBuilder.HyperparamBuilder, 'DiscreteHyperParam': mmlspark.HyperparamBuilder.DiscreteHyperParam, 'RangeHyperParam': mmlspark.HyperparamBuilder.RangeHyperParam, 'GridSpace': mmlspark.HyperparamBuilder.GridSpace, 'RandomSpace': mmlspark.HyperparamBuilder.RandomSpace, '_ImageTransformer': <module 'mmlspark._ImageTransformer' from 
'/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_ImageTransformer.py'>, 'ImageTransformer': mmlspark.ImageTransformer.ImageTransformer, 'IOImplicits': <module 'mmlspark.IOImplicits' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/IOImplicits.py'>, 'ImageSchema': StructType(List(StructField(origin,StringType,true),StructField(height,IntegerType,true),StructField(width,IntegerType,true),StructField(nChannels,IntegerType,true),StructField(mode,IntegerType,true),StructField(data,BinaryType,true))), 'image_source': 'org.apache.spark.ml.source.image.PatchedImageFileFormat', 'image_sink': 'org.apache.spark.ml.source.image.PatchedImageFileFormat', 'binary_source': 'org.apache.spark.binary.BinaryFileFormat', 'binary_sink': 'org.apache.spark.binary.BinaryFileFormat', 'IdentifyFaces': mmlspark.IdentifyFaces.IdentifyFaces, '_ImageFeaturizer': <module 'mmlspark._ImageFeaturizer' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_ImageFeaturizer.py'>, 'ImageFeaturizer': mmlspark.ImageFeaturizer.ImageFeaturizer, 'SparkSession': pyspark.sql.session.SparkSession, 'ImageLIME': mmlspark.ImageLIME.ImageLIME, 'ImageSetAugmenter': mmlspark.ImageSetAugmenter.ImageSetAugmenter, 'Row': pyspark.sql.types.Row, 'np': <module 'numpy' from '/home/amitra/venv/lib/python3.6/site-packages/numpy/init.py'>, 'ImageFields': ['origin', 'height', 'width', 'nChannels', 'mode', 'data'], 'toNDArray': <function mmlspark.ImageTransformer.toNDArray(image)>, 'toImage': <function mmlspark.ImageTransformer.toImage(array, path='', mode=16)>, 'ImageUtils': <module 'mmlspark.ImageUtils' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/ImageUtils.py'>, 'isImage': <function mmlspark.ImageUtils.isImage(df, column)>, 'readFromPaths': <function mmlspark.ImageUtils.readFromPaths(df, pathCol, imageCol='image')>, 'readFromStrings': <function mmlspark.ImageUtils.readFromStrings(df, bytesCol, imageCol='image', dropPrefix=False)>, 'IndexToValue': mmlspark.IndexToValue.IndexToValue, 'JSONInputParser': mmlspark.JSONInputParser.JSONInputParser, '_JSONOutputParser': <module 'mmlspark._JSONOutputParser' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_JSONOutputParser.py'>, 'JSONOutputParser': mmlspark.JSONOutputParser.JSONOutputParser, 'json': <module 'json' from '/opt/conda/default/lib/python3.6/json/init__.py'>, 'KeyPhraseExtractor': mmlspark.KeyPhraseExtractor.KeyPhraseExtractor, 'LanguageDetector': mmlspark.LanguageDetector.LanguageDetector, '_LightGBMClassifier': <module 'mmlspark._LightGBMClassifier' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_LightGBMClassifier.py'>, 'LightGBMClassifier': mmlspark.LightGBMClassifier.LightGBMClassifier, 'LightGBMClassificationModel': mmlspark.LightGBMClassifier.LightGBMClassificationModel, '_LightGBMRegressor': <module 'mmlspark._LightGBMRegressor' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_LightGBMRegressor.py'>, 
'LightGBMRegressor': mmlspark.LightGBMRegressor.LightGBMRegressor, 'LightGBMRegressionModel': mmlspark.LightGBMRegressor.LightGBMRegressionModel, 'LocalNER': mmlspark.LocalNER.LocalNER, 'ModelDownloader': mmlspark.ModelDownloader.ModelDownloader, 'DEFAULT_URL': 'https://mmlspark.azureedge.net/datasets/CNTKModels/', 'ModelSchema': mmlspark.ModelDownloader.ModelSchema, 'MultiColumnAdapter': mmlspark.MultiColumnAdapter.MultiColumnAdapter, 'MultiNGram': mmlspark.MultiNGram.MultiNGram, 'NER': mmlspark.NER.NER, 'OCR': mmlspark.OCR.OCR, 'PageSplitter': mmlspark.PageSplitter.PageSplitter, 'PartitionConsolidator': mmlspark.PartitionConsolidator.PartitionConsolidator, 'PartitionSample': mmlspark.PartitionSample.PartitionSample, 'PowerBIWriter': <module 'mmlspark.PowerBIWriter' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/PowerBIWriter.py'>, 'streamToPowerBI': <function mmlspark.PowerBIWriter.streamToPowerBI(df, url, options={})>, 'writeToPowerBI': <function mmlspark.PowerBIWriter.writeToPowerBI(df, url, options={})>, 'RankingAdapter': mmlspark.RankingAdapter.RankingAdapter, 'RankingAdapterModel': mmlspark.RankingAdapterModel.RankingAdapterModel, 'RankingEvaluator': mmlspark.RankingEvaluator.RankingEvaluator, 'JavaEvaluator': pyspark.ml.evaluation.JavaEvaluator, '_RankingTrainValidationSplitModel': <module 'mmlspark._RankingTrainValidationSplitModel' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_RankingTrainValidationSplitModel.py'>, 'RankingTrainValidationSplitModel': mmlspark.RankingTrainValidationSplitModel.RankingTrainValidationSplitModel, '_RankingTrainValidationSplit': <module 'mmlspark._RankingTrainValidationSplit' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_RankingTrainValidationSplit.py'>, 'RankingTrainValidationSplit': mmlspark.RankingTrainValidationSplit.RankingTrainValidationSplit, 'tvmodel': mmlspark.RankingTrainValidationSplitModel.RankingTrainValidationSplitModel, 'ValidatorParams': pyspark.ml.tuning.ValidatorParams, 'os': <module 'os' from '/opt/conda/default/lib/python3.6/os.py'>, 'time': <module 'time' (built-in)>, 'uuid': <module 'uuid' from '/opt/conda/default/lib/python3.6/uuid.py'>, 'warnings': <module 'warnings' from '/opt/conda/default/lib/python3.6/warnings.py'>, 'unicode': str, 'long': int, 'since': <function pyspark.since(version)>, 'VersionUtils': pyspark.util.VersionUtils, 'Identifiable': pyspark.ml.util.Identifiable, 'BaseReadWrite': pyspark.ml.util.BaseReadWrite, 'MLWriter': pyspark.ml.util.MLWriter, 'GeneralMLWriter': pyspark.ml.util.GeneralMLWriter, 'JavaMLWriter': pyspark.ml.util.JavaMLWriter, 'GeneralJavaMLWriter': pyspark.ml.util.GeneralJavaMLWriter, 'MLWritable': pyspark.ml.util.MLWritable, 'GeneralJavaMLWritable': pyspark.ml.util.GeneralJavaMLWritable, 'MLReader': pyspark.ml.util.MLReader, 'JavaPredictionModel': pyspark.ml.util.JavaPredictionModel, 'DefaultParamsWritable': pyspark.ml.util.DefaultParamsWritable, 'DefaultParamsWriter': pyspark.ml.util.DefaultParamsWriter, 'DefaultParamsReadable': pyspark.ml.util.DefaultParamsReadable, 'DefaultParamsReader': pyspark.ml.util.DefaultParamsReader, 'Estimator': pyspark.ml.base.Estimator, 'RecognizeDomainSpecificContent': mmlspark.RecognizeDomainSpecificContent.RecognizeDomainSpecificContent, 'RecognizeText': 
mmlspark.RecognizeText.RecognizeText, 'RecommendationIndexer': mmlspark.RecommendationIndexer.RecommendationIndexer, 'RecommendationIndexerModel': mmlspark.RecommendationIndexerModel.RecommendationIndexerModel, 'RenameColumn': mmlspark.RenameColumn.RenameColumn, 'Repartition': mmlspark.Repartition.Repartition, '_SAR': <module 'mmlspark._SAR' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_SAR.py'>, '_SARModel': <module 'mmlspark._SARModel' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_SARModel.py'>, 'SARModel': mmlspark.SARModel.SARModel, 'SAR': mmlspark.SAR.SAR, 'sar': mmlspark._SAR._SAR, 'sarm': mmlspark.SARModel.SARModel, 'sarModel': mmlspark._SARModel._SARModel, 'SelectColumns': mmlspark.SelectColumns.SelectColumns, 'ServingFunctions': <module 'mmlspark.ServingFunctions' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/ServingFunctions.py'>, 'Column': pyspark.sql.column.Column, 'string_to_response': <function mmlspark.ServingFunctions.string_to_response(c)>, 'request_to_string': <function mmlspark.ServingFunctions.request_to_string(c)>, 'ServingImplicits': <module 'mmlspark.ServingImplicits' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/ServingImplicits.py'>, 'serving_source': 'org.apache.spark.sql.execution.streaming.HTTPSourceProvider', 'serving_sink': 'org.apache.spark.sql.execution.streaming.HTTPSinkProvider', 'distributed_serving_source': 'org.apache.spark.sql.execution.streaming.DistributedHTTPSourceProvider', 'distributed_serving_sink': 'org.apache.spark.sql.execution.streaming.DistributedHTTPSinkProvider', 'continuous_serving_source': 'org.apache.spark.sql.execution.streaming.continuous.HTTPSourceProviderV2', 'continuous_serving_sink': 'org.apache.spark.sql.execution.streaming.continuous.HTTPSinkProviderV2', '_SimpleHTTPTransformer': <module 'mmlspark._SimpleHTTPTransformer' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_SimpleHTTPTransformer.py'>, 'SimpleHTTPTransformer': mmlspark.SimpleHTTPTransformer.SimpleHTTPTransformer, 'StringOutputParser': mmlspark.StringOutputParser.StringOutputParser, 'SummarizeData': mmlspark.SummarizeData.SummarizeData, 'SuperpixelTransformer': mmlspark.SuperpixelTransformer.SuperpixelTransformer, 'TabularLIME': mmlspark.TabularLIME.TabularLIME, 'TabularLIMEModel': mmlspark.TabularLIMEModel.TabularLIMEModel, 'TagImage': mmlspark.TagImage.TagImage, 'TextFeaturizer': mmlspark.TextFeaturizer.TextFeaturizer, 'TextFeaturizerModel': mmlspark.TextFeaturizer.TextFeaturizerModel, 'TextPreprocessor': mmlspark.TextPreprocessor.TextPreprocessor, 'TextSentiment': mmlspark.TextSentiment.TextSentiment, 'TimeIntervalMiniBatchTransformer': mmlspark.TimeIntervalMiniBatchTransformer.TimeIntervalMiniBatchTransformer, 'Timer': mmlspark.Timer.Timer, 'TimerModel': mmlspark.Timer.TimerModel, '_TrainClassifier': <module 'mmlspark._TrainClassifier' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_TrainClassifier.py'>, 'TrainClassifier': mmlspark.TrainClassifier.TrainClassifier, 
'TrainedClassifierModel': mmlspark.TrainClassifier.TrainedClassifierModel, '_TrainRegressor': <module 'mmlspark._TrainRegressor' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_TrainRegressor.py'>, 'TrainRegressor': mmlspark.TrainRegressor.TrainRegressor, 'TrainedRegressorModel': mmlspark.TrainRegressor.TrainedRegressorModel, '_TuneHyperparameters': <module 'mmlspark._TuneHyperparameters' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_TuneHyperparameters.py'>, 'TuneHyperparameters': mmlspark.TuneHyperparameters.TuneHyperparameters, 'TuneHyperparametersModel': mmlspark.TuneHyperparameters.TuneHyperparametersModel, 'Py4JError': py4j.protocol.Py4JError, '_UDFTransformer': <module 'mmlspark._UDFTransformer' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/_UDFTransformer.py'>, 'UDFTransformer': mmlspark.UDFTransformer.UDFTransformer, 'UserDefinedFunction': pyspark.sql.udf.UserDefinedFunction, 'UnicodeNormalize': mmlspark.UnicodeNormalize.UnicodeNormalize, 'UnrollBinaryImage': mmlspark.UnrollBinaryImage.UnrollBinaryImage, 'UnrollImage': mmlspark.UnrollImage.UnrollImage, 'ValueIndexer': mmlspark.ValueIndexer.ValueIndexer, 'ValueIndexerModel': mmlspark.ValueIndexerModel.ValueIndexerModel, 'VerifyFaces': mmlspark.VerifyFaces.VerifyFaces, 'java_params_patch': <module 'mmlspark.java_params_patch' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/java_params_patch.py'>, 'plot': <module 'mmlspark.plot' from '/hadoop/spark/tmp/spark-ea2c2fdd-7700-4dff-af2d-4e8c1117e394/userFiles-58c05f2e-9219-4f42-a787-9d03e37e8d00/Azure_mmlspark-0.17.jar/mmlspark/plot.py'>, 'itertools': <module 'itertools' (built-in)>, 'confusionMatrix': <function mmlspark.plot.confusionMatrix(df, y_col, y_hat_col, labels)>, 'roc': <function mmlspark.plot.roc(df, y_col, y_hat_col, thresh=0.5)>} ​```



welcome[bot] commented 4 years ago

👋 Thanks for opening your first issue here! If you're reporting a 🐞 bug, please make sure you include steps to reproduce it.

imatiach-msft commented 4 years ago

@arijeetm1 I don't believe it was added in the 0.17 version; that version is actually quite old. You could try the 0.18.1 version, the 1.0 RC build, or the latest master (the snapshot coordinates are on the main page). @mhamilton723 may also do a release soon.
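For reference, a newer build can also be pinned when the session is created instead of via the --packages flag. A minimal sketch, assuming the 0.18.1 artifact is published under the Maven coordinate shown below; verify the coordinate against the release notes for whichever version you pick.

```python
from pyspark.sql import SparkSession

# Sketch: attach a newer mmlspark build through Spark's package config.
# The coordinate is an example; confirm it for the version you choose.
spark = (SparkSession.builder
         .appName("mmlspark-vw-check")
         .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:0.18.1")
         .getOrCreate())
```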

imatiach-msft commented 4 years ago

See the release notes at https://github.com/Azure/mmlspark/releases/tag/v0.18.0; VowpalWabbit was added in the 0.18 version.
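Once on a build that ships the bindings, the flow in the linked Adult Census sample looks roughly like the sketch below. The column names and the namespaced import path are illustrative assumptions (older releases may expose the classes directly under mmlspark), and train_df stands in for whatever labeled DataFrame you have.

```python
from mmlspark.vw import VowpalWabbitFeaturizer, VowpalWabbitClassifier

# Rough sketch of the sample: hash raw columns into a feature vector,
# then fit a VW classifier on the result. Names below are illustrative.
featurizer = VowpalWabbitFeaturizer(inputCols=["education", "hours_per_week"],
                                    outputCol="features")
classifier = VowpalWabbitClassifier(featuresCol="features", labelCol="label")

train_featurized = featurizer.transform(train_df)  # train_df: your DataFrame
model = classifier.fit(train_featurized)
```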

arijeetm1 commented 4 years ago

Gotcha, thanks @imatiach-msft. @mhamilton723, any estimate on when the release would go through?

candalfigomoro commented 4 years ago

@imatiach-msft @mhamilton723 I'm seeing a lot of confusion because of the extra repository configuration. Would it be possible to publish the next version the same way you published the 0.18.x versions (without the extra resolver)? On my side, because of security restrictions (Internet access is severely restricted), I'm having a hard time getting the extra repository configured on my company's platform, so I still haven't been able to try version 1.0 rc1.
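For context, the "extra resolver" is an additional Maven repository that Spark has to be pointed at because the rc builds are not on Maven Central. The sketch below shows what that configuration looks like; both the coordinate and the resolver URL are examples and should be confirmed against the rc release announcement, and the resolver URL is exactly the endpoint a locked-down network would need to allow.

```python
from pyspark.sql import SparkSession

# Sketch: rc builds need an extra Maven resolver in addition to the
# package coordinate. Both values are examples; confirm before use.
spark = (SparkSession.builder
         .config("spark.jars.packages", "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1")
         .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
         .getOrCreate())
```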

arijeetm1 commented 4 years ago

@mhamilton723 just for clarification, do you publish the released packages here: https://spark-packages.org/package/Azure/mmlspark? @candalfigomoro it would be great if you could explain how you configured the 0.18.x version.

candalfigomoro commented 4 years ago

@arijeetm1 https://search.maven.org/search?q=g:com.microsoft.ml.spark