Closed: jiajiayb closed this issue 7 years ago
I suspect this 'ValueError' is caused by the older version of Apache Beam that ships with Datalab. Any suggestions on how to check the apache_beam module in Datalab? Is there also a guide on how to install an apache_beam update in Datalab? I would really appreciate any comments on how to debug this error. Thanks!
When I run 'pip freeze' in a Datalab notebook, it shows that the google-cloud-dataflow version is 0.4.2, which I believe is an older version of Apache Beam that can cause this 'ValueError'. Any suggestions on how to upgrade google-cloud-dataflow without disturbing the other Python package settings in Datalab? Or please point out if you think my guess about this 'ValueError' is wrong. Thanks a lot!
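One way to confirm which versions the notebook kernel actually sees is to query the installed package metadata from a cell. This is a minimal sketch, assuming a Python 3.8+ kernel where `importlib.metadata` is available; on a Python 2.7 kernel, `pkg_resources.get_distribution(name).version` would be the closest equivalent. The helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# "google-cloud-dataflow" is the name reported by `pip freeze` in this thread;
# "apache-beam" is the distribution name Apache Beam uses on PyPI.
for name in ("google-cloud-dataflow", "apache-beam"):
    print(name, "->", installed_version(name) or "not installed")
```

A result of `not installed` only means no distribution is registered under that exact name for this interpreter; the module could still be importable from another path.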
I tried '!pip install google-cloud-dataflow --upgrade' and '!pip install --upgrade --force-reinstall \ https://storage.googleapis.com/cloud-ml/sdk/cloudml.latest.tar.gz' on the currently running VM, but the same error pops up when I run the same command. Any suggestions on how to resolve this? I am pretty new to Datalab and Apache Beam, so please bear with me if this is a very stupid question. Thanks a lot!
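If the same error persists after an upgrade, two things may be worth ruling out: that `pip` installed into a different Python than the one the kernel runs, and that the kernel still holds the old module in memory (modules that are already imported are not replaced until the kernel restarts). The following is a rough sketch, not a Datalab-specific recipe; the helper names `pip_upgrade_command` and `upgrade` are hypothetical.

```python
import subprocess
import sys


def pip_upgrade_command(package):
    """Build a pip command bound to the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", package]


def upgrade(package):
    """Run the upgrade; raises CalledProcessError if pip exits non-zero."""
    subprocess.check_call(pip_upgrade_command(package))


cmd = pip_upgrade_command("google-cloud-dataflow")
print(" ".join(cmd))
# upgrade("google-cloud-dataflow")  # uncomment to run, then restart the kernel
```

Going through `sys.executable -m pip` ties the install to the kernel's own interpreter, which a bare `!pip` call does not guarantee.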
Hi @qimingj, I am sorry to bother you, but I noticed that the machine learning tutorials were removed from the most recent build. May I ask whether an updated machine learning tutorial will be online soon? Thanks a lot!
Sorry for the late notice! We removed the previous machine learning notebooks because new ones are coming with new features. The old notebooks no longer work with the latest TensorFlow version.
@qimingj Thank you so much for your response. I think the problem I ran into in this thread was caused by using the older notebook on the new build. May I ask when we can expect the new release of the machine learning tutorials? Thanks a lot!
It will be very soon. :)
Great! Thanks a lot! Have a good one :)
The new notebooks have been released now as part of the Datalab GA release.
@chmeyers Great! Thank you so much for letting me know! I will give it a try. Have a nice weekend!
@chmeyers Hi, I am sorry to bother you. I was exploring the ML Toolbox and TensorFlow folders in the most recent Datalab GA release today. I noticed that there is a time series tutorial implemented with TensorFlow, but it is a little hard for me to apply it to my own problem. I wonder whether there could be tutorial documents in the near future that apply TensorFlow to the census or iris data using the service (not only running locally). Thanks a lot!
The Census and Iris samples are under the "samples/ML Toolbox" directory. They both use the "structured data" solution package, which is implemented using TensorFlow. See https://github.com/googledatalab/pydatalab/tree/master/solutionbox/structured_data.
@qimingj Thanks a lot for your response. I saw that the census and iris examples under the "samples/ML Toolbox" directory only have a 'local end to end' document. I wonder whether I could find a 'service end to end' tutorial for the iris and census data. Thank you for your help :)
For census data we do have service ones: https://github.com/googledatalab/notebooks/tree/master/samples/ML%20Toolbox/Regression/Census. The service runs are split into 4 notebooks, one for each step. We don't have an Iris service notebook, but it should not be difficult to work one out based on the local-run notebook.
@qimingj Great! I am trying the iris local one. One quick question: where can I find the documentation or manual that describes the modules? For example, I would like to explore what the parameters of 'mltoolbox.classification.dnn' are. Thanks!
You probably want to check the docstrings of the functions (preprocess, train, predict, batch_predict). Just type the function name followed by two question marks (such as dnn.train??) and execute it. Datalab should show you the help in the right pane, where you can find all the docstrings.
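The `??` suffix is an IPython feature. Outside the notebook, or when the text is needed programmatically, the standard library offers much the same information. Below is a small sketch in which `train` is a stand-in function defined only for illustration, not the real `mltoolbox.classification.dnn.train`; it assumes a Python 3 kernel for `inspect.signature`.

```python
import inspect


def train(train_dataset, eval_dataset, max_steps=1000):
    """Hypothetical stand-in: train a model and return a job handle.

    Args:
      train_dataset: training data.
      eval_dataset: evaluation data.
      max_steps: maximum number of training steps.
    """
    return None


# Cleaned-up docstring (roughly what `train?` shows in IPython).
print(inspect.getdoc(train))

# Parameter names and defaults.
print(inspect.signature(train))

# Full source (roughly what `train??` shows); needs the source file to be available.
# print(inspect.getsource(train))
```

The same calls should work on any imported function, so replacing `train` with the actual module function is the only change needed in practice.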
We should add the documentation for these modules under http://googledatalab.github.io/pydatalab/. I'll work on preparing this.
@qimingj @yebrahim Great! Thank you so much for your response.
Updated. Please check out the new ML Toolbox section and let us know if any improvements are needed.
@yebrahim This is very helpful! Thanks a lot!
Thank you @yebrahim!
@yebrahim @qimingj Sorry to bother you with another question. I am planning to use a convolutional neural network for a classification problem. I did not find a CNN used in the iris example or described in mltoolbox.classification.dnn. Could you please tell me whether there is a tutorial on using a CNN for a classification problem on a dataset such as iris in the Datalab environment?
I am not sure a CNN is useful for the iris example. The structured data solution provided in Datalab does not include a convolutional network, but you can build one with TensorFlow. Check out the TensorFlow example:
Hello! While trying the 'Evaluation and Batch Prediction' notebook in the census tutorial, I hit an error and have no idea why it is happening. The script I am using is exactly the same as the one in the tutorial, which is as follows:
The error I get is as follows:
`ValueError Traceback (most recent call last)
in ()
     36 # evaluation
     37
---> 38 eval_features = (pipeline | 'ReadEval' >> io.LoadFeatures('/content/datalab/tmp/ml/census/preprocessed/features_eval*'))
     39 trained_model = pipeline | 'LoadModel' >> io.LoadModel('/content/datalab/tmp/ml/census/model/model')
     40 evaluations = (eval_features | 'Evaluate' >> ml.Evaluate(trained_model) |
/usr/local/lib/python2.7/dist-packages/apache_beam/transforms/ptransform.pyc in __ror__(self, pvalueish)
    727
    728 def __ror__(self, pvalueish):
--> 729 return self.transform.__ror__(pvalueish, self.label)
    730
    731 def apply(self, pvalue):
/usr/local/lib/python2.7/dist-packages/apache_beam/transforms/ptransform.pyc in __ror__(self, left, label)
    435 pvalueish = _SetInputPValues().visit(pvalueish, replacements)
    436 self.pipeline = p
--> 437 result = p.apply(self, pvalueish, label)
    438 if deferred:
    439 return result
/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.pyc in apply(self, transform, pvalueish, label)
    207 try:
    208 old_label, transform.label = transform.label, label
--> 209 return self.apply(transform, pvalueish)
    210 finally:
    211 transform.label = old_label
/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.pyc in apply(self, transform, pvalueish, label)
    243 transform.type_check_inputs(pvalueish)
    244
--> 245 pvalueish_result = self.runner.apply(transform, pvalueish)
    246
    247 if type_options is not None and type_options.pipeline_type_check:
/usr/local/lib/python2.7/dist-packages/apache_beam/runners/runner.pyc in apply(self, transform, input)
    145 m = getattr(self, 'apply_%s' % cls.__name__, None)
    146 if m:
--> 147 return m(transform, input)
    148 raise NotImplementedError(
    149 'Execution of [%s] not implemented in runner %s.' % (transform, self))
/usr/local/lib/python2.7/dist-packages/apache_beam/runners/runner.pyc in apply_PTransform(self, transform, input)
    151 def apply_PTransform(self, transform, input):
    152 # The base case of apply is to call the transform's apply.
--> 153 return transform.apply(input)
    154
    155 def run_transform(self, transform_node):
/usr/local/lib/python2.7/dist-packages/google/cloud/ml/io/transforms.pyc in apply(self, pvalue)
    148 file_pattern=self._file_pattern,
    149 coder=mlcoders.ExampleProtoCoder(),
--> 150 compression_type=self._compression_type))
    151
    152
/usr/local/lib/python2.7/dist-packages/apache_beam/transforms/ptransform.pyc in __ror__(self, left, label)
    435 pvalueish = _SetInputPValues().visit(pvalueish, replacements)
    436 self.pipeline = p
--> 437 result = p.apply(self, pvalueish, label)
    438 if deferred:
    439 return result
/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.pyc in apply(self, transform, pvalueish, label)
    243 transform.type_check_inputs(pvalueish)
    244
--> 245 pvalueish_result = self.runner.apply(transform, pvalueish)
    246
    247 if type_options is not None and type_options.pipeline_type_check:
/usr/local/lib/python2.7/dist-packages/apache_beam/runners/runner.pyc in apply(self, transform, input)
    145 m = getattr(self, 'apply_%s' % cls.__name__, None)
    146 if m:
--> 147 return m(transform, input)
    148 raise NotImplementedError(
    149 'Execution of [%s] not implemented in runner %s.' % (transform, self))
/usr/local/lib/python2.7/dist-packages/apache_beam/runners/runner.pyc in apply_PTransform(self, transform, input)
    151 def apply_PTransform(self, transform, input):
    152 # The base case of apply is to call the transform's apply.
--> 153 return transform.apply(input)
    154
    155 def run_transform(self, transform_node):
/usr/local/lib/python2.7/dist-packages/google/cloud/ml/dataflow/io/tfrecordio.pyc in apply(self, pvalue)
    164
    165 def apply(self, pvalue):
--> 166 return pvalue.pipeline | beam.Read(_TFRecordSource(*self._args))
    167
    168
/usr/local/lib/python2.7/dist-packages/google/cloud/ml/dataflow/io/tfrecordio.pyc in __init__(self, file_pattern, coder, compression_type)
    115 file_pattern=file_pattern,
    116 compression_type=compression_type,
--> 117 splittable=False)
    118 self._coder = coder
    119
/usr/local/lib/python2.7/dist-packages/apache_beam/io/filebasedsource.pyc in __init__(self, file_pattern, min_bundle_size, compression_type, splittable)
     74
     75 if compression_type == fileio.CompressionTypes.AUTO:
---> 76 raise ValueError('FileBasedSource currently does not support '
     77 'CompressionTypes.AUTO. Please explicitly specify the '
     78 'compression type or use '
ValueError: FileBasedSource currently does not support CompressionTypes.AUTO. Please explicitly specify the compression type or use CompressionTypes.UNCOMPRESSED if file is uncompressed.`
Could you please help me figure out what might be causing this error message? Thanks a lot!
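The message itself hints at the direction of a fix: the file source wants an explicit compression type rather than automatic detection. The sketch below is not Beam or Cloud ML SDK code. It only illustrates resolving an explicit choice from a file name before a source is constructed. The helper `explicit_compression`, the string labels, and the example file names are all assumptions made for this example.

```python
import os

# Illustrative labels only; the real constants live in the library's own
# CompressionTypes class and may be spelled differently.
UNCOMPRESSED = "uncompressed"
GZIP = "gzip"
BZIP2 = "bzip2"

_BY_EXTENSION = {".gz": GZIP, ".bz2": BZIP2}


def explicit_compression(path):
    """Pick an explicit compression label from a file name's extension."""
    _, ext = os.path.splitext(path)
    return _BY_EXTENSION.get(ext.lower(), UNCOMPRESSED)


# Hypothetical file names, used only to exercise both branches.
print(explicit_compression("features_eval-00000-of-00001.tfrecord.gz"))
print(explicit_compression("features_eval-00000-of-00001"))
```

Whether the SDK call in the notebook accepts such a value depends on the installed version, so this is only a starting point for checking what the preprocessed files actually are.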