The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
A new pandas release came out yesterday. When using make_batch_reader together with the PyTorch DataLoader, we now get this error:
  File "/usr/local/lib/python3.6/dist-packages/horovod/spark/torch/remote.py", line 274, in _train
    row = next(train_loader_iter)
  File "/usr/local/lib/python3.6/dist-packages/petastorm/pytorch.py", line 152, in __iter__
    for row in self.reader:
  File "/usr/local/lib/python3.6/dist-packages/petastorm/reader.py", line 610, in __next__
    return self._results_queue_reader.read_next(self._workers_pool, self.schema, self.ngram)
  File "/usr/local/lib/python3.6/dist-packages/petastorm/arrow_reader_worker.py", line 60, in read_next
    column_as_numpy = column_as_pandas.as_matrix()
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 5273, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'as_matrix'
The root cause is that pandas.Series in the new release no longer has the as_matrix method, which arrow_reader_worker.py still calls.
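A possible fix, sketched under assumptions: if the new release is pandas 1.0 (where as_matrix was removed after a deprecation period), the call on line 60 of arrow_reader_worker.py could be replaced with Series.to_numpy(), falling back to .values on older pandas. The helper name series_to_numpy below is hypothetical and is not part of Petastorm or pandas; it only illustrates the conversion.

```python
import numpy as np
import pandas as pd


def series_to_numpy(column):
    """Return the values of a pandas Series as a NumPy array.

    Hypothetical helper for illustration; not part of Petastorm or pandas.
    """
    if hasattr(column, "to_numpy"):
        # Series.to_numpy() is available in pandas 0.24 and later.
        return column.to_numpy()
    # Older pandas: .values exposes the underlying array.
    return column.values


# Stand-in for the column that the reader worker converts.
column_as_pandas = pd.Series([1.0, 2.0, 3.0])
column_as_numpy = series_to_numpy(column_as_pandas)
print(type(column_as_numpy).__name__, column_as_numpy.tolist())
```

Until a patched version is available, pinning pandas to a version that still provides as_matrix may also work as a temporary workaround.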