Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Apache License 2.0
1.76k
stars
281
forks
source link
AttributeError: 'bool' object has no attribute 'map' while using Predicate #789
I'm trying to split training set and test set in a 80:20 ratio using predicate. And I got the following error:
/home/xzk/.local/lib/python3.7/site-packages/petastorm/hdfs/namenode.py:270: FutureWarning: pyarrow.hdfs.connect is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
return pyarrow.hdfs.connect(hostname, url.port or 8020, **kwargs)
Worker 3 terminated: unexpected exception:
Traceback (most recent call last):
File "/home/xzk/.local/lib/python3.7/site-packages/petastorm/workers_pool/thread_pool.py", line 62, in run
self._worker_impl.process(*args, **kargs)
File "/home/xzk/.local/lib/python3.7/site-packages/petastorm/arrow_reader_worker.py", line 150, in process
all_cols = self._load_rows_with_predicate(parquet_file, piece, worker_predicate, shuffle_row_drop_partition)
File "/home/xzk/.local/lib/python3.7/site-packages/petastorm/arrow_reader_worker.py", line 258, in _load_rows_with_predicate
erase_mask = match_predicate_mask.map(operator.not_)
AttributeError: 'bool' object has no attribute 'map'
Iteration on Petastorm DataLoader raise error: AttributeError("'bool' object has no attribute 'map'")
Is this a bug? Or I'm using predicate in a wrong way? Please help, thank you!
My code:
def train_model(num_epochs=100, batch_size=1000):
for epoch in range(num_epochs):
with DataLoader(
make_batch_reader(dataset_url, num_epochs=reader_epochs, schema_fields=None,
transform_spec=None, seed=1, shuffle_rows=False, shuffle_row_groups=False,
predicate=in_pseudorandom_split([0.8, 0.2], 0, "some_column_name")),
batch_size=150) as dataloader:
for raw in dataloader:
print(raw)
break
Hello team,
I'm trying to split training set and test set in a 80:20 ratio using
predicate
. And I got the following error:I notice that:
Where
do_include(...)
seems to returnbool
only.Is this a bug? Or I'm using predicate in a wrong way? Please help, thank you!
My code: