uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0

If there is no partition information, an error will be reported here #684

Status: Closed (blacksunshine closed this issue 3 years ago)

blacksunshine commented 3 years ago

https://github.com/uber/petastorm/blob/master/petastorm/py_dict_reader_worker.py#L177

        if partitions is not None:
            column_names = set(field.name for field in self._schema.fields.values()) - partitions.partition_names
        else:
            column_names = set(field.name for field in self._schema.fields.values())
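
For illustration, the guard in the snippet above can be sketched as a standalone function. This is only a minimal sketch under stated assumptions: `select_column_names` is a hypothetical helper, not part of petastorm, and it takes plain field names instead of the reader's schema object. It shows that a single, unpartitioned Parquet file (where the partition information is `None`) should simply keep every schema column.

```python
from typing import Iterable, Optional, Set


def select_column_names(schema_field_names: Iterable[str],
                        partition_names: Optional[Set[str]]) -> Set[str]:
    """Return the columns that must be read from the Parquet file itself.

    Hypothetical helper for illustration only; it is not petastorm code.
    `partition_names` is None when the dataset has no partition information,
    for example when it is a single Parquet file rather than a directory.
    """
    column_names = set(schema_field_names)
    if partition_names is not None:
        # Partition columns are stored in directory names, not in the file,
        # so they are excluded from the set of columns to load.
        column_names -= set(partition_names)
    return column_names


# Partitioned dataset: the partition column 'date' is dropped.
print(sorted(select_column_names(['id', 'value', 'date'], {'date'})))  # ['id', 'value']

# Single file without partitions: nothing is subtracted and no error is raised.
print(sorted(select_column_names(['id', 'value'], None)))  # ['id', 'value']
```

Without the `None` check, the second call would fail as soon as the code tried to access attributes of the missing partition information.
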
selitvin commented 3 years ago

Can you please provide steps to reproduce the issue? Preferably a small snippet that generates a toy dataset resulting in the failure, plus the code that reproduces the failure itself. Thank you.

blacksunshine commented 3 years ago

Hi, Selitvin

Thank you for your reply.
My original data is not a directory but a single Parquet file, so it has no partitions. The following code fails in this scenario:
from petastorm import make_reader
from petastorm.tf_utils import make_petastorm_dataset
import tensorflow as tf

print('1')
with make_reader('file:///my.zstd.parquet') as reader:
    dataset = make_petastorm_dataset(reader)
    iterator = dataset.make_one_shot_iterator()
    tensor = iterator.get_next()
    with tf.Session() as sess:
        sample = sess.run(tensor)
        print(sample.id)
selitvin commented 3 years ago

Got it. Thank you for your report. This should fix the issue: https://github.com/uber/petastorm/pull/687

You can try it with:

pip3 install git+https://github.com/selitvin/petastorm@issue_684
blacksunshine commented 3 years ago

Hi, Selitvin. Thank you so much, I will try it. I am a newcomer; I like open source and I like writing code. I hope that if problems come up later, I can fix some of them myself and help you with minor issues.

blacksunshine commented 3 years ago

https://github.com/uber/petastorm#how-to-contribute I saw this. Thank you very much for your reply. My petastorm journey starts now.

blacksunshine commented 3 years ago

This issue was solved by @selitvin. I will close it.