tensorflow / ecosystem

Integration of TensorFlow with other open-source frameworks
Apache License 2.0
1.37k stars 392 forks

[spark-tensorflow-connector] Cannot read multiple TFRecord files #129

Open prasannaVijay opened 5 years ago

prasannaVijay commented 5 years ago

spark.read.format("tfrecord").load("path/to/one-file.tfrecord") works. How do I read multiple directories with TFRecord files in each? I have tried:

- spark.read.format("tfrecord").load(paths: _*), where paths is an array of paths
- spark.read.format("tfrecord").load(path), where path is a regex matching the TFRecord paths
- passing the path as an option: spark.read.format("tfrecord").option("path", path).load()

None of these work. Is there a recommended way to do this?

manuzhang commented 5 years ago

The format is tfrecords, and both spark.read.format("tfrecords").load("path/to/*file.tfrecord") and spark.read.format("tfrecords").load("path/to/one-file.tfrecord,path/to/another-file.tfrecord") work for me.
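For several directories, the two forms above could be combined by building one glob per directory and joining the results with commas. The following is only a sketch: TfRecordPaths, globsFor, and commaSeparated are hypothetical helper names, the .tfrecord suffix is an assumption about how the files are named, and whether a comma-separated list of globs is accepted may depend on the connector and Spark versions in use.

```scala
object TfRecordPaths {
  // Hypothetical helper: turn each directory into a glob matching the
  // TFRecord files directly inside it, e.g. "data/day1" -> "data/day1/*.tfrecord".
  def globsFor(dirs: Seq[String], suffix: String = ".tfrecord"): Seq[String] =
    dirs.map(d => s"${d.stripSuffix("/")}/*$suffix")

  // Hypothetical helper: join paths into the single comma-separated string
  // form used in the comment above.
  def commaSeparated(paths: Seq[String]): String = paths.mkString(",")
}
```

With a SparkSession named spark, the result would then be passed as spark.read.format("tfrecords").load(TfRecordPaths.commaSeparated(TfRecordPaths.globsFor(dirs))), where dirs is the list of directories to read.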

liusulizzu commented 2 years ago

I found the reason: the directory is not searched recursively.
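If nested directories are the problem, one possible workaround is to collect the files yourself and hand the reader an explicit list. The sketch below assumes the data is on the local filesystem and that the files end in .tfrecord; TfRecordFiles and listRecursively are hypothetical names, and data on HDFS or object storage would need the Hadoop FileSystem API instead of java.nio.

```scala
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._

object TfRecordFiles {
  // Hypothetical helper: walk `root` recursively and return, in sorted order,
  // every regular file whose name ends with `suffix`.
  def listRecursively(root: String, suffix: String = ".tfrecord"): Seq[String] = {
    val stream = Files.walk(Paths.get(root))
    try {
      stream.iterator().asScala
        .filter(p => Files.isRegularFile(p) && p.getFileName.toString.endsWith(suffix))
        .map(_.toString)
        .toList
        .sorted
    } finally {
      stream.close() // Files.walk holds open directory handles until closed
    }
  }
}
```

The resulting list can be passed through the varargs overload of load, for example spark.read.format("tfrecords").load(TfRecordFiles.listRecursively("path/to/root"): _*), assuming the connector in use accepts multiple paths.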
