Open oliverhu opened 3 years ago
@oliverhu any update on this?
no update recently @kvignesh1420
@oliverhu can we document the current feature in the form of a tutorial?
sure, will add that !
For reference: https://github.com/tensorflow/io/tree/master/docs/tutorials
Is HDFS supported now? Loading from an HDFS path results in a core dump:
dataset = tfio.IODataset.from_orc("hdfs://xxx/yy/iris.orc", capacity=15).batch(1)
HDFS (with Kerberos) is supported by https://github.com/tensorflow/io/pull/1674
(Creating this issue for visibility so that interested people can join the discussion.)
Overview
Load Apache ORC formatted data natively into TensorFlow from any file system supported by TensorFlow, e.g. HDFS, local disk, etc.
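As a rough illustration of the goal above, here is a minimal sketch built around the `tfio.IODataset.from_orc(...)` call quoted in this thread. The helper names `filesystem_scheme` and `load_orc` are hypothetical and exist only for this example; the `capacity=15` and `.batch(1)` values are copied from the snippet in the comments, not recommended defaults. It assumes `tensorflow-io` is installed, and whether an `hdfs://` path actually works depends on your build and environment.

```python
from urllib.parse import urlparse


def filesystem_scheme(path: str) -> str:
    """Return the URI scheme of a path, or 'file' for a plain local path.

    Hypothetical helper, only used here to show which file system a path targets.
    """
    scheme = urlparse(path).scheme
    return scheme or "file"


def load_orc(path: str, capacity: int = 15, batch_size: int = 1):
    """Build a batched dataset from an ORC file (hypothetical wrapper)."""
    # Imported lazily so filesystem_scheme() works without TensorFlow installed.
    import tensorflow_io as tfio

    # Same call shape as the snippet quoted in this thread.
    return tfio.IODataset.from_orc(path, capacity=capacity).batch(batch_size)


if __name__ == "__main__":
    for p in ("iris.orc", "hdfs://xxx/yy/iris.orc"):
        print(p, "->", filesystem_scheme(p))
```

With `tensorflow-io` available, `load_orc("iris.orc")` would return a dataset that can be iterated like any other `tf.data` dataset; the same call with an `hdfs://` path is the case discussed in the comments.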
Motivation
We traditionally use Avro to store our datasets, but a row-based format is becoming inefficient for big data analytics processing. Historically we selected ORC as our columnar storage format. (Not planning to argue Parquet vs. ORC here ;))
Design Discussions
Milestones
parse_example_v2