The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
Add the user parameter to pyarrow.hdfs.connect calls and use the Spark user when possible #386
User is added as an optional parameter to the filesystem resolver.
The various entry points that use the filesystem resolver (materialize dataset, row group indexer, etc.) now pass the Spark user name to it. The Spark user name is usually taken from the HADOOP_USER_NAME environment variable.
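
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolve_hdfs_user` and `connect_hdfs` are hypothetical helper names, and the `connect` argument stands in for a connector such as `pyarrow.hdfs.connect` (available in pyarrow versions that still ship the legacy HDFS API), which accepts `host`, `port`, and `user` keyword arguments.

```python
import os
from typing import Any, Callable, Optional


def resolve_hdfs_user(explicit_user: Optional[str] = None) -> Optional[str]:
    """Pick the user to connect as (hypothetical helper).

    An explicitly passed user wins; otherwise fall back to the
    HADOOP_USER_NAME environment variable. Returns None when neither
    is set, so the connector keeps its own default.
    """
    if explicit_user:
        return explicit_user
    return os.environ.get("HADOOP_USER_NAME") or None


def connect_hdfs(connect: Callable[..., Any], host: str, port: int,
                 user: Optional[str] = None) -> Any:
    """Call `connect`, adding `user` only when one could be resolved."""
    kwargs = {"host": host, "port": port}
    resolved = resolve_hdfs_user(user)
    if resolved is not None:
        kwargs["user"] = resolved
    return connect(**kwargs)
```

With a suitable pyarrow version installed, a call might look like `connect_hdfs(pyarrow.hdfs.connect, "namenode", 8020)`. Keeping the user optional means existing callers that never pass it continue to work unchanged.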