intel-analytics / analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
https://analytics-zoo.readthedocs.io/
Apache License 2.0

absolute path for Zoo model and log path in Databricks #91

Open jack1981 opened 2 years ago

jack1981 commented 2 years ago

When we use the Estimator.from_keras and save_keras_model APIs, save_to_remote_dir has to be an absolute DBFS path (with dbfs://), but log_dir has to be a non-absolute path (it cannot include dbfs://):

est = Estimator.from_keras(tm117_model.tm117_4_tfoptimizer(), model_dir=log_dir)
est.save_keras_model(save_to_remote_dir)

Could Zoo support a consistent path strategy for both the model and the logs (checkpoints)?
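Until the two paths are handled consistently, a small helper on the caller's side could derive both forms from one base location. This is only a sketch: `to_log_dir` and `to_model_dir` are hypothetical names, the `dbfs://` spelling is taken from this issue, and it assumes that simply dropping the scheme gives a form that log_dir accepts.

```python
# Hypothetical helpers, not part of Analytics Zoo.
DBFS_PREFIX = "dbfs://"  # scheme spelling as written in this issue


def to_log_dir(path: str) -> str:
    """Drop the dbfs:// scheme (assumed to be the form log_dir accepts)."""
    if path.startswith(DBFS_PREFIX):
        return "/" + path[len(DBFS_PREFIX):].lstrip("/")
    return path


def to_model_dir(path: str) -> str:
    """Add the dbfs:// scheme if it is missing (the form save_keras_model needs)."""
    if path.startswith(DBFS_PREFIX):
        return path
    return DBFS_PREFIX + path.lstrip("/")


base = "dbfs://tmp/zoo"  # illustrative location only
log_dir = to_log_dir(base + "/logs")
save_to_remote_dir = to_model_dir(base + "/model")
print(log_dir)             # path without the scheme
print(save_to_remote_dir)  # path with the scheme

# Intended use with the calls from this issue:
#   est = Estimator.from_keras(model, model_dir=log_dir)
#   est.save_keras_model(save_to_remote_dir)
```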

jenniew commented 2 years ago

On HDFS there is no such issue. For log_dir, I think BigDL's RecordWriter only checks whether the file path starts with "hdfs"; if it does not, it writes with Java's FileOutputStream, which cannot handle "dbfs://" paths.
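The dispatch described above can be mimicked in a few lines. This is not BigDL's code: `choose_writer` is a hypothetical stand-in that only reproduces the prefix check as described in this comment, to show why a "dbfs://" path ends up on the local-file branch.

```python
def choose_writer(path: str) -> str:
    """Mimic the described check: only an 'hdfs' prefix selects the Hadoop writer."""
    if path.startswith("hdfs"):
        return "hadoop"
    # Everything else is treated as a plain local file path
    # (stands in for java.io.FileOutputStream in this sketch).
    return "local"


# Example paths are illustrative only.
for p in ["hdfs://namenode:8020/logs", "dbfs://logs", "/tmp/logs"]:
    print(p, "->", choose_writer(p))
```

Under this check a "dbfs://" path is handled exactly like a local path, so the scheme is passed to a writer that does not understand it.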

jason-dai commented 2 years ago

Why do the model and the log behave differently?

jenniew commented 2 years ago

The save model API uses the Hadoop FileSystem to copy the file. For TensorBoard logging, however, according to a comment in BigDL, FSDataOutputStream (Hadoop's FSDataOutputStream) cannot flush data to the local file system in time, so reading the summaries would throw an exception.
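As a rough analogy for the flush problem (plain Python file I/O, not Hadoop or BigDL code), the sketch below shows how a reader can see nothing while written data still sits in the writer's buffer. The helper name `read_before_and_after_flush` and the explicit buffer size are assumptions made for illustration.

```python
import os
import tempfile


def read_before_and_after_flush(payload: bytes):
    """Write payload through a buffered writer; read the file before and after flush()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # An explicit buffer larger than the payload keeps the bytes in memory.
        with open(path, "wb", buffering=8192) as writer:
            writer.write(payload)
            with open(path, "rb") as reader:
                before = reader.read()  # data is still buffered, not yet in the file
            writer.flush()
            with open(path, "rb") as reader:
                after = reader.read()   # data is visible once flushed
        return before, after
    finally:
        os.remove(path)


before, after = read_before_and_after_flush(b"summary-event")
print(len(before), len(after))
```

A summary reader that runs before the data is flushed sees an empty or truncated file, which is the kind of failure the BigDL comment describes for the local file system.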