twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0

VersionedBatchStore calls mkdirs indirectly on read #720

Open oscar-stripe opened 7 years ago

oscar-stripe commented 7 years ago

We hit an issue where we can't read data unless we also have write permission.

https://github.com/twitter/summingbird/blob/develop/summingbird-scalding/src/main/scala/com/twitter/summingbird/scalding/store/VersionedBatchStore.scala#L86

this calls: https://github.com/twitter/summingbird/blob/develop/summingbird-batch-hadoop/src/main/scala/com/twitter/summingbird/batch/store/HDFSMetadata.scala#L83

which calls: https://github.com/twitter/scalding/blob/develop/scalding-commons/src/main/java/com/twitter/scalding/commons/datastores/VersionedStore.java#L32

This means you need permission to call mkdirs just to read the data. There is a second constructor: https://github.com/twitter/scalding/blob/develop/scalding-commons/src/main/java/com/twitter/scalding/commons/datastores/VersionedStore.java#L35

It does not call mkdirs, so we should use that one on the read path.
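
To make the difference concrete, here is a minimal sketch of the pattern using only `java.nio.file` on a local temp directory. `SketchVersionedStore`, its `createRoot` flag, `versions()`, and `demo()` are all hypothetical names invented for illustration. This is not the scalding-commons `VersionedStore` API and it does not touch HDFS; it only shows why a constructor that creates the root directory turns a read into a write.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical stand-in for a versioned store; NOT the real scalding-commons class. */
final class SketchVersionedStore {
    private final Path root;

    /** Write-oriented constructor: creates the root directory as a side effect. */
    SketchVersionedStore(Path root) {
        this(root, true);
    }

    /** Second constructor (assumed shape): the caller decides whether the root may be created. */
    SketchVersionedStore(Path root, boolean createRoot) {
        this.root = root;
        if (createRoot) {
            try {
                // This is the step that needs write permission on the parent directory.
                Files.createDirectories(root);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Read-only operation: sorted numeric directory names under root, empty if root is missing. */
    List<Long> versions() {
        List<Long> found = new ArrayList<>();
        if (!Files.isDirectory(root)) {
            return found;
        }
        try (Stream<Path> children = Files.list(root)) {
            children
                .filter(Files::isDirectory)
                .map(p -> p.getFileName().toString())
                .filter(name -> name.matches("\\d{1,18}"))
                .forEach(name -> found.add(Long.parseLong(name)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(found);
        return found;
    }

    /** Returns { root exists after read-only construction, root exists after default construction }. */
    static boolean[] demo() {
        try {
            Path tmp = Files.createTempDirectory("sketch-store");
            Path root = tmp.resolve("store");

            // Read path: construct without creating anything, then list versions.
            new SketchVersionedStore(root, false).versions();
            boolean createdByReadPath = Files.exists(root);

            // Default path: construction alone creates the directory.
            new SketchVersionedStore(root).versions();
            boolean createdByDefaultPath = Files.exists(root);

            return new boolean[] { createdByReadPath, createdByDefaultPath };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("read-only constructor created root: " + result[0]);
        System.out.println("default constructor created root: " + result[1]);
    }
}
```

In this sketch the read path never creates a directory, so it would work for a user who can only list and read. The actual fix would presumably be to have the summingbird read path construct the store through the non-creating constructor linked above.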