We recently found that some of our folders were accidentally deleted, and on doing more research I found that TypedTsv overwrites the output folder if it already exists. I used the following simple script to test it.
val inputPath = args("input")
val outputPath = args("output")
val output = TypedPipe.from(TextLine(inputPath))
.map(p => p + 10)
output.write(TypedTsv(outputPath))
I ran the above script with arguments like --hdfs --input in1 --output out1, where in1 and out1 are folders on HDFS.
I was doing more research on this, and from the following link I found that the default behavior should be KEEP: http://docs.cascading.org/cascading/1.2/userguide/html/ch03s03.html
Here is the excerpt from the above link
I am a little confused, as I don't see SinkMode as a parameter to the TypedTsv constructor, so maybe I am mixing up two unrelated things. But fundamentally, things should not get deleted.
I would really appreciate it if somebody could look at this and explain the above behavior.
Thanks