grailbio / reflow

A language and runtime for distributed, incremental data processing in the cloud
Apache License 2.0

localfile:// can just be file:// #11

ryanking closed this issue 7 years ago

ryanking commented 7 years ago

Context: https://github.com/grailbio/reflow/blob/4c4bdae8296ed8bb61d16855659be7a53c759082/local/localfile.go

Rather than using localfile as a URL scheme, why not use the standard file scheme (https://en.wikipedia.org/wiki/File_URI_scheme)?
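For comparison, here is a minimal Go sketch that parses both spellings with nothing but the standard library's net/url. The parts helper and the example paths are made up for illustration; this is not Reflow's code. Both forms produce an ordinary scheme plus a path. The standard file scheme additionally allows an authority component that names a host (RFC 8089), which matters when deciding which machine a path refers to.

```go
package main

import (
	"fmt"
	"net/url"
)

// parts is a hypothetical helper that splits a URL into the pieces a
// scheme-based file resolver would dispatch on.
func parts(raw string) (scheme, host, path string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	return u.Scheme, u.Host, u.Path, nil
}

func main() {
	for _, raw := range []string{
		"file:///home/user/file.rf",      // standard file URI with an empty host
		"file://otherhost/data/file.rf",  // standard file URI naming another host
		"localfile:///home/user/file.rf", // the custom scheme discussed in this issue
	} {
		scheme, host, path, err := parts(raw)
		if err != nil {
			fmt.Println(raw, "error:", err)
			continue
		}
		fmt.Printf("scheme=%q host=%q path=%q\n", scheme, host, path)
	}
}
```

Note that file://otherhost/data/file.rf puts otherhost in the host field, not in the path, so a file:// resolver would have to decide what a non-empty host means.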

ryanking commented 7 years ago

Actually maybe I don't understand what this is doing.

mariusae commented 7 years ago

Yeah, localfile is a bit of a hack to support local usage. It is named localfile because it refers to a file on the machine where the script is running, so it only works sensibly in -local mode.
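One way to picture the "-local mode only" caveat is as a check on the scheme before a run starts. The Go sketch below is an assumption for illustration: checkScheme, its localMode parameter, and the s3://bucket/key example are hypothetical, not Reflow's API. It shows why a localfile path is only well defined when evaluation happens on the same machine as the script.

```go
package main

import (
	"fmt"
	"net/url"
)

// checkScheme is a hypothetical validation step, not Reflow's API. A
// localfile URL names a path on the machine where the script is running, so
// this sketch accepts it only when localMode is true (i.e. the run was
// started with -local). URLs with any other scheme are left alone.
func checkScheme(raw string, localMode bool) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme == "localfile" && !localMode {
		return fmt.Errorf("%s: localfile URLs are only meaningful in -local mode", raw)
	}
	return nil
}

func main() {
	fmt.Println(checkScheme("localfile:///tmp/in.txt", true))  // <nil>
	fmt.Println(checkScheme("localfile:///tmp/in.txt", false)) // non-nil error
	fmt.Println(checkScheme("s3://bucket/key", false))         // <nil>
}
```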

If you give file or dir just a path (without a scheme), it refers to a file on the machine where Reflow is running:

% cat file.rf
@requires(cpu := 1)
val Main = len(exec(image := "ubuntu") (out file) {"
    cat {{file("./file.rf")}} > {{out}}
"})

% reflow run file.rf
2017/10/26 20:55:03 run name: marius@localhost/2628cbd0
2017/10/26 20:55:05 -> file.Main    f2b9bd81 run    exec ubuntu cat {{flow}} > {{out}}
2017/10/26 20:55:11 <- file.Main    f2b9bd81 ok     exec 0s 115B
2017/10/26 20:55:11 total n=1 time=7s
    ident     n   ncache runtime(m) cpu mem(GiB) disk(GiB) tmp(GiB)
    file.Main 1   0                                        

115
%

Note that this currently only supports files smaller than 10MB. (This will change.)