whosonfirst / go-whosonfirst-exportify

Tools (written in Go) for exporting Who's On First records.
BSD 3-Clause "New" or "Revised" License
0 stars 1 forks source link

exportify a single file #3

Open missinglink opened 3 years ago

missinglink commented 3 years ago

Is it possible to exportify a single file?

I would like to pass a single feature in, either via stdin or a filename and have the response written to either stdout or a destination path.

The unix-y stdio stream interface would be ideal for me because it would make it easier to include exportify in pipelines.

I had a look at the command line options and it doesn't currently seem possible, or maybe I'm wrong?

cat berlin.geojson | ./bin/exportify -reader-uri fs:///dev/stdin
2021/08/19 11:32:03 Failed to create reader for 'fs:///dev/stdin', root is not a directory
./bin/exportify -reader-uri fs://berlin.geojson
2021/08/19 11:35:59 Failed to create reader for 'fs:///code/pelias/wof/berlin.geojson', root is not a directory
cat berlin.geojson | ./bin/exportify -reader-uri stdin://
2021/08/19 11:36:55 Failed to create reader for 'stdin://', Unknown driver: stdin (STDIN)
thisisaaronland commented 3 years ago

Quickly, because I need to run away and attend to some other things.

There is currently no stdin reader that implements the go-reader interface. That would be useful for have for things like this.

Of the three examples, the second is nearly there. Specifically, it should be:

./bin/exportify -reader-uri fs:// /path/to/berlin.geojson

Reader URIs are basically just URI-based constructors with relevant instructions for how to read one or more paths defined separately from the constructor itself. This can be irritating at times but, in the end, is less irritating than all the issues raised by trying to pack multiple paths in to the constructor itself.

But! The example above won't work today. It should work (as in: it should be made to work) but it doesn't because the tool was originally written to mirror the CLI signature for the Python tool which assumes one or more WOF IDs rather than arbitrary GeoJSON features:

This code was probably written in between the time that the Python code was refactored to speak Python3 and was taught to assume multiple input sources, so the same rules should be applied here too.

The relevant bit being the expectation in main of a list of IDs and calls to the wof_reader.LoadBytesFromID method.

The actual exporting (and formatting) and writing part just assumes bytes and user-defined go-writer.Writer instance:

thisisaaronland commented 3 years ago

FYI:

I considered wiring this (reading from STDIN) in to the file:// package and special-casing the - path (to mean "STDIN") but decided to just be explicit about things.

missinglink commented 3 years ago

Cool thanks!

I feel like stdin is just a file living at /dev/stdin (optionally aliased as -), so it should be accessed via the fs:// protocol/scheme rather than defining its own protocol.

This method reminds me of /vsistdin/ from ogr2ogr and solves my problem anyway, thanks!

thisisaaronland commented 3 years ago

Can os.Open actually open /dev/stdin? I've never tried...

missinglink commented 3 years ago

Yea you totally can, but you don't really need to open it since it's already opened for you:

These two methods are equivalent:

func stdin() {
    bytes, err := ioutil.ReadAll(os.Stdin)
    log.Println(err, string(bytes))
}

func open() {
    f, _ := os.Open("/dev/stdin")
    bytes, err := ioutil.ReadAll(f)
    log.Println(err, string(bytes))
}

I usually do something like this which returns an *os.File regardless of which method is used:

// OpenFile - open a file
func OpenFile(name string, shouldCreate bool) *os.File {

    // stdin
    if name == "-" || name == "/dev/stdin" {
        return os.Stdin
    }

    if shouldCreate {
        if _, err := os.Stat(name); os.IsNotExist(err) {
            os.Create(name)
        }
    }

    // var file, _ = os.Open(name)
    file, err := os.OpenFile(name, os.O_RDWR, 0644)
    if err != nil {
        panic(err)
    }

    return file
}
thisisaaronland commented 3 years ago

Presumably the point of using os.Stdin however is that it handles cross-platform issues for you.

missinglink commented 3 years ago

It looks like that's handled at lower levels since os.Stdin refers to /dev/stdin. https://cs.opensource.google/go/go/+/refs/tags/go1.17:src/os/file.go;l=65

I had a quick poke around in the windows adapter but it wasn't immediately clear how it handles this: https://cs.opensource.google/go/go/+/refs/tags/go1.17:src/os/file_windows.go

missinglink commented 3 years ago

Agh so it looks like the Windows syscall adapter handles these unix-isms so they "just work" https://cs.opensource.google/go/go/+/master:src/syscall/syscall_windows.go;l=474?q=Stdin&ss=go%2Fgo:src%2Fsyscall%2F

thisisaaronland commented 3 years ago

I would favour just being explicit and boring about this and keeping STDIN-related stuff inside it's own package.

I suppose we could update go-reader/file.go constructor to check for and honour an ?allow-stdin query parameter and if present do something like:

if r.allow_stdin && uri == "/dev/stdin" {
    sr, _ := reader.NewReader(ctx, "stdin://")
    return rs.Read(ctx, "-")
}

And/or just create and store sr in the constuctor.

thisisaaronland commented 3 years ago

Or just this:

https://github.com/whosonfirst/go-reader/compare/allow-stdin

Which would allow something like this:

$> cat README.md | ./bin/read -reader-uri file:///dev/stdin - | wc -l
     348

Again, note the "-". The paths that a reader will consume are a separate issue from the reader's own constructor.

I don't really like this because it means we're in the business of guessing, and of hijacking, other people's definition of stdin in filenames.

thisisaaronland commented 3 years ago

As of v0.0.17 you can export records that are read from STDIN. For example:

$> cat /usr/local/data/sfomuseum-data-whosonfirst/data/102/527/513/102527513.geojson | ./bin/exportify -reader-uri stdin:// -writer-uri stdout:// | grep wof:name
    "wof:name": "San Francisco International Airport",

The relevant code is:

Note however that the exportify tool does not try to update or fix pre-2019 EDTF values so until those values are fixed (manually or in go-whosonfirst-export/properties/edtf.go) then this will not be unexpected:

$> cat /usr/local/data/whosonfirst-data-admin-us/data/102/061/079/102061079.geojson | ./bin/exportify -reader-uri stdin:// -writer-uri stdout://
2021/08/20 11:33:51 Failed to export from STDIN, Unrecognized EDTF string 'uuuu' (Invalid or unsupported EDTF string)

There is an open ticket for this: