shapely / shapely

Manipulation and analysis of geometric objects
https://shapely.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

wkt/wkb load and dump functions #654

Open kannes opened 5 years ago

kannes commented 5 years ago

https://shapely.readthedocs.io/en/stable/manual.html#well-known-formats describes the loads and dumps functions for binary and text data, but not the load and dump functions, which also exist and read from or write to files directly.

I will add this soon.

kannes commented 5 years ago

Hmmm, they are less useful than I thought as they will only operate on single geometries.

Should load rather try to parse the file line by line? And should dump try to handle both single geometry and sequences? pickle dump/load supports sequences, simplejson handles them as well.

I would be interested in trying to implement this.

kannes commented 5 years ago

How about this for WKT?

load will use readlines, then check if it is more than one line. WKT is line-delimited, right? Trailing newlines are properly ignored by this.

dump will try to iterate over its argument (à la https://stackoverflow.com/a/1952655/4828720) and act accordingly. As far as I know "\n" is universally ok to use.

Successfully tested with single and multiple Points so far.

def load(fp):
    """Load WKT geometry/-ies from an open file.

    The return value of the function depends on the WKT file input:
    - A single geometry will be returned as such.
    - Multiple, newline-delimited geometries will be returned in a list.
    """
    data = fp.readlines()

    if len(data) > 1:
        return [loads(line) for line in data]
    else:
        return loads(data[0])

def dump(ob, fp, **settings):
    """Dump a geometry or a sequence of geometries to an open file.

    If a sequence of geometries is given, they will be written newline-delimited.
    """
    try:
        iterator = iter(ob)
    except TypeError:
        fp.write(dumps(ob, **settings))
    else:
        fp.writelines([dumps(o, **settings)+"\n" for o in ob])

Not sure which to prefer for writing: writelines with a list comprehension, or simply a for-loop with write. One needs to add the newlines in both cases anyway...

kannes commented 5 years ago

Obviously iter(ob) will also succeed for all those iterable geometries, duh! Needs a "not isinstance(ob, shapely.geometry.base.BaseGeometry) and". I should test better... ;)
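
For readers following along, here is a minimal sketch of what that correction could look like. It is not code from the thread or from Shapely: `dump_wkt` is a hypothetical name, and unlike the draft above it always ends each geometry with a newline, including the single-geometry case. The type check comes first because a multi-part geometry may itself be iterable.

```python
import io

from shapely import wkt
from shapely.geometry import MultiPoint, Point
from shapely.geometry.base import BaseGeometry


def dump_wkt(ob, fp, **settings):
    """Write one geometry, or an iterable of geometries, as newline-delimited WKT.

    Hypothetical helper for illustration; not part of Shapely.
    """
    # Check the type before iterating: iter(ob) alone cannot tell a
    # multi-part geometry apart from a plain sequence of geometries.
    geoms = [ob] if isinstance(ob, BaseGeometry) else ob
    for geom in geoms:
        fp.write(wkt.dumps(geom, **settings) + "\n")


buf = io.StringIO()
dump_wkt(MultiPoint([(0, 0), (1, 1)]), buf)  # one geometry -> one line
dump_wkt([Point(0, 0), Point(1, 1)], buf)    # two geometries -> two lines
lines = buf.getvalue().splitlines()
```

With this check, the MultiPoint is written as a single WKT line instead of being split into its member points.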

sgillies commented 5 years ago

@kannes I'm not in favor of adding dump or load. It's only a couple of lines of code for the (rare) user who needs this, and I don't think it's worth adding more code and tests to Shapely.

kannes commented 5 years ago

The more I thought about it, the less I liked them myself. Too much ambiguity. Especially since loads and dumps also only support single geometries.

I'd even suggest dropping the existing, single-geometry functions to be honest. But who knows who might rely on them.

snorfalorpagus commented 5 years ago

It looks like the existing dump and load functions are not tested either.

https://coveralls.io/builds/22225876/source?filename=shapely%2Fwkt.py

Should we mark these as deprecated and remove them in a future version?
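
One common way to mark a function as deprecated in Python is to emit a `DeprecationWarning` from it. The sketch below uses only the standard library. The `deprecated` decorator and the stand-in `load` are hypothetical and only illustrate the pattern; they do not show how Shapely handles deprecations.

```python
import functools
import io
import warnings


def deprecated(reason):
    """Hypothetical decorator: emit a DeprecationWarning on every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("read the file yourself and call loads() on its contents")
def load(fp):
    """Stand-in for a file-based loader; it just returns the raw text."""
    return fp.read()


# Record warnings so the effect is visible without changing global filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    text = load(io.StringIO("POINT (1 2)"))
```

The function keeps working during the deprecation period, and callers see the warning once they enable `DeprecationWarning` output.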

kannes commented 5 years ago

+1 from me.

Making the functions work reasonably well for arbitrary input is a lot of work. On the other hand, writing some simple file reading and writing is easy in Python when you know your data.
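
As a rough illustration of the "couple of lines" case, here is a sketch that assumes the data is known to be text with one WKT geometry per line. `io.StringIO` stands in for an open file, and the variable names are made up for the example.

```python
import io

from shapely import wkt

# In-memory stand-in for an open text file with one WKT geometry per line.
fp = io.StringIO("POINT (1 2)\nLINESTRING (0 0, 1 1)\n\n")

# Reading: skip blank lines so trailing newlines never reach the parser.
geoms = [wkt.loads(line) for line in fp if line.strip()]

# Writing: one WKT string per line, using each geometry's .wkt property.
out = io.StringIO()
out.write("\n".join(geom.wkt for geom in geoms) + "\n")

# Round trip: parse what was just written.
out.seek(0)
roundtrip = [wkt.loads(line) for line in out if line.strip()]
```

This only works because the layout is known in advance; a general-purpose load would have to guess between a single geometry and a sequence, which is the ambiguity discussed above.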