whosonfirst / go-whosonfirst-spatial

Go package defining interfaces for Who's On First specific spatial operations.
BSD 3-Clause "New" or "Revised" License

Speed and memory concern #17

Closed HIRANO-Satoshi closed 3 years ago

HIRANO-Satoshi commented 3 years ago

Thanks for the README. I understand it's for structural reorganization.

go-whosonfirst-pip-v2 takes days to index. Is it possible to speed up indexing, hopefully by ten or a hundred times?

Another concern is memory usage. I'm not sure whether memory thrashing is the main cause of the slow indexing. Of course we could use a larger machine, but we have some restrictions.

thisisaaronland commented 3 years ago

Keeping in mind that this is all very much still work-in-progress, and the documentation remains spotty, you might have better luck with the go-whosonfirst-spatial SQLite packages.

This demonstrates how the SQLite databases are created and queried:

And this demonstrates how those SQLite databases are queried over an HTTP interface, like the original go-whosonfirst-pip-v2 server:

The SQLite databases take less time to index than the original in-memory RTree implementation and can be created independently of the PIP server itself.

They also have better support for "extra" properties to be included with the default responses and support indexing alternate geometries. As of this writing there is no support for filtering alternate geometries either by type or name.

The original go-whosonfirst-pip-v2 RTree implementation was always constrained on indexing time due to the volume of files being indexed, I/O limits and the sequential nature of building the RTree index itself. There is an equivalent RTree implementation of the go-whosonfirst-spatial interfaces but that hasn't been updated to reflect the most recent developments yet:

There is nothing about this package that will address any of the problems around indexing time, though. For large volumes of data something like the SQLite package is probably a better alternative.
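
To make the role of a spatial index in point-in-polygon (PIP) queries concrete, here is a minimal sketch of a bounding-box prefilter followed by an even-odd ray-casting test. It is not the code used by any of the packages mentioned above. The `Point`, `Feature`, `BBox` and `Index` types are hypothetical, polygons are assumed to have a single non-empty exterior ring with no holes, and the linear scan over bounding boxes stands in for what an RTree would do with a tree lookup.

```go
package main

import (
	"fmt"
	"math"
)

// Point is an x/y (longitude/latitude) pair. Hypothetical type for this sketch.
type Point struct{ X, Y float64 }

// Feature is a named polygon with one exterior ring and no holes (an assumption).
type Feature struct {
	Name string
	Ring []Point
}

// BBox is an axis-aligned bounding box.
type BBox struct{ MinX, MinY, MaxX, MaxY float64 }

// Bounds computes the bounding box of a non-empty ring.
func Bounds(ring []Point) BBox {
	b := BBox{ring[0].X, ring[0].Y, ring[0].X, ring[0].Y}
	for _, p := range ring[1:] {
		b.MinX, b.MaxX = math.Min(b.MinX, p.X), math.Max(b.MaxX, p.X)
		b.MinY, b.MaxY = math.Min(b.MinY, p.Y), math.Max(b.MaxY, p.Y)
	}
	return b
}

// Contains reports whether p lies inside the box, edges included.
func (b BBox) Contains(p Point) bool {
	return p.X >= b.MinX && p.X <= b.MaxX && p.Y >= b.MinY && p.Y <= b.MaxY
}

// InRing is the even-odd ray-casting test: count how many ring edges a
// horizontal ray from p crosses; an odd count means p is inside.
func InRing(p Point, ring []Point) bool {
	inside := false
	n := len(ring)
	for i, j := 0, n-1; i < n; j, i = i, i+1 {
		a, b := ring[i], ring[j]
		if (a.Y > p.Y) != (b.Y > p.Y) &&
			p.X < (b.X-a.X)*(p.Y-a.Y)/(b.Y-a.Y)+a.X {
			inside = !inside
		}
	}
	return inside
}

type entry struct {
	box  BBox
	feat Feature
}

// Index keeps one bounding box per feature. A real RTree would arrange
// these boxes in a tree instead of a flat slice.
type Index struct{ entries []entry }

// Add stores a feature together with its precomputed bounding box.
func (idx *Index) Add(f Feature) {
	idx.entries = append(idx.entries, entry{Bounds(f.Ring), f})
}

// PointInPolygon returns the names of all features containing p. The cheap
// box test runs first; the exact ring test runs only on box hits.
func (idx *Index) PointInPolygon(p Point) []string {
	var names []string
	for _, e := range idx.entries {
		if e.box.Contains(p) && InRing(p, e.feat.Ring) {
			names = append(names, e.feat.Name)
		}
	}
	return names
}

func main() {
	idx := &Index{}
	idx.Add(Feature{"square", []Point{{0, 0}, {4, 0}, {4, 4}, {0, 4}}})
	idx.Add(Feature{"triangle", []Point{{10, 10}, {14, 10}, {12, 14}}})

	fmt.Println(idx.PointInPolygon(Point{2, 2}))   // prints [square]
	fmt.Println(idx.PointInPolygon(Point{13, 13})) // prints []: inside the triangle's box, outside the triangle
}
```

Every feature has to be read and have its bounds computed before it can be queried, which is one reason indexing cost grows with the number of files regardless of how the boxes are stored.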

thisisaaronland commented 3 years ago

Update: The go-whosonfirst-spatial-rtree package has been updated to reflect the current state of the go-whosonfirst-spatial interfaces:

thisisaaronland commented 3 years ago

Update 2: The go-whosonfirst-spatial-http-local package has been updated as well. This package assumes an rtree spatial index and a local checkout/repository of WOF records for indexing and for appending properties:

HIRANO-Satoshi commented 3 years ago

That seems nice.

I use the "-allow-geojson -plain-old-geojson" flags with the v2 server to serve GeoJSON files derived from OSM's pbf files. Would it be difficult to add a feature that builds db files from those GeoJSON files with go-whosonfirst-sqlite-features-index (or something)?

thisisaaronland commented 3 years ago

It should be possible to use these tools with "plain old" GeoJSON documents, although that hasn't been tested yet.

If they don't work that will be considered a bug.

HIRANO-Satoshi commented 3 years ago

The benefits of using SQLite don't apply to plain old GeoJSON. How do I make a sqlite database from a GeoJSON file?

HIRANO-Satoshi commented 3 years ago

I'm glad to see you are active again after the shutdown. Be careful with COVID-19.

thisisaaronland commented 3 years ago

"How do I make a sqlite database from a GeoJSON file?"

Ultimately, you should be able to use the same wof-sqlite-index-features tool used for WOF-style GeoJSON documents. I can't guarantee that this works today but that is the goal.

The code has reached the stage where ensuring that everything works with both "alternate" geometry files (WOF) and "plain old" GeoJSON is the next thing to work on, shortly.

thisisaaronland commented 3 years ago

There is initial support for indexing, serving and querying "plain old" GeoJSON in both the in-memory (RTree) and SQLite implementations:

The in-memory server only supports returning results as "standard places results", which mirrors the functionality of the go-whosonfirst-pip-v2 package. The SQLite server supports optional formatting of results as "properties" dictionaries (appending properties to the SPR response) or GeoJSON.

It is entirely possible there are still bugs, edge cases or gotchas in both.

HIRANO-Satoshi commented 3 years ago

You mean the "-mode geojsonl" option of wof-sqlite-index-features?

thisisaaronland commented 3 years ago

I'm not sure I understand but if you're saying there's a bug in the go-whosonfirst-index code for processing line-separated GeoJSON then yes that too, maybe :-)

I think things have reached the stage with the go-whosonfirst-spatial- packages that it will be time, shortly, to focus on the documentation, which will try to address the different layers. One of them is the indexing (or really "crawling" or "walking") of an arbitrary data source, independent of how those data are finally processed.

If there are bugs with the indexing stuff please file an issue in this repo or here:

https://github.com/whosonfirst/go-whosonfirst-index

HIRANO-Satoshi commented 3 years ago

Sorry, I wanted to know how to use wof-sqlite-index-features with plain GeoJSON files.

I'll stop here. I'll give it a try sometime soon.

Thanks much!