osm-oscar / oscar

A search engine for OpenStreetMap
https://www.oscar-web.de

Working sample configs for oscar-create #37

Open orblivion opened 3 years ago

orblivion commented 3 years ago

I'm trying to get oscar-create to work for a sort of proof-of-concept project. I'm looking for any working configuration, even a very basic one, so that I can experiment from there if I need adjustments. I tried working with sampleConfig.json, but I had issues with it, detailed below.

So, what I'm wondering is: do you have a working set of config files handy that you would not mind sharing? Or if you know what I did wrong with my config, that would work for me as well. Or maybe the crash is just an unrelated bug.

Thank you!


As for what I tried:

I started with sampleConfig.json. It lists options, but I didn't see documentation for it, so I had to guess how to get the simplest version to work. The config now parses successfully, but the run still ends in a crash (the output below shows an uncaught exception followed by an abort).

My config.json:

```
{
  "grid": {
    "enabled": false,
    "latcount": 100,
    "loncount": 100
  },
  "index": {
    "check": true,
    "deduplicate": true,
    "type": "simple"
  },
  "rtree": {
    "config": {
      "latcount": 100,
      "loncount": 100
    },
    "enabled": false,
    "type": "gridbased"
  },
  "stats": {
    "print-memory-usage": false
  },
  "store": {
    "addParentInfo": false,
    "addRegionsToCells": false,
    "blobFetchCount": 1,
    "cellRefining": {
      "type": "none",
      "value": "cell-diag"
    },
    "enabled": true,
    "fullRegionIndex": false,
    "geoclean": "none",
    "hashConfig": "auto",
    "itemFilter": "all",
    "nodeTableSize": 100000,
    "regionFilter": {
      "keyValues": "empty-obj.json",
      "keys": "empty.json"
    },
    "sorting": {
      "order": "score"
    },
    "splitValues": "empty.json",
    "tagFilter": "all",
    "threadCount": 0,
    "triangleRefining": {
      "simplify": "true",
      "type": "none",
      "value": "conforming"
    }
  },
  "tempdir": {
    "fast": "/dev/shm/osmc",
    "slow": "./tempdir/osmc"
  },
  "textsearch": {
    "config": {
      "_comment": "oomgeocell config",
      "cellLocalIds": false,
      "foreignObjects": false,
      "maxMemoryUsage": 16,
      "threadCount": 0,
      "tmpFileType": "fast file"
    },
    "id": "unique-id-string",
    "type": "items"
  }
}
```

empty.json (used in my config.json):

[]

empty-obj.json (used in my config.json):

{}
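
Since the config above points at helper files (`empty.json`, `empty-obj.json`), a quick pre-flight check can rule out a missing or malformed file before a long oscar-create run. The following is a minimal Python sketch and not part of Oscar: the function name `check_referenced_files` is made up for illustration, and resolving relative paths against the config file's directory is an assumption (the log only shows that the paths end up absolute). It checks JSON syntax only, not whether the contents are what oscar-create expects.

```python
import json
from pathlib import Path
from typing import List


def check_referenced_files(config_path: Path) -> List[str]:
    """Return a list of problems with the JSON files a config refers to."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    store = config.get("store", {})
    region_filter = store.get("regionFilter", {})
    # The three options that point at other files in the config from this thread.
    references = {
        "store.regionFilter.keys": region_filter.get("keys"),
        "store.regionFilter.keyValues": region_filter.get("keyValues"),
        "store.splitValues": store.get("splitValues"),
    }
    problems = []
    for option, value in references.items():
        if not isinstance(value, str):
            problems.append(f"{option}: no file name given")
            continue
        # Assumption: relative paths are resolved against the config's directory.
        target = config_path.parent / value
        try:
            json.loads(target.read_text(encoding="utf-8"))
        except OSError as err:
            problems.append(f"{option}: cannot read {target} ({err})")
        except ValueError as err:
            problems.append(f"{option}: {target} is not valid JSON ({err})")
    return problems
```

Calling `check_referenced_files(Path("config.json"))` returns an empty list when every referenced file exists and parses; otherwise each entry names the offending option.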

My input file (~11MB):

https://download.geofabrik.de/north-america/canada/prince-edward-island-latest.osm.pbf

The output:

```
Selected Options:
Stats config:
print memusage: no
IndexStoreConfig:
input store:
type: simple
check: yes
deduplicate: yes
KVStoreConfig:
Number of threads: 0
Number of blobs to fetch at once: 1
Max node table entries: 100000
Keys whose values are infalted: /home/user/oscar/build/empty.json
Node HashTable size: auto
ItemSaveDirector rule: everything
Score config:
Item sort order: score
Read boundaries: no
Keys defining regions=/home/user/oscar/build/empty.json
Key:Values defining regions=/home/user/oscar/build/empty-obj.json
FullRegionIndex: no
Add parent info: no
Add regions to cells they enclose: no
Geometry cleaning: none
min=0, max=0
Fetching residential areas without matching tags but with a place-node inside
Residential area extraction: Ways' node-refs: 8830474|11678897=75.61% (1|0|1)
Residential area extraction: Ways' node-refs: 1 seconds for 11M 678K 897 =11678897 1/s
Residential area extraction: Found 184 ways
Residential area extraction: Need to fetch 3912 nodes
Residential area extraction: Ways' nodes: 5381362|11678897=46.08% (1|1|2)
Residential area extraction: Ways' nodes: 1 seconds for 11M 678K 897 =11678897 1/s
Residential area extraction: Assembling ways: 7566196|11678897=64.79% (1|0|1)
Residential area extraction: Assembling ways: 1 seconds for 11M 678K 897 =11678897 1/s
Residential area extraction: Relation's ways: 10021346|11678897=85.81% (1|0|1)
Residential area extraction: Relation's ways: 1 seconds for 11M 678K 897 =11678897 1/s
Residential area extraction: Found 12 relations with 59 ways
Residential area extraction: Relation-ways' node-refs: 0 seconds for 11M 678K 897
Residential area extraction: Need to fetch 2315 nodes
Residential area extraction: Relation-ways' nodes: 1354698|11678897=11.6% (1|7|8)
Residential area extraction: Relation-ways' nodes: 1 seconds for 11M 678K 897 =11678897 1/s
Residential area extraction: Assembling relations: 556341|11678897=4.764% (1|19|20)
Residential area extraction: Assembling relations: 1 seconds for 11M 678K 897 =11678897 1/s
Residential area extraction: Assembled 184/184 ways and 12/12 relations
GridRegionTree::create: 0 seconds for 100
OsmGridRegionTree::printStats--BEGIN
GridRegionTree::printstats--BEGIN
Nodes: 134
Grids: 34
ChildPtrs: 232 of which 42.67241379% are NULL
LeafInfo: 288 of which 0% are enclosed
Regions: 196
Real Storage usage: 6KiB 896NiB
Storage usage: 7040
Containment tests: 0 of which -nan% were intersection tests
GridRegionTree::printstats--END
#points: 6632=155KiB 448NiB
#GeoMultiPolygons: 55=3KiB 8NiB
OsmGridRegionTree::printStats--END
Collected 0 residential regions
Region extraction: Ways' node-refs: 1615414|11678897=13.83% (1|6|7)
Region extraction: Ways' node-refs: 9772077|11678897=83.67% (2|0|2)
Region extraction: Ways' node-refs: 2 seconds for 11M 678K 897 =5839448 1/s
Region extraction: Found 0 ways
Region extraction: Need to fetch 0 nodes
Region extraction: Ways' nodes: 6971169|11678897=59.69% (1|0|1)
Region extraction: Ways' nodes: 1 seconds for 11M 678K 897 =11678897 1/s
Region extraction: Assembling ways: 10880188|11678897=93.16% (1|0|1)
Region extraction: Assembling ways: 1 seconds for 11M 678K 897 =11678897 1/s
Region extraction: Relation's ways: 0 seconds for 11M 678K 897
Region extraction: Found 0 relations with 0 ways
Region extraction: Relation-ways' node-refs: 1657109|11678897=14.19% (1|6|7)
Region extraction: Relation-ways' node-refs: 1 seconds for 11M 678K 897 =11678897 1/s
Region extraction: Need to fetch 0 nodes
Region extraction: Relation-ways' nodes: 4578098|11678897=39.2% (1|1|2)
Region extraction: Relation-ways' nodes: 1 seconds for 11M 678K 897 =11678897 1/s
Region extraction: Assembling relations: 7465845|11678897=63.93% (1|0|1)
Region extraction: Assembling relations: 1 seconds for 11M 678K 897 =11678897 1/s
Region extraction: Assembled 0/0 ways and 0/0 relations
GridRegionTree::create: 0 seconds for 100
OsmGridRegionTree::printStats--BEGIN
GridRegionTree::printstats--BEGIN
Nodes: 1
Grids: 1
ChildPtrs: 100 of which 100% are NULL
LeafInfo: 0 of which -nan% are enclosed
Regions: 0
Real Storage usage: 80NiB
Storage usage: 80
Containment tests: 0 of which -nan% were intersection tests
GridRegionTree::printstats--END
#points: 0=0 NiB
#GeoMultiPolygons: 0=0 NiB
OsmGridRegionTree::printStats--END
OsmTriangulationRegionStore: extracting points...done
OsmTriangulationRegionStore: extracting segments...done
Found 0 different points creating 0 different segments
Converting points to CGAL points...done
OsmTriangulationRegionStore: creating triangulation...took 2us
Setting cellids: 0 seconds for 0
Refining cells...done
Found 1 cells
Finding relevant regions: 139774|11678897=1.197% (1|82|83)
Finding relevant regions: 4441979|11678897=38.03% (2|3|5)
Finding relevant regions: 11519249|11678897=98.63% (3|0|3)
Finding relevant regions: 3 seconds for 11M 678K 897 =3892965 1/s
Found 1 cells containing items
GridRegionTree::create: 0 seconds for 100
OsmGridRegionTree::printStats--BEGIN
GridRegionTree::printstats--BEGIN
Nodes: 1
Grids: 1
ChildPtrs: 100 of which 100% are NULL
LeafInfo: 0 of which -nan% are enclosed
Regions: 0
Real Storage usage: 80NiB
Storage usage: 80
Containment tests: 0 of which -nan% were intersection tests
GridRegionTree::printstats--END
#points: 0=0 NiB
#GeoMultiPolygons: 0=0 NiB
OsmGridRegionTree::printStats--END
Creating final TriangulationRegionStore
OsmTriangulationRegionStore: extracting points...done
OsmTriangulationRegionStore: extracting segments...done
Found 0 different points creating 0 different segments
Converting points to CGAL points...done
OsmTriangulationRegionStore: creating triangulation...took 3us
Could not re-add 0 edges
Setting cellids: 0 seconds for 0
Total time to create the region store with 0 regions: 16s 600ms 107us
Fetching strings: 964429|11678897=8.258% (1|11|12)
Fetching strings: 4441979|11678897=38.03% (2|3|5)
Fetching strings: 10320048|11678897=88.36% (3|0|3)
Fetching strings: 3 seconds for 11M 678K 897 =3892965 1/s
Finalizing string table...143 msecs
Inserting region store items
Fetching multipolygon items: Relation's ways: 6078117|11678897=52.04% (1|0|1)
Fetching multipolygon items: Relation's ways: 1 seconds for 11M 678K 897 =11678897 1/s
Fetching multipolygon items: Found 571 relations with 15024 ways
Fetching multipolygon items: Relation-ways' node-refs: 9122435|11678897=78.11% (1|0|1)
Fetching multipolygon items: Relation-ways' node-refs: 10880188|11678897=93.16% (2|0|2)
Fetching multipolygon items: Relation-ways' node-refs: 3 seconds for 11M 678K 897 =3892965 1/s
Fetching multipolygon items: Need to fetch 715381 nodes
Fetching multipolygon items: Relation-ways' nodes: 5550469|11678897=47.53% (1|1|2)
Fetching multipolygon items: Relation-ways' nodes: 1 seconds for 11M 678K 897 =11678897 1/s
Fetching multipolygon items: Assembling relations: 1992160|11678897=17.06% (1|4|5)
Fetching multipolygon items: Assembling relations: 3 seconds for 11M 678K 897 =3892965 1/s
Fetching multipolygon items: Assembled 0/0 ways and 571/571 relations
Processing file: 139774|11678897=1.197% (3|247|250)
Processing file: 964429|11678897=8.258% (4|44|48)
Processing file: 2956852|11678897=25.32% (5|14|19)
Processing file: 4900599|11678897=41.96% (6|8|14)
Processing file: 9122435|11678897=78.11% (7|1|8)
Processing file: 9529932|11678897=81.6% (9|2|11)
Processing file: 10320048|11678897=88.36% (12|1|13)
Processing file: 10691905|11678897=91.55% (13|1|14)
Processing file: 11125186|11678897=95.26% (15|0|15)
Processing file: 11228466|11678897=96.14% (16|0|16)
Processing file: 11519249|11678897=98.63% (18|0|18)
Processing file: 11542394|11678897=98.83% (19|0|19)
Processing file: 19 seconds for 11M 678K 897 =614678 1/s
Took 5 rounds to process the file
Sorting items...
Sorting items took 114ms 303us
Pruned 0 empty cells
CellCreator: creating GeoHierarchy for 0 regions out of 1 cells
Creating GeoRegionGraph: 0 seconds for 0
Found 1 different cells
Maximum cell-split: 0
Maximum children: 0
Constructing Hierarchy
sserialize::spatial::GeoHierarchy--stats-BEGIN
Hierarchy is empty
sserialize::spatial::GeoHierarchy--stats-END
GeoHierarchy passed the consistency check.
OsmKeyValueObjectStore::stats::begin
Items: 177228
Keys: 814
Values: 22274
Boundary: GeoRect[(45.58351225, -66.37246048); (52.22285217, -55.27018571)]
OsmKeyValueObjectStore::stats::end
Time to create KeyValueStore: 50s 923ms 870us
Serializing KeyValueStore
Serializing OsmKeyValueObjectStore payload...took 0 seconds
DynamicKeyValueObjectStore::serialize: Serializing string tables...took 0 seconds
KeyValueObjectStore::serialize: items: 0 seconds for 177K 228
terminate called after throwing an instance of 'sserialize::VersionMissMatchException'
what(): VersionMissMatchException (want=4, have=90): Static::Array
Aborted
```

dbahrdt commented 3 years ago

Hi, I'll take a look into the crash as soon as possible (however I'm on vacation for the next 14 days and I don't know how much time I'll find to address this). Maybe you can try to take a look at https://github.com/dbahrdt/oscar-docker? This will give you a whole installation of oscar and oscar-web with which you can build the necessary files. The docker container also has support for routing. I'll try to come up with a small sample configuration so that you have something to work with.

Depending on what you're planning to do, I do want to stress that oscar-create needs a lot of memory if you intend to process larger files. In order to process the planet data set you'll need about 256 GiB of RAM. However, you may find pre-processed files at http://data.oscar-web.de. The latest version is from February, since I started porting all size-related stuff to uint64_t due to OpenStreetMap containing too many items for oscar to process.

orblivion commented 3 years ago

I appreciate it! No rush at all. I will check out the docker as well. If it has a usable config perhaps it would work for me. I don't think my project would work with docker though, and at any rate I'd like to be able to build everything up from scratch.

My project will probably be memory sensitive, so that's useful info. However, I expect the use case to be more on the scale of a US state.



orblivion commented 2 years ago

I'm about to look at this project again. I figured I should let you know a little more about what I'm working on in case you can give me other suggestions. For the time being, I will see if the Docker project gives me something to work with. In the end I'd like to be able to build everything from source, though.

The project I'm working on is to get a fully self-hosted web-based map, letting the user choose the region (either pre-defined regions or arbitrary ones, whichever is easier).

I'm trying to keep the app as simple and self-contained as possible. With the environment I'm targeting for the app's server, running a database like Postgres (that usually requires multiple user accounts) is complicated. Amazingly I figured out a solution for the map part (Protomaps), but I still need a search function. Oscar looks like a single-user solution to the search issue, which is just what I need.

So what I think I need to do now is to prepare (before the user installs the app) Oscar files from raw OSM data for every region that the user might select. I could host these files on a server, and the user's app would download them as the user selects regions. The Oscar file generation step can be run in a normal Linux environment, with fewer resource constraints than the app server.
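
The pre-generation step described above could be driven by a small batch script. This is only a rough Python sketch: the thread never shows the oscar-create command line, so the command is passed in by the caller as a template with `{input}` and `{output}` placeholders, and the names `build_region` and `build_all` are made up for illustration. A run that aborts, as in the output earlier in this thread, shows up as a failed region rather than stopping the whole batch.

```python
import subprocess
from pathlib import Path
from typing import Dict, Sequence


def build_region(command_template: Sequence[str], pbf: Path, out_dir: Path) -> bool:
    """Run the caller-supplied build command once; True means exit code 0."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Fill the {input}/{output} placeholders in every argument.
    # Caveat: arguments containing other literal braces would break str.format.
    command = [part.format(input=pbf, output=out_dir) for part in command_template]
    result = subprocess.run(command)
    # A process killed by a signal (e.g. abort) has a negative return code.
    return result.returncode == 0


def build_all(command_template: Sequence[str], pbf_dir: Path, out_root: Path) -> Dict[str, bool]:
    """Build every *.osm.pbf extract in pbf_dir; map region name to success."""
    results = {}
    for pbf in sorted(pbf_dir.glob("*.osm.pbf")):
        region = pbf.name[: -len(".osm.pbf")]
        results[region] = build_region(command_template, pbf, out_root / region)
    return results
```

The returned mapping can then be used to decide which region directories are safe to publish on the download server.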

orblivion commented 2 years ago

FYI I found an alternative for my needs, so don't spend any extra energy for my sake at this point. Leaving this open though since it could be worth looking at in general.

dbahrdt commented 1 year ago

Sorry for the very long delay. I just don't have the time to take care of external issues for this project anymore. I still maintain OSCAR so that it keeps working, since I use it myself, but I don't have time for additional work. Unfortunately, OSCAR is rather complex and undocumented, which does not help with finding another maintainer. Nevertheless, we have a new student working on the front end, at least while they're at the university (so far, all students have given up maintainership after finishing their studies).

It might be helpful for other people who stumble upon OSCAR if you could comment on which alternative you found.

orblivion commented 1 year ago

I'm using Gazetteer to extract the data. The output is a list of JSON objects. I'm extracting what I need from it (which is not much, yet) into SQLite FTS5. It's a hack, and probably won't compete with a proper OSM search engine, but it's good enough for now.
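
For readers curious what the JSON-to-FTS5 step might look like, here is a minimal Python sketch using the standard `sqlite3` module. It is not the code from the linked project. The input format (one JSON object per line) and the field names `name`, `type`, `lat` and `lon` are assumptions made for illustration and may not match Gazetteer's real output, and it requires an SQLite build with FTS5 enabled.

```python
import json
import sqlite3
from typing import Iterable, List, Tuple


def build_index(conn: sqlite3.Connection, json_lines: Iterable[str]) -> int:
    """Load named objects into an FTS5 table; return how many were indexed."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS places "
        "USING fts5(name, kind, lat UNINDEXED, lon UNINDEXED)"
    )
    rows = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        # Hypothetical field names; objects without a name are not searchable.
        name = obj.get("name")
        if not name:
            continue
        rows.append((name, obj.get("type", ""), obj.get("lat"), obj.get("lon")))
    conn.executemany(
        "INSERT INTO places (name, kind, lat, lon) VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> List[Tuple[str, str]]:
    """Full-text search, best matches first. The query uses FTS5 syntax,
    so raw user input with special characters may need escaping."""
    cursor = conn.execute(
        "SELECT name, kind FROM places WHERE places MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cursor.fetchall()
```

With an index built this way, `search(conn, "charlotte*")` would run a prefix query against both indexed columns, while the coordinates are stored but not tokenized.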

That said, Gazetteer crashes on some data sets too. I'm hoping I can get some help from them. I'm not sure about their availability either.

Anyway, thanks for chiming back in.

BTW if you're curious what I'm working on: https://github.com/orblivion/sandstorm-share-a-map