pelias / polylines

Pelias import pipeline for polyline (road network) data.
MIT License

Is there any way to reimport the polyline data without creating duplicates? #233

Closed: pravink closed this issue 4 years ago

pravink commented 4 years ago

Hey team!

I was using your awesome geocoding engine when I noticed something interesting. Let me tell you more about it.


Can I reimport polyline data into ES without creating duplicates?

I have imported polyline data, but my import process stopped partway through. If I start the import process again, how do I stop it from importing the same data again and creating duplicate docs in ES?

If I do import the same file again, is there any way to clear those duplicates from ES directly?

missinglink commented 4 years ago

The short answer is no.

The polylines do not have unique IDs. Each record is assigned an ID based on its row number within the file, so subsequent reimports will overwrite existing IDs, but probably not in the way you would like.
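To illustrate why row-number IDs are not a safe way to deduplicate, here is a small hypothetical model. It assumes the ID is nothing more than the row number (the real pelias/polylines ID format is not reproduced here) and uses a plain dict in place of the Elasticsearch index. Importing a second, different file reuses the low row numbers and silently overwrites unrelated records from the first file.

```python
# Hypothetical model of "ID = row number within the file". This is NOT the
# actual pelias/polylines ID format; it only illustrates how row-based IDs
# collide when a different file is imported into the same index.


def import_file(index: dict[str, str], rows: list[str]) -> None:
    """Upsert each row under an ID derived from its row number."""
    for row_number, row in enumerate(rows):
        # An existing document with the same ID is silently overwritten.
        index[str(row_number)] = row


index: dict[str, str] = {}
import_file(index, ["Road A1", "Road A2", "Road A3"])  # e.g. a first extract
import_file(index, ["Road B1", "Road B2"])             # e.g. a different extract
print(index)  # {'0': 'Road B1', '1': 'Road B2', '2': 'Road A3'}
```

The result mixes records from both files, which is why clearing the old data before reimporting is the safer route.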

You can, however, clear the existing data from Elasticsearch using the Delete By Query API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

I would suggest first running a regular query to check that your delete conditions match what you expect, so that you don't delete more than you intended.

What you probably want is simply match: { layer: street }
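A minimal sketch of the "test first, then delete" workflow, using only the Python standard library. It sends the same match query on the layer field to the Elasticsearch _count endpoint (a dry run) and then to _delete_by_query. The host http://localhost:9200 and the index name "pelias" are assumptions, not from this thread; adjust both for your setup.

```python
import json
import urllib.request

# Assumptions (not from the thread): Elasticsearch at localhost:9200 and the
# Pelias index named "pelias". Adjust both for your installation.
ES_URL = "http://localhost:9200"
INDEX = "pelias"

# The delete condition suggested above: documents in the "street" layer.
STREET_QUERY = {"query": {"match": {"layer": "street"}}}


def build_request(endpoint: str, body: dict) -> urllib.request.Request:
    """Build a JSON POST request to <ES_URL>/<INDEX>/<endpoint>."""
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(endpoint: str, body: dict) -> dict:
    """Send the request and decode Elasticsearch's JSON response."""
    with urllib.request.urlopen(build_request(endpoint, body)) as resp:
        return json.load(resp)


def count_matches(body: dict) -> int:
    """Dry run: how many documents would the delete remove?"""
    return send("_count", body)["count"]


def delete_matches(body: dict) -> int:
    """Delete every matching document and return how many were deleted."""
    return send("_delete_by_query", body)["deleted"]
```

Usage: call count_matches(STREET_QUERY) first and compare the number with what you expect for your polyline import; only then call delete_matches(STREET_QUERY).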

orangejulius commented 4 years ago

You might also be interested in this section of the Pelias from Scratch documentation: Aside: When to delete the data already in Elasticsearch

pravink commented 4 years ago

Hi @missinglink @orangejulius, I really appreciate your quick responses.

I did try to re-import the planet file https://data.geocode.earth/osm/2019-13/planet-latest-valhalla.polylines.0sv.gz, but it failed with the error below. I am not sure whether there is an issue with the file itself. For now, I will go ahead with generating and importing polylines country by country, and for Prod I will recreate the index and import the data again.

indexed=25833000, batch_ok=51666, batch_retries=0, failed_records=0, doc=25833000, persec=1400
2019-11-11T08:31:52.798Z - info: [dbclient-polylines] paused=false, transient=1, current_length=427, indexed=25849000, batch_ok=51698, batch_retries=0, failed_records=0, doc=25849000, persec=1600
2019-11-11T08:31:53.136Z - error: [polyline] polyline document error message=invalid regex test, Open for public since April 2017. Source: https://finance.detik.com/berita-ekonomi-bisnis/3474957/jokowi-resmikan-tol-akses-tanjung-priok-siang-ini should not match /https?:\/\//, stack=PeliasModelError: invalid regex test, Open for public since April 2017. Source: https://finance.detik.com/berita-ekonomi-bisnis/3474957/jokowi-resmikan-tol-akses-tanjung-priok-siang-ini should not match /https?:\/\//
    at Object.nomatch (/efs/pelias-code/polylines/node_modules/pelias-model/util/valid.js:117:13)
    at Document.setName (/efs/pelias-code/polylines/node_modules/pelias-model/Document.js:242:18)
    at DestroyableTransform._transform (/efs/pelias-code/polylines/stream/document.js:30:11)
    at DestroyableTransform.Transform._read (/efs/pelias-code/polylines/node_modules/readable-stream/lib/_stream_transform.js:177:10)
    at DestroyableTransform.Transform._write (/efs/pelias-code/polylines/node_modules/readable-stream/lib/_stream_transform.js:164:83)
    at doWrite (/efs/pelias-code/polylines/node_modules/readable-stream/lib/_stream_writable.js:405:139)
    at writeOrBuffer (/efs/pelias-code/polylines/node_modules/readable-stream/lib/_stream_writable.js:394:5)
    at DestroyableTransform.Writable.write (/efs/pelias-code/polylines/node_modules/readable-stream/lib/_stream_writable.js:303:11)
    at DestroyableTransform.ondata (/efs/pelias-code/polylines/node_modules/readable-stream/lib/_stream_readable.js:662:20)
    at DestroyableTransform.emit (events.js:209:13), name=PeliasModelError
2019-11-11T08:31:53.137Z - error: [polyline] polyline document error message=invalid regex test, Open for Public Since April 2017. Source: https://finance.detik.com/berita-ekonomi-bisnis/3474957/jokowi-resmikan-tol-akses-tanjung-priok-siang-ini should not match /https?:\/\//, stack=PeliasModelError: invalid regex test, Open for Public Since April 2017. Source: https://finance.detik.com/berita-ekonomi-bisnis/3474957/jokowi-resmikan-tol-akses-tanjung-priok-siang-ini should not match /https?:\/\//
    at Object.nomatch (/efs/pelias-code/polylines/node_modules/pelias-model/util/valid.js:117:13)
    at Document.setName (/efs/pelias-code/polylines/node_modules/pelias-model/Document.js:242:18)
    at DestroyableTransform._transform (/efs/pelias-code/polylines/stream/document.js:30:11)
    at DestroyableTransform.Transform._read (/efs/pelias-code/polylines/node_modules/readable-stream/lib/_stream_transform.js:177:10)
    at DestroyableTransform.Transform._write (/efs/pelias-code/polylines/node_modules/readable-stream/lib/_stream_transform.js:164:83)
    at doWrite (/efs/pelias-code/polylines/node_modules/readable-stream/lib/_stream_writable.js:405:139)
    at writeOrBuffer (/efs/pelias-code/polylines/node_modules/readable-stream/lib/_stream_writable.js:394:5)
    at DestroyableTransform.Writable.write (/efs/pelias-code/polylines/node_modules/readable-stream/lib/_stream_writable.js:303:11)
    at DestroyableTransform.ondata (/efs/pelias-code/polylines/node_modules/readable-stream/lib/_stream_readable.js:662:20)
    at DestroyableTransform.emit (events.js:209:13), name=PeliasModelError
2019-11-11T08:32:02.807Z - info: [dbclient-polylines] paused=false, transient=0, current_length=396, indexed=25862500, batch_ok=51725, batch_retries=0, failed_records=0, doc=25862500, persec=1350
2019-11-11T08:32:12.846Z - info: [dbclient-polylines] paused=false, transient=1, current_length=54, indexed=25875000, batch_ok=51750, batch_retries=0, failed_records=0, doc=25875000, persec=1250
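The errors in this log are raised from Document.setName in pelias-model (via valid.js nomatch), which rejects a name matching the regex /https?:\/\// ("should not match"). If you want to spot such names before importing, for example while generating the country-by-country files, a minimal Python sketch of the same check could look like this. It mirrors only that single regex from the log; pelias-model may apply other name validations that are not reproduced here.

```python
import re

# Python equivalent of the JavaScript regex /https?:\/\// from the log:
# matches "http://" or "https://" anywhere in the string.
URL_PATTERN = re.compile(r"https?://")


def name_would_be_rejected(name: str) -> bool:
    """Return True if the name contains an http:// or https:// URL."""
    return URL_PATTERN.search(name) is not None


names = [
    "Open for public since April 2017. Source: https://finance.detik.com/berita-ekonomi-bisnis/3474957/jokowi-resmikan-tol-akses-tanjung-priok-siang-ini",
    "Jalan Tol Akses Tanjung Priok",
]
print([n for n in names if name_would_be_rejected(n)] == names[:1])  # True
```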

Thanks again!