tomayac / local-reverse-geocoder

Local reverse geocoder for Node.js based on GeoNames data
Apache License 2.0
190 stars 58 forks source link

Crash when replacing yesterday's geodata (Extract.writer.error [Error: ENOENT: no such file or directory, lstat '/tmp/geonames/cities/cities1000.txt']) #39

Closed tkafka closed 3 years ago

tkafka commented 3 years ago

Reproduction steps:

  1. Run local-reverse-geocoder, let it create files with dates in name (eg. cities/cities1000_2021-03-19.txt)
  2. Wait another day (or rename file to older date, eg. cities/cities1000_2021-03-19.txtcities/cities1000_2021-03-15.txt)
  3. Start local-reverse-geocoder again.

What I expected:

Geocoder updates its files.

What actually happens

Geocoder crashes with Extract.writer.error [Error: ENOENT: no such file or directory, lstat '/tmp/geonames/cities/cities1000.txt'].

The cause is this:

Fix ideas:

  1. Fix the underlying stream libs (fstream shouldn't touch the extracted file after it passes close event)
  2. Ignore error event from unzip.Extract
    extractStream
          .on('error', function (err) {
            if (err.code === 'ENOENT') {
              // ignore - fstream writer runs stat/lstat after extract is finished (and .close sent), and it crashes, because we have already used and moved/deleted it's file
            } else {
              throw err
            }
          })
  3. Use unzip to extract just a single file we need, in memory, without writing to disk
  4. Keep the extracted file (eg. cities1000.txt), so that we allow fstream to touch it however it wants after the extraction.

Beware:

All _get*data methods have this problem, it needs to be fixed everywhere.

tkafka commented 3 years ago

It is very curious that no one has reported this before, as it happens every time you don't clean up the data and restart local-reverse-geocoder. I may have discovered it by moving dumpDirectory to /tmp/geonames (which I believe should be a default place, the geo database shouldn't be just dumped inside node_modules).

tomayac commented 3 years ago

I think what you're asking for is making GEONAMES_DUMP user-configurable. I'm happy to merge a PR that adds this.

tkafka commented 3 years ago

@tomayac Hi Thomas, that was already configurable - see here.

I dived into the error with streams, but I think that the real problem is that one of node-unzip-2, fstream, or their interactions is wrongly written. However, there was no need to actually store the unzipped files on a disk, so I forked local-reverse-geocoder to https://github.com/tkafka/local-reverse-geocoder and rewrote it to use unzip-stream to extract the wanted file from stream to disk.

See https://github.com/tkafka/local-reverse-geocoder/commit/6e9650062678610b20f400697c945d19bf0323b2

tomayac commented 3 years ago

@tomayac Hi Thomas, that was already configurable - see here.

Oh, sorry. It's been a while that I touched this code.

I dived into the error with streams, but I think that the real problem is that one of node-unzip-2, fstream, or their interactions is wrongly written. However, there was no need to actually store the unzipped files on a disk, so I forked local-reverse-geocoder to https://github.com/tkafka/local-reverse-geocoder and rewrote it to use unzip-stream to extract the wanted file from stream to disk.

Thanks for working on this, but…

See tkafka@6e96500

…I wonder why you went for forking this. I am and always was happy to merge PRs. I'd be happy to add you to the list of contributors.

tkafka commented 3 years ago

@tomayac Please don't take it personally, I just needed it fixed quickly. I just created a pull request https://github.com/tomayac/local-reverse-geocoder/pull/40, and I will be happy to delete local-reverse-geocoder-2 from NPM after this fix is published.

I have been running it on my server for past few days, and finally the geocoder reloads smoothly without crash, even when server is rebooted a day after previous start.

Feel free to add me on a contributor list - thank you!