uber / h3

Hexagonal hierarchical geospatial indexing system
https://h3geo.org
Apache License 2.0

Getting unexpected results when converting coordinates in either direction #828

Closed Stophface closed 7 months ago

Stophface commented 7 months ago

First of all, thanks for this awesome library! I really appreciate your work on it and that you made it open source. I am getting some strange results that I cannot explain, and I am not sure where the problem comes from. Maybe one of you maintainers can point me in the right direction...

I am converting coordinates to H3 hashes (in JavaScript) and saving them into a database. This is pretty straightforward:

  let h3Hashes = new Set();
  locations.forEach(location => {
    h3Hashes.add(
      latLngToCell(
        location.latitude,
        location.longitude,
        11,
      ),
    );
  });

Because consecutive coordinates can be, say, 100 m apart, I fill in the cells traversed between them like this:

  function calculateTraversedHashes(h3Hashes) {
    const h3HashesArray = Array.from(h3Hashes);
    let traveseredH3Hashes = new Set();
    if (h3HashesArray.length === 1) {
      return h3HashesArray;
    } else {
      for (let i = 1; i < h3HashesArray.length; i++) {
        const starthash = h3HashesArray[i - 1];
        const endHash = h3HashesArray[i];
        const path = gridPathCells(starthash, endHash);
        path.forEach(item => traveseredH3Hashes.add(item));
      }
      return Array.from(traveseredH3Hashes);
    }
  }

const uniqueTraveseredH3HashesArray = calculateTraversedHashes(h3Hashes);

In order to get the associated coordinates for each H3 hash, I do this

      const traversedCoordinatesArray =
        uniqueTraveseredH3HashesArray.map(traveseredH3Hash =>
          cellToLatLng(traveseredH3Hash),
        );

And that is what I write into my database. The table has four columns: h3_hash, latitude, longitude and count (not of interest right now). In another part of my application I read from the database via a map view. From the map view I get the coordinates of its extent, and I use this extent as a bounding box to select the H3 hashes that fall inside the visible area (and thus should be displayed):

    SELECT
      h3_hash_11.h3_hash,
      h3_hash_11.count,
      h3_hash_11.latitude,
      h3_hash_11.longitude
    FROM
      h3_hash_11
    WHERE
      h3_hash_11.longitude <= 13.48288 AND
      h3_hash_11.longitude >= 13.41174 AND
      h3_hash_11.latitude <= 52.54138 AND
      h3_hash_11.latitude >= 52.47233;

And now this happens: the H3 hashes I read from the database do not match the H3 hashes I recompute on the fly from the coordinates stored in the same table.

Considering this

  const allData = readH3HashesAndCoordinatesFromDatabase()

  // Recalculate H3 hashes from coordinates and save the hashes in a Set
  const uniqueDiscoveredHashes = new Set()
  allData.forEach(record => uniqueDiscoveredHashes.add(
    latLngToCell(
      record.latitude,
      record.longitude,
      11
    ))
  )

  // Recalculate H3 hashes from coordinates and save the hashes in an Array
 const discoveredHashes = allData.map(record => latLngToCell(
      record.latitude,
      record.longitude,
      11
    ))

  console.log(
    "H3 HASHES AS SET RECALCULATED FROM COORDS: ", uniqueDiscoveredHashes.size,
    "H3 HASHES AS ARRAY RECALCULATED FROM COORDS: ", discoveredHashes.length,
    "H3 HASHES AS ARRAY FROM DB: ", allData.map(record => record.h3Hash).length,
    "H3 HASHES AS SET FROM ARRAY FROM DB: ", new Set(allData.map(record => record.h3Hash)).size
  )

Which gives this

H3 HASHES AS SET RECALCULATED FROM COORDS: 887,
H3 HASHES AS ARRAY RECALCULATED FROM COORDS: 1015 
H3 HASHES AS ARRAY FROM DB: 1015
H3 HASHES AS SET FROM ARRAY FROM DB: 1015

And this I do not understand. Why is uniqueDiscoveredHashes smaller than all the others? I checked whether I have duplicate H3 hashes in the database, and there are none:

SELECT count(h3_hash) FROM h3_hash_11;

returns the same as

SELECT count(DISTINCT h3_hash) FROM h3_hash_11;

And this

SELECT h3_hash, COUNT(*) FROM h3_hash_11 GROUP BY h3_hash HAVING COUNT(*) > 1;

returns 0 rows.

This

    SELECT
      COUNT(h3_hash_11.h3_hash)
    FROM
      h3_hash_11
    WHERE
      h3_hash_11.longitude <= 13.48288 AND
      h3_hash_11.longitude >= 13.41174 AND
      h3_hash_11.latitude <= 52.54138 AND
      h3_hash_11.latitude >= 52.47233;

returns 1015.

So obviously there are no duplicates in my database, but strangely I get different results. Could that be connected to the distortion from the gnomonic projection?

dfellis commented 7 months ago

I think you closed this without realizing, so I'm reopening this issue.

First, this probably belongs on StackOverflow rather than in a repo issue; second, uber/h3-js is probably the more appropriate repo for questions like this in the future.

Anyways, my suspicion: the regenerated set that's missing some values versus the rest is due to rounding of the latitude and longitude columns either in the database or by your database client in the round trip.

Can you run something like this:

console.log(allData.filter(record => latLngToCell(
      record.latitude,
      record.longitude,
      11
    ) !== record.h3_hash))

This will print the records whose lat/lng coordinates don't hash to the same index that they're stored with. If this list is non-empty, then something in your pipeline is truncating relevant data.
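To also see which stored rows collapse into the same recomputed cell (which is what would make the Set smaller than the array), a sketch along these lines might help. The helper names `groupByRecomputedCell` and `findCollisions` are made up for illustration, the record field `h3_hash` is an assumption about how the client names the column, and the cell function is passed in so the snippet does not depend on a particular library. With h3-js it could be `(lat, lng) => latLngToCell(lat, lng, 11)`.

```javascript
// Group stored rows by the cell recomputed from their coordinates.
// `toCell` is injected (assumption: it takes (lat, lng) and returns a cell id).
function groupByRecomputedCell(records, toCell) {
  const groups = new Map();
  for (const record of records) {
    const cell = toCell(record.latitude, record.longitude);
    if (!groups.has(cell)) groups.set(cell, []);
    // `h3_hash` is the assumed field name of the stored hash.
    groups.get(cell).push(record.h3_hash);
  }
  return groups;
}

// Keep only recomputed cells that more than one stored row maps to.
// Each entry is [recomputedCell, arrayOfStoredHashes].
function findCollisions(records, toCell) {
  return [...groupByRecomputedCell(records, toCell)].filter(
    ([, storedHashes]) => storedHashes.length > 1
  );
}
```

If every stored hash matched its stored coordinates, each group would hold exactly one hash and `findCollisions` would return an empty array.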

My first suspicion would be accidentally using the 32-bit float instead of the 64-bit double type for floating point data in the database (or whatever it's called in your database, since the actual database engine is never specified).

My less likely suspicion is whatever library you're using to communicate with your database is truncating data to/from the database (perhaps using a text mode representation of the floating point values and not sending all bits of precision to/from the database). I consider this less likely because it's 2024, not 2014, and Node.js database clients making mistakes like this should have been caught by now.
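To gauge whether a given precision loss is large enough to matter, here is a rough sketch that estimates how far a coordinate moves (in metres) when it is stored as a 32-bit float or rounded to a fixed number of decimal places. The function names are hypothetical, the metres-per-degree constant is a spherical-Earth approximation, and the sample point is simply a location inside the bounding box from the query above.

```javascript
// Rough metres per degree of latitude on a spherical Earth (approximation).
const METERS_PER_DEGREE = 111320;

// Planar approximation of how far a coordinate moved, in metres.
function shiftMeters(lat, lng, newLat, newLng) {
  const dLat = (newLat - lat) * METERS_PER_DEGREE;
  const dLng =
    (newLng - lng) * METERS_PER_DEGREE * Math.cos((lat * Math.PI) / 180);
  return Math.hypot(dLat, dLng);
}

// Shift caused by storing both values as 32-bit floats.
function float32ShiftMeters(lat, lng) {
  return shiftMeters(lat, lng, Math.fround(lat), Math.fround(lng));
}

// Shift caused by rounding to `decimals` places (e.g. a text round trip).
function decimalShiftMeters(lat, lng, decimals) {
  return shiftMeters(
    lat,
    lng,
    Number(lat.toFixed(decimals)),
    Number(lng.toFixed(decimals))
  );
}

console.log(float32ShiftMeters(52.50685, 13.44731));
console.log(decimalShiftMeters(52.50685, 13.44731, 3));
```

For this sample point the 32-bit shift is a fraction of a metre, while rounding to three decimal places moves the point by tens of metres, which is on the order of the size of a resolution 11 cell. Which of the two is happening in a given pipeline is something the filter above should reveal.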

Stophface commented 7 months ago

@dfellis That's what I did before closing it, and it put me on the right track. There was a mix-up in the way I was writing the hashes AND the coordinates into the database. They did not match up, making the bounding box query useless.
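The details of the mix-up are not shown here, but one way to make this kind of mismatch harder to introduce is to build each row from its hash in a single step, so the hash and its coordinates can never come from two separately ordered arrays. This is only a sketch: `buildRows` is a made-up helper, and the centre lookup is passed in as a parameter. With h3-js it could be `cellToLatLng`, which returns a `[lat, lng]` pair.

```javascript
// Build database rows so each hash is paired with its own centre coordinate.
// `cellToCenter` is injected (assumption: it returns [lat, lng] for a cell id).
function buildRows(hashes, cellToCenter) {
  return Array.from(hashes, (h3Hash) => {
    const [latitude, longitude] = cellToCenter(h3Hash);
    return { h3_hash: h3Hash, latitude, longitude };
  });
}
```

Because `Array.from` accepts any iterable, the Set of traversed hashes can be passed in directly.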

I came here because I thought that due to the "distortion from the gnomonic projection" (which I do not understand at all) there might be some sort of weird mess with the coordinates maybe falling into several hashes...

dfellis commented 7 months ago

So all projection systems of the round Earth onto a flat surface have distortions. It's just that the vast majority of 2D projections people work with are Mercator projections, which were designed for ship navigation and notoriously make the areas near the poles much larger than they are in reality.

Meanwhile, the Gnomonic projections centered on each Icosahedron face that H3 uses have less overall distortion than Mercator, but it's also a different kind of distortion, so a straight line in H3 looks curved in Mercator (and vice versa), and that can confuse people when they render things on a Mercator map.

It's generally only visible at the largest scales on a map (at the city level your eyes can't see a difference, but at a continental level you start to see the differences), but if you naively thought you could take the cellToBoundary coordinates and then do a point-in-poly algorithm within those bounds, a small fraction of points along the boundary of the hexagon will flag as "contained" when H3 says they aren't and vice versa.
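For reference, the naive check described above looks roughly like the following ray-casting sketch, which treats `[lat, lng]` pairs as flat coordinates. `pointInPolygonPlanar` is a made-up name, and the polygon is assumed to be an array of `[lat, lng]` pairs, which is the shape `cellToBoundary` returns by default in h3-js. It is shown to make the pitfall concrete, not as a recommended way to test cell membership.

```javascript
// Ray-casting point-in-polygon test on [lat, lng] pairs treated as planar
// coordinates. Near cell edges the result may disagree with H3's own answer.
function pointInPolygonPlanar(point, polygon) {
  const [y, x] = point; // [lat, lng]
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [yi, xi] = polygon[i];
    const [yj, xj] = polygon[j];
    // Does a horizontal ray from the point cross the edge (j -> i)?
    const crosses =
      yi > y !== yj > y &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}
```

To ask whether a point belongs to a cell, comparing `latLngToCell(lat, lng, resolution)` with the cell index avoids the disagreement entirely.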

That's what the mention of Gnomonic distortion is about -- if you just use lat, lng as a Cartesian coordinate system then you assume the Mercator distortions aren't distortions at all and you get weird results vs reality and other projection systems.
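A small sketch of the effect, under the assumption of a spherical Earth: a gnomonic projection draws great circles as straight lines, so the "straight" path between two points in that projection is the great-circle path, whereas treating lat, lng as flat coordinates gives a different path. The function names below are made up for illustration, and the two sample points are arbitrary.

```javascript
const toRad = (deg) => (deg * Math.PI) / 180;
const toDeg = (rad) => (rad * 180) / Math.PI;

// Midpoint from averaging lat and lng as if they were flat coordinates.
function naiveMidpoint([lat1, lng1], [lat2, lng2]) {
  return [(lat1 + lat2) / 2, (lng1 + lng2) / 2];
}

// Midpoint along the great circle between two points on a sphere.
function greatCircleMidpoint([lat1, lng1], [lat2, lng2]) {
  const p1 = toRad(lat1);
  const p2 = toRad(lat2);
  const dLng = toRad(lng2 - lng1);
  const bx = Math.cos(p2) * Math.cos(dLng);
  const by = Math.cos(p2) * Math.sin(dLng);
  const lat = Math.atan2(
    Math.sin(p1) + Math.sin(p2),
    Math.hypot(Math.cos(p1) + bx, by)
  );
  const lng = toRad(lng1) + Math.atan2(by, Math.cos(p1) + bx);
  return [toDeg(lat), toDeg(lng)];
}

// Two points on the same parallel, a quarter of the way around the globe.
console.log(naiveMidpoint([50, 0], [50, 90]));
console.log(greatCircleMidpoint([50, 0], [50, 90]));
```

At this continental scale the great-circle midpoint lies several degrees of latitude north of the naive one. At city scale the two are practically indistinguishable.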

Stophface commented 7 months ago

@dfellis Thanks for the detailed explanation. Yeah, I guess the Mercator projection made sense at a time when the world was being "explored" by ship. When you draw a line at an angle on the map, you want that angled line to really represent the course your ship is taking :)

Anyway, it often helps me to talk to others about challenges I encounter while programming. Since the project I am working on is a hobby thing (it will be a pretty cool app which would not be possible without H3), there is really no one I can talk to and structure some of my thoughts and ideas with. I guess that is why I turn to GitHub forums and/or StackOverflow.