mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration
0 stars 0 forks source link

Can LatLonShape's tessellator create more search-efficient triangles? [LUCENE-8615] #614

Open mikemccand opened 5 years ago

mikemccand commented 5 years ago

The triangular mesh produced by LatLonShape's Tessellator creates reasonable numbers of triangles, which is helpful for indexing speed. However I'm wondering that there are conditions when it might be beneficial to run tessellation slightly differently in order to create triangles that are more search-friendly. Given that we only index the minimum bounding rectangle for each triangle, we always check for intersection between the query and the triangle if the query intersects with the MBR of the triangle. So the smaller the area of the triangle compared to its MBR, the higher the likeliness to have false positive when querying.

For instance see the following shape, there are two ways that it can be tessellated into two triangles. LatLonShape's Tessellator is going to return either of them depending on which point is listed first in the polygon. Yet the first one is more efficient that the second one: with the second one, both triangles have roughly the same MBR (which is also the MBR of the polygon), so both triangles will need to be checked all the time whenever the query intersects with this shared MBR. On the other hand, with the first way, both MBRs are smaller and don't overlap, which makes it more likely that only one triangle needs to be checked at query time.

2-tessellations.png

Another example is the following polygon. It can be tessellated into a single triangle. Yet at times it might be a better idea create more triangles so that the overall area of MBRs is smaller and queries are less likely to run into false positives.

re-tessellate-triangle.png


Legacy Jira details

LUCENE-8615 by Adrien Grand (@jpountz) on Dec 20 2018, updated Jan 17 2020 Attachments: 2-tessellations.png, re-tessellate-triangle.png, screenshot-1.png

mikemccand commented 4 years ago

Here is an idea. We currently have the following type of triangles in the index:

screenshot-1.png

This plot shows that the potentially more wasteful triangles are the ones where two of the points belongs to bounding box (first four possibilities). So I wonder we should prevent to add those types of triangles to the index and instead split them using the longest side.

Note that the side effect is that we can reduce the number of triangle types to 4.

[Legacy Jira: Ignacio Vera (@iverase) on Jan 14 2020]

mikemccand commented 4 years ago

This sounds like an interesting idea!

[Legacy Jira: Adrien Grand (@jpountz) on Jan 17 2020]