mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration

Add type of triangle info to ShapeField encoding [LUCENE-8997] #994

Open mikemccand opened 5 years ago

mikemccand commented 5 years ago

We are currently encoding three types of triangle in ShapeField:

Because we still have two unused bits, it might be worth encoding this information in those two bits as follows:

We can later leverage this information so that we don't need to decode all dimensions for POINT and LINE. Some methods also currently compute the type of triangle they are dealing with; that computation would go away as well.
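A minimal sketch of the idea, under a hypothetical layout: the two spare bits sit at positions 4 and 5 of an `int` flags value, and the type codes follow a made-up mapping (UNKNOWN, POINT, LINE, TRIANGLE), because the issue's own table is not reproduced here. None of the names below (`TriangleTypeBits`, `withType`, `typeOf`, `coordinatesToDecode`) are Lucene APIs.

```java
/**
 * Hypothetical sketch: stores a 2-bit shape type in two spare bits of a flags value.
 * The bit positions and the type codes are assumptions made for illustration.
 */
public class TriangleTypeBits {
    // Assumed mapping; ordinal() gives the 2-bit code (0..3).
    enum Type { UNKNOWN, POINT, LINE, TRIANGLE }

    // Assumed position of the two spare bits (bits 4 and 5).
    static final int TYPE_SHIFT = 4;
    static final int TYPE_MASK = 0b11 << TYPE_SHIFT;

    /** Returns flags with the type bits replaced, leaving all other bits untouched. */
    static int withType(int flags, Type type) {
        return (flags & ~TYPE_MASK) | (type.ordinal() << TYPE_SHIFT);
    }

    /** Reads the type back from the two spare bits. */
    static Type typeOf(int flags) {
        return Type.values()[(flags & TYPE_MASK) >>> TYPE_SHIFT];
    }

    /** How many encoded coordinates a reader needs for each type. */
    static int coordinatesToDecode(Type type) {
        switch (type) {
            case POINT:
                return 2; // a single vertex (x, y)
            case LINE:
                return 4; // two vertices
            default:
                return 6; // TRIANGLE, or UNKNOWN: decode all three vertices
        }
    }

    public static void main(String[] args) {
        int flags = 0b0101; // pre-existing low bits that must survive
        int encoded = withType(flags, Type.LINE);
        Type type = typeOf(encoded);
        // prints "LINE 4 5"
        System.out.println(type + " " + coordinatesToDecode(type) + " " + (encoded & ~TYPE_MASK));
    }
}
```

Reserving one code for "unknown" in this sketch lets a reader fall back to decoding every dimension when the type bits carry no information.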


Legacy Jira details

LUCENE-8997 by Ignacio Vera (@iverase) on Oct 02 2019, updated Jan 08 2020

mikemccand commented 4 years ago

I would like to raise this issue again because I made a small improvement. I realised that for points I do not need to add the point information to the data dimensions, so I can leave dimensions 5 and 6 empty. This means BKD tree leaves that contain only points will compress very well.
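Why empty dimensions compress well can be seen with a simplified model, which is an assumption and not Lucene's actual `BKDWriter` leaf format: if every value in a leaf shares the same leading bytes in a dimension, those bytes only need to be written once per leaf. A dimension that is always zero shares all of its bytes, so it costs nothing per value. The class, method names and coordinates are hypothetical; the two `int` columns stand in for dimensions 5 and 6.

```java
/**
 * Simplified model (an assumption, not Lucene's BKDWriter format): bytes shared by
 * every value of a dimension in a leaf are stored once, so only the remaining
 * suffix bytes cost space per value.
 */
public class EmptyDimsSketch {
    /** Number of leading big-endian bytes that every value in the column shares. */
    static int commonPrefixBytes(int[] column) {
        int prefix = Integer.BYTES;
        for (int v : column) {
            // Equal leading bytes show up as leading zero bits in the XOR.
            prefix = Math.min(prefix, Integer.numberOfLeadingZeros(column[0] ^ v) / 8);
        }
        return prefix;
    }

    /** Per-value bytes left for the given dimensions once common prefixes are removed. */
    static int suffixBytesPerValue(int[]... columns) {
        int total = 0;
        for (int[] column : columns) {
            total += Integer.BYTES - commonPrefixBytes(column);
        }
        return total;
    }

    public static void main(String[] args) {
        int leafSize = 512;
        // Hypothetical point coordinates in one leaf, repeated in dimensions 5 and 6.
        int[] x = new int[leafSize];
        int[] y = new int[leafSize];
        for (int i = 0; i < leafSize; i++) {
            x[i] = 1_000_000 + i * 7919;
            y[i] = -2_000_000 - i * 104_729;
        }
        // The proposed change: points leave dimensions 5 and 6 empty (zero).
        int[] empty = new int[leafSize];

        System.out.println("repeated coordinates: " + suffixBytesPerValue(x, y) + " bytes/value");
        System.out.println("empty dimensions:     " + suffixBytesPerValue(empty, empty) + " bytes/value");
    }
}
```

In this model the empty dimensions always cost 0 bytes per value, while repeated coordinates cost whatever their shared prefix does not cover.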

I ran the Lucene geo benchmarks for LatLonShape and the index size dropped by 30%!

 

||Approach||Index time Dev (s)||Index time Base (s)||Index time Diff||Force merge time Dev (s)||Force merge time Base (s)||Force merge time Diff||Index size Dev (GB)||Index size Base (GB)||Index size Diff||Reader heap Dev (MB)||Reader heap Base (MB)||Reader heap Diff||
|shapes|244.7|250.7|-2%|0.0|0.0|0%|0.89|1.27|-30%|1.14|1.14|0%|
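The Diff columns can be checked from the Dev and Base values as the relative change (Dev - Base) / Base, rounded to a whole percent. A small sketch follows; `BenchmarkDiff` and `percentDiff` are hypothetical names, and the force-merge column is left out because its Base value is 0.0, where a relative change is undefined.

```java
/** Recomputes the Diff columns of the benchmark table from its Dev and Base values. */
public class BenchmarkDiff {
    /** Relative change of dev versus base, in percent, rounded to the nearest integer. */
    static long percentDiff(double dev, double base) {
        return Math.round((dev - base) / base * 100.0);
    }

    public static void main(String[] args) {
        System.out.println("index time:  " + percentDiff(244.7, 250.7) + "%"); // prints -2%
        System.out.println("index size:  " + percentDiff(0.89, 1.27) + "%");   // prints -30%
        System.out.println("reader heap: " + percentDiff(1.14, 1.14) + "%");   // prints 0%
    }
}
```

The unrounded index-size change is about -29.9%, which matches the reported 30% reduction.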

[Legacy Jira: Ignacio Vera (@iverase) on Nov 14 2019]

mikemccand commented 4 years ago

I'm unsure about keeping dimensions empty: it works well if your index has only lines or only points since all points will have a value of 0 for certain dimensions. But if the index mixes triangles and points, then this could actually hurt?
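This concern can be illustrated with a simplified common-prefix model (an assumption, not Lucene's exact leaf format): in a leaf that mixes points and triangles, spatially close values would normally share leading bytes in a data dimension, but zeros written for the points break that shared prefix. The class name and coordinates below are hypothetical.

```java
/**
 * Simplified model (an assumption, not Lucene's exact leaf format) of one data
 * dimension in a leaf that mixes points and triangles.
 */
public class MixedLeafSketch {
    /** Number of leading big-endian bytes that every value in the column shares. */
    static int commonPrefixBytes(int[] column) {
        int prefix = Integer.BYTES;
        for (int v : column) {
            prefix = Math.min(prefix, Integer.numberOfLeadingZeros(column[0] ^ v) / 8);
        }
        return prefix;
    }

    public static void main(String[] args) {
        int leafSize = 8;
        int[] repeated = new int[leafSize]; // points repeat their coordinate
        int[] empty = new int[leafSize];    // points leave the dimension as zero
        for (int i = 0; i < leafSize; i++) {
            // Hypothetical coordinates: shapes in one leaf are close, so values share leading bytes.
            int coordinate = 0x12345600 + i;
            boolean isPoint = i % 2 == 0; // every other value is a point, the rest are triangles
            repeated[i] = coordinate;
            empty[i] = isPoint ? 0 : coordinate;
        }
        System.out.println("repeated: " + commonPrefixBytes(repeated) + " shared bytes"); // prints 3
        System.out.println("empty:    " + commonPrefixBytes(empty) + " shared bytes");    // prints 0
    }
}
```

With these made-up values the mixed leaf loses a 3-byte shared prefix in this dimension, whereas a leaf of only points would share all 4 bytes of the zeros.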

[Legacy Jira: Adrien Grand (@jpountz) on Nov 14 2019]

mikemccand commented 4 years ago

I guess it could still work if we indexed this dimension, but I don't think this is the right trade-off.

[Legacy Jira: Adrien Grand (@jpountz) on Nov 14 2019]

mikemccand commented 4 years ago

I see your point; I'll revert that change.

[Legacy Jira: Ignacio Vera (@iverase) on Nov 14 2019]