Containers for families of geometric objects

mkoeppe commented 3 years ago

We define APIs for static and dynamic containers storing finite families of geometric objects, extending Container, Set, MutableSet, Mapping, MutableMapping https://docs.python.org/3/library/collections.abc.html

The family must define a collection of "measurement maps" B_i, each sending an object to an InternalRealInterval.

The product of these intervals can be thought of as a "generalized bounding box".

Ideally the maps would be inclusion-preserving.

The implementation is provided by rtree/libspatialindex (#32000).

The Sage-specific class will take care of:

replacing an InternalRealInterval by a rescaling or inclusion-preserving overestimation, making the coordinates suitable for the underlying library.
providing accelerated set operations such as:
- a fast path for computing empty intersection when the result will be empty,
- a fast path for __contains__ when the result will be False.

Geometric lookup operations to be supported:

...

Applications:

RationalPolyhedralFan
PolyhedralComplex (#31748)
Families/posets of (open) subsets of manifolds (#31740): A natural choice of the maps B_i comes from the coordinates of covering charts, using (-oo, oo) for coordinates for which no coordinate definition for the subset is known

Depends on #34277

CC: @DRKWang

Component: geometry

Work Issues: Rebase on #34277

Author: Binshuai Wang

Branch/Commit: public/32170 @ 05ea99d

Issue created by migration from https://trac.sagemath.org/ticket/32170

DRKWang commented 3 years ago

comment:1

The precision of interval values in Package libspatialindex is double.

They did not claim it explicitly, but I found it from

https://rtree.readthedocs.io/en/latest/class.html#rtree.index.Index
https://github.com/libspatialindex/libspatialindex/blob/master/include/spatialindex/capi/DataStream.h.

mkoeppe commented 3 years ago

comment:2

OK, the next step would be to find out whether the library does any arithmetic with these double values (in this case we would have to check whether they apply directed rounding correctly), or whether they only use comparisons on these double values.

mkoeppe commented 3 years ago

Description changed:

--- 
+++ 
@@ -1,4 +1,4 @@
-We define APIs for static and dynamic containers storing finite families of geometric objects.
+We define APIs for static and dynamic containers storing finite families of geometric objects, extending `Container`, `Set`, `MutableSet`, `Mapping`, `MutableMapping` https://docs.python.org/3/library/collections.abc.html

 The family must define a collection of "measurement maps" `B_i`, each sending an object to an `InternalRealInterval`.

DRKWang commented 3 years ago

comment:4

The library libspatialindex uses arithmetic operations, including +, - and *, with those double values in order to support some functions, such as

LineSegment::Collinear( https://libspatialindex.org/en/latest/doxygen/LineSegment_8cc_source.html#l00439),
LineSegment::getArea(https://github.com/libspatialindex/libspatialindex/blob/master/src/spatialindex/Region.cc)

and etc.

Besides, the library libspatialindex also uses std::numeric_limits<double>::epsilon() to deal with machine precision issue, such as Region::touchesPoint(const Point& p) (https://github.com/libspatialindex/libspatialindex/blob/master/src/spatialindex/Region.cc)

std::numeric_limits<double>::max();(https://github.com/libspatialindex/libspatialindex/blob/master/src/spatialindex/Point.cc).

DRKWang commented 3 years ago

comment:6

A few test codes have been made for testing arithmetic precision. The conclusions are as follows, more details and codes can be found from this(https://github.com/DRKWang/Data-structure-for-piecewise-linear-function/blob/main/log35.ipynb).

The dimension of the index database should always be greater than 1. (0 or 1 dimension will fail).
The arithmetic precision is 16 decimal digits, which comes from the double type's precision.
The minimal capturing of arithmetic precision is 324 decimal digits, which means the minimal difference to distinguish two numbers is 1*10^{-324}. otherwise, they will be viewed as the same number.
The intersection for retrieving is a closed intersection. (-0.000) is also considered as the same number as 0, but both representations will be displayed at the same time.
Rounding. The round processing is a little complex and untraceable, it is neither rounding up or rounding down.

DRKWang commented 3 years ago

comment:7

I have made some debugs for this library. I would like to list the key points below. The entire log file can be found here https://github.com/DRKWang/Data-structure-for-piecewise-linear-function/blob/main/logsfile/log36.ipynb.

The python wrapper part for rtree was only used for passing the values and data to 2 basic functions, core.rt.Index_InsertData() and core.rt.Index_Intersects_obj(). Those two functions are stored here https://github.com/Toblerity/rtree/blob/master/rtree/core.py. Thus, there is no arithmetic error in this part.
The corresponding part in libspatialindex for core.rt.Index_InsertData() and core.rt.Index_Intersects_obj() are placed in line 499 and line 678 in https://github.com/libspatialindex/libspatialindex/blob/master/src/capi/sidx_api.cc .
In the Index_InsertData() in C++, the author uses the concepts "points" and "boxes" to refine the input information, and then call the function index().insertData(**) to compute. As we discussed last time, the author mistakenly uses the std::numeric_limits<double>::epsilon() as the range of approximation, that is, if the bound difference is within this range, then they will be seen as a single point instead of a box. According to this note, https://stackoverflow.com/questions/48133572/what-can-stdnumeric-limitsdoubleepsilon-be-used-for, we may correct it by multiplying the magnitudes of the comparison. However, because it is for scientific computation, we should get rid of approximation as much as possible.
Except for the above place, the author also mistakenly has used std::numeric_limits<double>::epsilon() in many other places. I think we could inform the author to check them deeply.

DRKWang commented 3 years ago

comment:8

The extensions, .idx and .dat, are two files used to serialize the rtree to disk, according to https://rtree.readthedocs.io/en/latest/tutorial.html.
Those two files are created by #include <fstream> internally, based on this source code, https://github.com/libspatialindex/libspatialindex/blob/master/src/storagemanager/DiskStorageManager.cc.
The .idx is used to store vital information like the page size, the next available page, a list of empty pages, and the sequence of pages associated with every entity ID.There is no note for the detail or purpose of .dat file. Thus, I made a test where putting 50 boxes into this library. The shows that the .dat is way larger than .idx, which indicates the .dat is used to store the real data, whereas .idx is used for storage the rest of it.
The correct way to close and reopen those serialized files can be found here, https://gis.stackexchange.com/questions/254781/saving-python-rtree-spatial-index-to-file
Except for DiskStorageManager, it also has a MemoryStorageManager. The detail of it is that "Everything is stored in main memory using a simple vector. No properties are needed to initialize a MemoryStorageManager object. When a MemoryStorageManager instance goes out of scope, all data that it contains are lost."
The .dat_extension = xxx and .idx_extension = xxx can be used to change the names of those extensions.

DRKWang commented 3 years ago

comment:9

I am trying to build a subclass called rtree_based_realset(), which is inherited from the class RealSet(). But got stuck on the purpose of this subclass. I would like to list my chain of thoughts below and hope to get a verification.

The intention of building this subclass is to keep the interface of this class, (I feel basically the interface can be considered as functions), the same and use rtree data structure to be a preliminary filter for overestimating in order to reduce the computation time. But since rtree only serves as an overestimate from one side, in order to be accurate, this subclass, which we want to define, still needs the finite unit intervals structure inside. In that case, seemingly, the subclass does not have a big difference from the RealSet() class. And the difference between those two classes only comes from whether we use rtree as an auxiliary tools or not.

mkoeppe commented 3 years ago

comment:10

Replying to @DRKWang:

since rtree only serves as an overestimate from one side, in order to be accurate, this subclass, which we want to define, still needs the finite unit intervals structure inside.

That's right, we are not replacing the internal representation of RealSet instances.

We are only adding an index data structure to it.

DRKWang commented 3 years ago

comment:11

Since the rtree.count(), which will return the number of objects whose boxes have intersections with the target point, is not exactly suitable for our expected function, I try to figure out some other functions to replace it. In my opinion, the best function would be rtree.getfirst(), if we should have, which ideally can assist with both RealSet.contains() and RealSet.interesction(). However, I did not get such a function, instead, I found rtree.nearest(), which will return the item whose box has minimal distance to the target point. However, after comparing rtree.nearest() with rtree.count(), the test results shows the rtree.nearest() is slower than rtree.count().

mkoeppe commented 3 years ago

comment:12

As the boxes overestimate the true objects, a count result of 0 is useful.

mkoeppe commented 3 years ago

comment:13

For intersection, can you propose a function for rtree that would help?

DRKWang commented 3 years ago

comment:14

Replying to @mkoeppe:

For intersection, can you propose a function for rtree that would help?

Yes, I think rtree.count() also can help with that when there is a non-intersection between RealSet A and RealSet B, And I have implemented the code.

DRKWang commented 3 years ago

comment:15

I am thinking about changing the internal code of function RealSet.union(). Instead of ordinary function union(A,B), which has two equally operands, the function RealSet.union() has already self, and this format kind of indicates the self set is dominating and the other set will be added to self. In this case, I feel when we are trying to extend this function onto the rtree_bssed_Realset class, it is better to return a rtree_based_RealSet subclass instead of a RealSet class, since the programmer kind of wants to maintain the internal rtree data structure. And since this function only covers the + operation, which correspondingly is also easy to implement with rtree, thus, I feel I can try to achieve that.

mkoeppe commented 3 years ago

comment:16

Replying to @DRKWang:

I feel when we are trying to extend this function onto the rtree_bssed_Realset class, it is better to return a rtree_based_RealSet subclass instead of a RealSet class,

Yes, I agree. This is a general change that should be made throughout the RealSet class. For example, in RealSet.complement, it should probably use self.__class__ instead of RealSet when it constructs the result.

DRKWang commented 3 years ago

comment:17

Yes, I agree. This is a general change that should be made throughout the RealSet class. For example, in RealSet.complement, it should probably use self.__class__ instead of RealSet when it constructs the result.

I see. I have made the changes inside the RealSet class by replacing RealSet with self.__class__. And I feel those replacements only need to happen under 2 following conditions,

It was after the return. Therefore, the function will finally return the corresponding class, and apart from return, it is necessary to make a replacement.
We can not make those replacements under the function that has a @staticmethod decorator. Since function with @staticmethod does not have self parameter, which means we can not use self.__class__ to help.

Specifically, the functions I made the change are union(), intersection(), complement(), __mul__().

mkoeppe commented 3 years ago

comment:18

Replying to @DRKWang:

We can not make those replacements under the function that has a @staticmethod decorator. Since function with @staticmethod does not have self parameter, which means we can not use self.__class__ to help.

Try changing these methods to @classmethod - https://docs.python.org/3/library/functions.html#classmethod

And push the branch please to the ticket

DRKWang commented 3 years ago

Commit: 89595a6

DRKWang commented 3 years ago

Branch: public/32170

DRKWang commented 3 years ago

New commits:

`fa4e4ab`	`build/pkgs/cgal: New`
`30b487f`	`build/pkgs/swig:New`
`df9069a`	`build/pkgs/cgal_swig_bindings: New`
`82bf167`	`add the upstream_url to checksums`
`40b8a43`	`Merge tag '9.3.beta6' into t/31098/cgal_swig_bindings`
`6be25f7`	`build/pkgs/cgal_swig_bindings/checksums.ini: Fix upstream_url`
`5695105`	`build/pkgs/cgal_swig_bindings: Use unpatched cgal-swig-bindings, set Python_EXECUTABLE`
`e380217`	`merge th current version to public/32000`
`89595a6`	`make a whole change for replacing RealSet() objectas a return with 'self.__class__()' or 'cls()' as a return for 'RealSet' class; add 'Rtree_based_RealSet' class for rtree extension.`

DRKWang commented 3 years ago

comment:20

Try changing these methods to @classmethod - https://docs.python.org/3/library/functions.html#classmethod

Got it. I have made the change inside the RealSet object for placing the @staticmethod with @classmethod and use cls() as a wrapper for returning.

And push the branch please to the ticket

The above changes, including whole changes for RealSet, Rtree_based_RealSet class, have been made inside the sage/src/sage/sets/real_set.py file. And the code have been submitted to the branch to the ticket.

DRKWang commented 3 years ago

comment:21

Since I also want to take a test to check whether it performs correctly and since it has been changed under the src folder, for which I thought we can use it explicitly without importing any more. However, sage IDE shows that name 'Rtree_based_RealSet' is not defined.

DRKWang commented 3 years ago

comment:22

Also, I always have a question on how to pushing the latest changes correctly for the sagemath. Assuming there are no previous branches was uploaded for a ticket, If I want to upload a new initial branch for it, either should I merge to the latest version, for example, currently it is sage-9.5, before uploading, or is it ok to keep an old version only if we can run the code without errors?

mkoeppe commented 3 years ago

comment:23

Yes, your current branch has a merge conflict with the current develop, 9.5.beta0, so it is best to merge it in, resolving the conflicts

mkoeppe commented 3 years ago

comment:24

Actually I see the branch is on top of the cgal swig bindings stuff.

Best to do is the following:

Start a new branch from 9.5.beta0
Merge the branch of #32000
cherry-pick the commit 89595a6 with your realset changes

mkoeppe commented 3 years ago

comment:25

A comment on naming the class: For an implementation variant of a class, we would use an underscore-separated suffix: Instead of Rtree_based_RealSet, use RealSet_rtree

mkoeppe commented 3 years ago

comment:26

Replying to @DRKWang:

Since I also want to take a test to check whether it performs correctly and since it has been changed under the src folder, for which I thought we can use it explicitly without importing any more.

If you built Sage without --enable-editable (https://wiki.sagemath.org/ReleaseTours/sage-9.3#Editable_.28.22in-place.22.2C_.22develop.22.29_installs_of_the_Sage_library), you will have to use ./sage -b to install the changes to the Python sources that you made.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`594bb26`	`make a whole change for RealSet class with using 'self.__class__' and '@classmethod', as well as adding a new subclass RealSet_rtree.`
`2330ca9`	`add new package rtree and libspatialindex`
`5902e3c`	`build/pkgs/libspatialindex: Make it a normal package`
`791aafd`	`update SPKG.rst, correct distros, add url in checksums.ini, revise spkg-configure`
`68735cb`	`temporarily rename the distributions files under package rtree and libspatialindex with adding prefix_ for later renaming those files with all lowercase, this is caused by git did not make case-sensitive.`
`720ff79`	`Rename the distribution files under package rtree and libspatialindex with lowercases.`
`dfc5410`	`Merge branch 'public/32000' of trac.sagemath.org:sage into realset_rtree`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 89595a6 to dfc5410

DRKWang commented 3 years ago

comment:28

Replying to @mkoeppe:

If you built Sage without --enable-editable (https://wiki.sagemath.org/ReleaseTours/sage-9.3#Editable_.28.22in-place.22.2C_.22develop.22.29_installs_of_the_Sage_library), you will have to use ./sage -b to install the changes to the Python sources that you made.

Since software is not built with --enable-editable, I choose to use ./sage -b. However, still, it is not able to import the RealSet_rtree class explicitly.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

674d982 "Fixed 2 bugs:

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from dfc5410 to 674d982

DRKWang commented 3 years ago

comment:30

A few test programs have been made for this class RealSet_rtree(), including case with +-infinity, type returned class test. Those test codes can be found at the end here. https://github.com/DRKWang/Data-structure-for-piecewise-linear-function/blob/main/logsfile/test%20for%20real_set_with_rtree.ipynb

As a result, I found 2 bugs,

one is for misplacing the return self.__class__(*intervals);
the other is that since floor() and ceil() can not deal with +-infinity properly, we have to make it separately.

mkoeppe commented 3 years ago

comment:31

Replying to @DRKWang:

As a result, I found 2 bugs,

one is for misplacing the return self.__class__(*intervals);

the other is that since floor() and ceil() can not deal with +-infinity properly, we have to make it separately.

Best practice is to add a test to the doctests that demonstrate that it now works correctly.

DRKWang commented 3 years ago

comment:32

Replying to @mkoeppe:

Replying to @DRKWang:

As a result, I found 2 bugs,

one is for misplacing the return self.__class__(*intervals);

the other is that since floor() and ceil() can not deal with +-infinity properly, we have to make it separately.

Best practice is to add a test to the doctests that demonstrate that it now works correctly.

Following this instruction https://doc.sagemath.org/html/en/developer/doctesting.html#beyond-the-sage-library, I tried to use ./sage -t src/sage/sets/real_set.pyfor docstring. However, All the testing results are failures. There are 2 types of arisen errors, one failure comes from SyntaxError: multiple statements found while compiling a single statement, the other comes from NameError: name'RealSet_rtree' is not defined. For the first, I guess it is caused by format error, something like an invisible return, but I have no idea about the second one. The log file is here. https://github.com/DRKWang/Data-structure-for-piecewise-linear-function/blob/main/logsfile/failure%20info.txt

mkoeppe commented 3 years ago

comment:33

The current branch defines class RealSet_rtree 3 times -- that's probably the first thing that should be cleaned up

mkoeppe commented 3 years ago

comment:34

I tried the first example from your worksheet, after importing the class:

sage: from sage.sets.real_set import RealSet_rtree

sage: A = RealSet_rtree(0,1) 
....: B = RealSet_rtree(0.5,3) 
....: C = A.intersection(B) 
....: type(C)

I get this error:

~/s/sage/sage-rebasing/local/lib/python3.9/site-packages/sage/sets/real_set.py in __init__(self, *intervals)
   2707             if (x_max < lower) and (lower != +Infinity):
   2708                 x_max = lower
-> 2709             if (x_max < upper) and (upper != +Infinity):
   2710                 x_max = upper
   2711         if x_max == 0: #rtree is not needed, since they are only +-infinity and zero!

NameError: name 'Infinity' is not defined

This means that you need to add some import statements to the file. You can type import_statements('Infinity') to help with this.

The doctests of the new class RealSet_rtree are copy-paste from RealSet and do not test the new class. Best to replace them by the tests that you have in your worksheet.

DRKWang commented 3 years ago

comment:35

Replying to @mkoeppe:

The current branch defines class RealSet_rtree 3 times -- that's probably the first thing that should be cleaned up

So desperate! It is really a fatal and long-history mistake, even existing for 3 commits. I guess it is caused by using Jupiter notebook a lot. I will be cautious next time.

mkoeppe commented 3 years ago

comment:37

Don't worry about it. It's easy to clean up the commit history later

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 674d982 to 8c1f0c1

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`b5384ec`	`Deleted redundant RealSet_rtree classes`
`8c1f0c1`	`Add doctests for class RealSet_rtree()`

DRKWang commented 3 years ago

comment:39

Replying to @mkoeppe:

you will have to use ./sage -b to install the changes to the Python sources that you made.

I tried to use ./sage -b again after the meeting this morning, the real_set.py file under the directory of SAGE_LOCAL/lib/python3.7/site-packages/sage/sets did be rebuilt. But, Sage still shows NameError: name 'RealSet_rtree' is not defined. Did I miss any other commands?

mkoeppe commented 3 years ago

comment:40

You need to import it before using, see comment:34

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Changed commit from 8c1f0c1 to d160842

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 3 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

d160842 Doctests for class RealSet_rtree() have been adjusted and passed the test.

DRKWang commented 3 years ago

comment:42

Replying to @mkoeppe:

The doctests of the new class RealSet_rtree are copy-paste from RealSet and do not test the new class. Best to replace them by the tests that you have in your worksheet.

I have updated the doctests of the new class RealSet_rtree. And those doctests have been passed successfully. However, according to this documentation, https://doc.sagemath.org/pdf/en/reference/doctest/doctest.pdf, "Sage currently accepts backslashes as indicating that the end of the current line should be joined to the next line. This feature allows for breaking large integers over multiple lines but is not standard for Python doctesting.", which infers the sage doctest can only deal with single line, except the cases with for, while and ending with :. Therefore, I have to use ; to make it into one line, which is not beautiful and clear.

DRKWang commented 3 years ago

comment:43

In order to make the projecting directions more separable with respect to cones of a given fan, we could generate certain directions based on the inclination of cones of a given fan instead of randomly.

Suppose the dimension of universal space is $n$. Concretely,

We will compute the inclination first. We measure this inclination of the fan with "average point" $q$. The average point is defined by $q = \sum p_i$, ${p_i}$ are the points along with the generating rays of the fan and whose 1-norm is exactly 1.
Then, choose a set of orthogonal basis ${v_j}$ coming from the complementary space of the space generated by $q$, i.e. $\forall i$, $v_j \perp q$. Note that, since the average point $q$ could be zero vector, in that case, we will choose $N(1,0,...0)$, $N(0,1,..., 0)$,..., $N(0,0,...,1)$ as the basis. The chosen set of orthogonal basis ${v_j}$ could be good candidates for seperating the cones of fan. One way of generating the orthogonal basis ${v_j}$ will be using Gram–Schmidt process.
After generating an orthogonal basis ${v_j}$, the basis will be stored in the class RationalPolyhedralFan as attributes of the class. and those attributes will be invoked for generating new boxes for function _contains().

DRKWang commented 3 years ago

comment:44

Replying to @DRKWang:

In order to make the projecting directions more separable with respect to cones of a given fan, we could generate certain directions based on the inclination of cones of a given fan instead of randomly.

Suppose the dimension of universal space is $n$. Concretely,

We will compute the inclination first. We measure this inclination of the fan with "average point" $q$. The average point is defined by $q = \sum p_i$, ${p_i}$ are the points along with the generating rays of the fan and whose 1-norm is exactly 1.

Then, choose a set of orthogonal basis ${v_j}$ coming from the complementary space of the space generated by $q$, i.e. $\forall i$, $v_j \perp q$. Note that, since the average point $q$ could be zero vector, in that case, we will choose $N(1,0,...0)$, $N(0,1,..., 0)$,..., $N(0,0,...,1)$ as the basis. The chosen set of orthogonal basis ${v_j}$ could be good candidates for seperating the cones of fan. One way of generating the orthogonal basis ${v_j}$ will be using Gram–Schmidt process.

After generating an orthogonal basis ${v_j}$, the basis will be stored in the class RationalPolyhedralFan as attributes of the class. and those attributes will be invoked for generating new boxes for function _contains().

I have no idea that it does not support the Latex display here. For convenience, you may check the Latex display from the section "Generating projecting directions based on the inclination of cones of a given fan" https://github.com/DRKWang/cutgeneratingfunctions_new/blob/develop/log44.ipynb.

mkoeppe commented 3 years ago

comment:45

Another way to select a set of projection directions:

A subset of the set of normal vectors of all facets of all cones.
Perhaps there is a way to measure how "separating" one of these candidate normal vectors is?

DRKWang commented 3 years ago

comment:46

Replying to @mkoeppe:

Another way to select a set of projection directions:

A subset of the set of normal vectors of all facets of all cones.

Perhaps there is a way to measure how "separating" one of these candidate normal vectors is?

I agree that "the normal vectors of all facets of all cones could also be good candidates for separating these cones". In some sense, those facets can be basically categorized into 2 types, either a facet including an intersection of two maximal cones, or just a pure facet without any intersection. Comparing those two types with each other, the first one seems to be more important and "separating", since the first one focuses on neighbor separating, whereas the second one does not show any information. Thus, we could generate some projecting directions based on the normal vectors of some facets of cones.

Generally, the facets including an intersection can still be categorized into more subtypes based on what is the dimension of the intersection. And heuristically, the higher dimension of the intersection is, the more efficient this normal vector is. But, due to the complexity problem, I feel it is not necessary to categorize those facets at that level.

Even though the above method seems to be efficient, it could be weak when it comes to the case where all cones of a given do not have any intersection except the original points. In that case, there is no "first type of facets" at all. Thus, if that happens, we may need other ways to generate those directions. A good way is to use the direction coming from 2 center points of the tokens of two neighboring cones as supplied directions.

sagemath / sage

Containers for families of geometric objects #32170