Open mkoeppe opened 3 years ago
The precision of interval values in Package libspatialindex
is double
.
They did not claim it explicitly, but I found it from
https://rtree.readthedocs.io/en/latest/class.html#rtree.index.Index
https://github.com/libspatialindex/libspatialindex/blob/master/include/spatialindex/capi/DataStream.h
.OK, the next step would be to find out whether the library does any arithmetic with these double
values (in this case we would have to check whether they apply directed rounding correctly), or whether they only use comparisons on these double
values.
Description changed:
---
+++
@@ -1,4 +1,4 @@
-We define APIs for static and dynamic containers storing finite families of geometric objects.
+We define APIs for static and dynamic containers storing finite families of geometric objects, extending `Container`, `Set`, `MutableSet`, `Mapping`, `MutableMapping` https://docs.python.org/3/library/collections.abc.html
The family must define a collection of "measurement maps" `B_i`, each sending an object to an `InternalRealInterval`.
The library libspatialindex
uses arithmetic operations, including +
, -
and *
, with those double
values in order to support some functions, such as
LineSegment::Collinear
( https://libspatialindex.org/en/latest/doxygen/LineSegment_8cc_source.html#l00439),
LineSegment::getArea
(https://github.com/libspatialindex/libspatialindex/blob/master/src/spatialindex/Region.cc)
and etc.
Besides, the library libspatialindex
also uses std::numeric_limits<double>::epsilon()
to deal with machine precision issue, such as Region::touchesPoint(const Point& p)
(https://github.com/libspatialindex/libspatialindex/blob/master/src/spatialindex/Region.cc)
std::numeric_limits<double>::max();
(https://github.com/libspatialindex/libspatialindex/blob/master/src/spatialindex/Point.cc).
A few test codes have been made for testing arithmetic precision. The conclusions are as follows, more details and codes can be found from this(https://github.com/DRKWang/Data-structure-for-piecewise-linear-function/blob/main/log35.ipynb).
The dimension of the index database should always be greater than 1. (0 or 1 dimension will fail).
The arithmetic precision is 16 decimal digits, which comes from the double type's precision.
The minimal capturing of arithmetic precision is 324 decimal digits, which means the minimal difference to distinguish two numbers is 1*10^{-324}. otherwise, they will be viewed as the same number.
The intersection for retrieving is a closed intersection. (-0.000) is also considered as the same number as 0, but both representations will be displayed at the same time.
Rounding. The round processing is a little complex and untraceable, it is neither rounding up or rounding down.
I have made some debugs for this library. I would like to list the key points below. The entire log file can be found here https://github.com/DRKWang/Data-structure-for-piecewise-linear-function/blob/main/logsfile/log36.ipynb.
The python wrapper part for rtree was only used for passing the values and data to 2 basic functions, core.rt.Index_InsertData()
and core.rt.Index_Intersects_obj()
. Those two functions are stored here https://github.com/Toblerity/rtree/blob/master/rtree/core.py. Thus, there is no arithmetic error in this part.
The corresponding part in libspatialindex for core.rt.Index_InsertData()
and core.rt.Index_Intersects_obj()
are placed in line 499 and line 678 in https://github.com/libspatialindex/libspatialindex/blob/master/src/capi/sidx_api.cc .
In the Index_InsertData()
in C++, the author uses the concepts "points" and "boxes" to refine the input information, and then call the function index().insertData(**)
to compute. As we discussed last time, the author mistakenly uses the std::numeric_limits<double>::epsilon()
as the range of approximation, that is, if the bound difference is within this range, then they will be seen as a single point instead of a box. According to this note, https://stackoverflow.com/questions/48133572/what-can-stdnumeric-limitsdoubleepsilon-be-used-for, we may correct it by multiplying the magnitudes of the comparison. However, because it is for scientific computation, we should get rid of approximation as much as possible.
Except for the above place, the author also mistakenly has used std::numeric_limits<double>::epsilon()
in many other places. I think we could inform the author to check them deeply.
The extensions, .idx
and .dat
, are two files used to serialize the rtree to disk, according to https://rtree.readthedocs.io/en/latest/tutorial.html.
Those two files are created by #include <fstream>
internally, based on this source code, https://github.com/libspatialindex/libspatialindex/blob/master/src/storagemanager/DiskStorageManager.cc.
The .idx
is used to store vital information like the page size, the next available page, a list of empty pages, and the sequence of pages associated with every entity ID.There is no note for the detail or purpose of .dat
file. Thus, I made a test where putting 50 boxes into this library. The shows that the .dat
is way larger than .idx
, which indicates the .dat
is used to store the real data, whereas .idx
is used for storage the rest of it.
The correct way to close and reopen those serialized files can be found here, https://gis.stackexchange.com/questions/254781/saving-python-rtree-spatial-index-to-file
Except for DiskStorageManager
, it also has a MemoryStorageManager
. The detail of it is that "Everything is stored in main memory using a simple vector. No properties are needed to initialize a MemoryStorageManager
object. When a MemoryStorageManager
instance goes out of scope, all data that it contains are lost."
The .dat_extension = xxx
and .idx_extension = xxx
can be used to change the names of those extensions.
I am trying to build a subclass called rtree_based_realset()
, which is inherited from the class RealSet()
. But got stuck on the purpose of this subclass. I would like to list my chain of thoughts below and hope to get a verification.
The intention of building this subclass is to keep the interface of this class, (I feel basically the interface can be considered as functions), the same and use rtree
data structure to be a preliminary filter for overestimating in order to reduce the computation time. But since rtree only serves as an overestimate from one side, in order to be accurate, this subclass, which we want to define, still needs the finite unit intervals
structure inside. In that case, seemingly, the subclass does not have a big difference from the RealSet()
class. And the difference between those two classes only comes from whether we use rtree
as an auxiliary tools or not.
Replying to @DRKWang:
since rtree only serves as an overestimate from one side, in order to be accurate, this subclass, which we want to define, still needs the
finite unit intervals
structure inside.
That's right, we are not replacing the internal representation of RealSet
instances.
We are only adding an index data structure to it.
Since the rtree.count()
, which will return the number of objects whose boxes have intersections with the target point, is not exactly suitable for our expected function, I try to figure out some other functions to replace it. In my opinion, the best function would be rtree.getfirst()
, if we should have, which ideally can assist with both RealSet.contains()
and RealSet.interesction()
. However, I did not get such a function, instead, I found rtree.nearest()
, which will return the item whose box has minimal distance to the target point. However, after comparing rtree.nearest()
with rtree.count()
, the test results shows the rtree.nearest()
is slower than rtree.count()
.
As the boxes overestimate the true objects, a count
result of 0 is useful.
For intersection
, can you propose a function for rtree
that would help?
Replying to @mkoeppe:
For
intersection
, can you propose a function forrtree
that would help?
Yes, I think rtree.count()
also can help with that when there is a non-intersection between RealSet A and RealSet B, And I have implemented the code.
I am thinking about changing the internal code of function RealSet.union()
. Instead of ordinary function union(A,B)
, which has two equally operands, the function RealSet.union()
has already self
, and this format kind of indicates the self
set is dominating and the other set will be added to self
. In this case, I feel when we are trying to extend this function onto the rtree_bssed_Realset
class, it is better to return a rtree_based_RealSet
subclass instead of a RealSet
class, since the programmer kind of wants to maintain the internal rtree
data structure. And since this function only covers the +
operation, which correspondingly is also easy to implement with rtree
, thus, I feel I can try to achieve that.
Replying to @DRKWang:
I feel when we are trying to extend this function onto the
rtree_bssed_Realset
class, it is better to return artree_based_RealSet
subclass instead of aRealSet
class,
Yes, I agree. This is a general change that should be made throughout the RealSet
class. For example, in RealSet.complement
, it should probably use self.__class__
instead of RealSet
when it constructs the result.
Yes, I agree. This is a general change that should be made throughout the
RealSet
class. For example, inRealSet.complement
, it should probably useself.__class__
instead ofRealSet
when it constructs the result.
I see. I have made the changes inside the RealSet
class by replacing RealSet
with self.__class__
. And I feel those replacements only need to happen under 2 following conditions,
return
. Therefore, the function will finally return
the corresponding class, and apart from return
, it is necessary to make a replacement.@staticmethod
decorator. Since function with @staticmethod
does not have self
parameter, which means we can not use self.__class__
to help.Specifically, the functions I made the change are union()
, intersection()
, complement()
, __mul__()
.
Replying to @DRKWang:
- We can not make those replacements under the function that has a
@staticmethod
decorator. Since function with@staticmethod
does not haveself
parameter, which means we can not useself.__class__
to help.
Try changing these methods to @
classmethod - https://docs.python.org/3/library/functions.html#classmethod
And push the branch please to the ticket
Branch: public/32170
New commits:
fa4e4ab | build/pkgs/cgal: New |
30b487f | build/pkgs/swig:New |
df9069a | build/pkgs/cgal_swig_bindings: New |
82bf167 | add the upstream_url to checksums |
40b8a43 | Merge tag '9.3.beta6' into t/31098/cgal_swig_bindings |
6be25f7 | build/pkgs/cgal_swig_bindings/checksums.ini: Fix upstream_url |
5695105 | build/pkgs/cgal_swig_bindings: Use unpatched cgal-swig-bindings, set Python_EXECUTABLE |
e380217 | merge th current version to public/32000 |
89595a6 | make a whole change for replacing RealSet() objectas a return with 'self.__class__()' or 'cls()' as a return for 'RealSet' class; add 'Rtree_based_RealSet' class for rtree extension. |
Try changing these methods to
@
classmethod - https://docs.python.org/3/library/functions.html#classmethod
Got it. I have made the change inside the RealSet
object for placing the @staticmethod
with @classmethod
and use cls()
as a wrapper for returning.
And push the branch please to the ticket
The above changes, including whole changes for RealSet
, Rtree_based_RealSet
class, have been made inside the sage/src/sage/sets/real_set.py
file. And the code have been submitted to the branch to the ticket.
Since I also want to take a test to check whether it performs correctly and since it has been changed under the src
folder, for which I thought we can use it explicitly without importing any more. However, sage
IDE shows that name 'Rtree_based_RealSet' is not defined
.
Also, I always have a question on how to pushing the latest changes correctly for the sagemath. Assuming there are no previous branches was uploaded for a ticket, If I want to upload a new initial branch for it, either should I merge to the latest version, for example, currently it is sage-9.5, before uploading, or is it ok to keep an old version only if we can run the code without errors?
Yes, your current branch has a merge conflict with the current develop
, 9.5.beta0, so it is best to merge it in, resolving the conflicts
Actually I see the branch is on top of the cgal swig bindings stuff.
Best to do is the following:
A comment on naming the class: For an implementation variant of a class, we would use an underscore-separated suffix: Instead of Rtree_based_RealSet
, use RealSet_rtree
Replying to @DRKWang:
Since I also want to take a test to check whether it performs correctly and since it has been changed under the
src
folder, for which I thought we can use it explicitly without importing any more.
If you built Sage without --enable-editable
(https://wiki.sagemath.org/ReleaseTours/sage-9.3#Editable_.28.22in-place.22.2C_.22develop.22.29_installs_of_the_Sage_library), you will have to use ./sage -b
to install the changes to the Python sources that you made.
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
594bb26 | make a whole change for RealSet class with using 'self.__class__' and '@classmethod', as well as adding a new subclass RealSet_rtree. |
2330ca9 | add new package rtree and libspatialindex |
5902e3c | build/pkgs/libspatialindex: Make it a normal package |
791aafd | update SPKG.rst, correct distros, add url in checksums.ini, revise spkg-configure |
68735cb | temporarily rename the distributions files under package rtree and libspatialindex with adding prefix_ for later renaming those files with all lowercase, this is caused by git did not make case-sensitive. |
720ff79 | Rename the distribution files under package rtree and libspatialindex with lowercases. |
dfc5410 | Merge branch 'public/32000' of trac.sagemath.org:sage into realset_rtree |
Replying to @mkoeppe:
If you built Sage without
--enable-editable
(https://wiki.sagemath.org/ReleaseTours/sage-9.3#Editable_.28.22in-place.22.2C_.22develop.22.29_installs_of_the_Sage_library), you will have to use./sage -b
to install the changes to the Python sources that you made.
Since software is not built with --enable-editable
, I choose to use ./sage -b
. However, still, it is not able to import the RealSet_rtree
class explicitly.
Branch pushed to git repo; I updated commit sha1. New commits:
674d982 | "Fixed 2 bugs: |
A few test programs have been made for this class RealSet_rtree()
, including case with +-infinity
, type returned class test
. Those test codes can be found at the end here. https://github.com/DRKWang/Data-structure-for-piecewise-linear-function/blob/main/logsfile/test%20for%20real_set_with_rtree.ipynb
As a result, I found 2 bugs,
return self.__class__(*intervals)
;floor()
and ceil()
can not deal with +-infinity
properly, we have to make it separately.Replying to @DRKWang:
As a result, I found 2 bugs,
- one is for misplacing the
return self.__class__(*intervals)
;- the other is that since
floor()
andceil()
can not deal with+-infinity
properly, we have to make it separately.
Best practice is to add a test to the doctests that demonstrate that it now works correctly.
Replying to @mkoeppe:
Replying to @DRKWang:
As a result, I found 2 bugs,
- one is for misplacing the
return self.__class__(*intervals)
;- the other is that since
floor()
andceil()
can not deal with+-infinity
properly, we have to make it separately.Best practice is to add a test to the doctests that demonstrate that it now works correctly.
Following this instruction https://doc.sagemath.org/html/en/developer/doctesting.html#beyond-the-sage-library, I tried to use ./sage -t src/sage/sets/real_set.py
for docstring. However, All the testing results are failures. There are 2 types of arisen errors, one failure comes from SyntaxError: multiple statements found while compiling a single statement
, the other comes from NameError: name'RealSet_rtree' is not defined
. For the first, I guess it is caused by format error, something like an invisible return, but I have no idea about the second one. The log file is here. https://github.com/DRKWang/Data-structure-for-piecewise-linear-function/blob/main/logsfile/failure%20info.txt
The current branch defines class RealSet_rtree
3 times -- that's probably the first thing that should be cleaned up
I tried the first example from your worksheet, after importing the class:
sage: from sage.sets.real_set import RealSet_rtree
sage: A = RealSet_rtree(0,1)
....: B = RealSet_rtree(0.5,3)
....: C = A.intersection(B)
....: type(C)
I get this error:
~/s/sage/sage-rebasing/local/lib/python3.9/site-packages/sage/sets/real_set.py in __init__(self, *intervals)
2707 if (x_max < lower) and (lower != +Infinity):
2708 x_max = lower
-> 2709 if (x_max < upper) and (upper != +Infinity):
2710 x_max = upper
2711 if x_max == 0: #rtree is not needed, since they are only +-infinity and zero!
NameError: name 'Infinity' is not defined
This means that you need to add some import statements to the file.
You can type import_statements('Infinity')
to help with this.
The doctests of the new class RealSet_rtree
are copy-paste from RealSet
and do not test the new class. Best to replace them by the tests that you have in your worksheet.
Replying to @mkoeppe:
The current branch defines
class RealSet_rtree
3 times -- that's probably the first thing that should be cleaned up
So desperate! It is really a fatal and long-history mistake, even existing for 3 commits. I guess it is caused by using Jupiter notebook a lot. I will be cautious next time.
Don't worry about it. It's easy to clean up the commit history later
Replying to @mkoeppe:
you will have to use
./sage -b
to install the changes to the Python sources that you made.
I tried to use ./sage -b
again after the meeting this morning, the real_set.py
file under the directory of SAGE_LOCAL/lib/python3.7/site-packages/sage/sets
did be rebuilt. But, Sage still shows NameError: name 'RealSet_rtree' is not defined
. Did I miss any other commands?
You need to import it before using, see comment:34
Branch pushed to git repo; I updated commit sha1. New commits:
d160842 | Doctests for class RealSet_rtree() have been adjusted and passed the test. |
Replying to @mkoeppe:
The doctests of the new class
RealSet_rtree
are copy-paste fromRealSet
and do not test the new class. Best to replace them by the tests that you have in your worksheet.
I have updated the doctests of the new class RealSet_rtree
. And those doctests have been passed successfully. However, according to this documentation, https://doc.sagemath.org/pdf/en/reference/doctest/doctest.pdf, "Sage currently accepts backslashes as indicating that the end of the current line should be joined to the
next line. This feature allows for breaking large integers over multiple lines but is not standard for Python doctesting.", which infers the sage doctest can only deal with single line, except the cases with for
, while
and ending with :
. Therefore, I have to use ;
to make it into one line, which is not beautiful and clear.
In order to make the projecting directions more separable with respect to cones of a given fan, we could generate certain directions based on the inclination of cones of a given fan instead of randomly.
Suppose the dimension of universal space is $n$. Concretely,
We will compute the inclination first. We measure this inclination of the fan with "average point" $q$. The average point is defined by $q = \sum p_i$, ${p_i}$ are the points along with the generating rays of the fan and whose 1-norm is exactly 1.
Then, choose a set of orthogonal basis ${v_j}$ coming from the complementary space of the space generated by $q$, i.e. $\forall i$, $v_j \perp q$. Note that, since the average point $q$ could be zero vector, in that case, we will choose $N(1,0,...0)$, $N(0,1,..., 0)$,..., $N(0,0,...,1)$ as the basis. The chosen set of orthogonal basis ${v_j}$ could be good candidates for seperating the cones of fan. One way of generating the orthogonal basis ${v_j}$ will be using Gram–Schmidt process.
After generating an orthogonal basis ${v_j}$, the basis will be stored in the class RationalPolyhedralFan
as attributes of the class. and those attributes will be invoked for generating new boxes for function _contains()
.
Replying to @DRKWang:
In order to make the projecting directions more separable with respect to cones of a given fan, we could generate certain directions based on the inclination of cones of a given fan instead of randomly.
Suppose the dimension of universal space is $n$. Concretely,
We will compute the inclination first. We measure this inclination of the fan with "average point" $q$. The average point is defined by $q = \sum p_i$, ${p_i}$ are the points along with the generating rays of the fan and whose 1-norm is exactly 1.
Then, choose a set of orthogonal basis ${v_j}$ coming from the complementary space of the space generated by $q$, i.e. $\forall i$, $v_j \perp q$. Note that, since the average point $q$ could be zero vector, in that case, we will choose $N(1,0,...0)$, $N(0,1,..., 0)$,..., $N(0,0,...,1)$ as the basis. The chosen set of orthogonal basis ${v_j}$ could be good candidates for seperating the cones of fan. One way of generating the orthogonal basis ${v_j}$ will be using Gram–Schmidt process.
After generating an orthogonal basis ${v_j}$, the basis will be stored in the class
RationalPolyhedralFan
as attributes of the class. and those attributes will be invoked for generating new boxes for function_contains()
.
I have no idea that it does not support the Latex display here. For convenience, you may check the Latex display from the section "Generating projecting directions based on the inclination of cones of a given fan" https://github.com/DRKWang/cutgeneratingfunctions_new/blob/develop/log44.ipynb.
Another way to select a set of projection directions:
Replying to @mkoeppe:
Another way to select a set of projection directions:
- A subset of the set of normal vectors of all facets of all cones.
- Perhaps there is a way to measure how "separating" one of these candidate normal vectors is?
I agree that "the normal vectors of all facets of all cones could also be good candidates for separating these cones". In some sense, those facets can be basically categorized into 2 types, either a facet including an intersection of two maximal cones, or just a pure facet without any intersection. Comparing those two types with each other, the first one seems to be more important and "separating", since the first one focuses on neighbor separating, whereas the second one does not show any information. Thus, we could generate some projecting directions based on the normal vectors of some facets of cones.
Generally, the facets including an intersection can still be categorized into more subtypes based on what is the dimension of the intersection. And heuristically, the higher dimension of the intersection is, the more efficient this normal vector is. But, due to the complexity problem, I feel it is not necessary to categorize those facets at that level.
Even though the above method seems to be efficient, it could be weak when it comes to the case where all cones of a given do not have any intersection except the original points. In that case, there is no "first type of facets" at all. Thus, if that happens, we may need other ways to generate those directions. A good way is to use the direction coming from 2 center points of the tokens of two neighboring cones as supplied directions.
We define APIs for static and dynamic containers storing finite families of geometric objects, extending
Container
,Set
,MutableSet
,Mapping
,MutableMapping
https://docs.python.org/3/library/collections.abc.htmlThe family must define a collection of "measurement maps"
B_i
, each sending an object to anInternalRealInterval
.The product of these intervals can be thought of as a "generalized bounding box".
Ideally the maps would be inclusion-preserving.
The implementation is provided by
rtree
/libspatialindex
(#32000).The Sage-specific class will take care of:
InternalRealInterval
by a rescaling or inclusion-preserving overestimation, making the coordinates suitable for the underlying library.intersection
when the result will be empty,__contains__
when the result will beFalse
.Geometric lookup operations to be supported:
Applications:
RationalPolyhedralFan
PolyhedralComplex
(#31748)B_i
comes from the coordinates of covering charts, using(-oo, oo)
for coordinates for which no coordinate definition for the subset is knownDepends on #34277
CC: @DRKWang
Component: geometry
Work Issues: Rebase on #34277
Author: Binshuai Wang
Branch/Commit: public/32170 @
05ea99d
Issue created by migration from https://trac.sagemath.org/ticket/32170