rapidsai / cuspatial

CUDA-accelerated GIS and spatiotemporal algorithms
https://docs.rapids.ai/api/cuspatial/stable/
Apache License 2.0
602 stars 151 forks source link

[BUG] quadtree_on_points not constant behavior #326

Open msalazarcgeo opened 3 years ago

msalazarcgeo commented 3 years ago

To create fast rasterization I wanted to use the quadtree_on_points, the data has 480631 points and a max_depht of 8, when I ran the function the resulting quadtree has 7720 points on the leaves, to calculate the number of points in all the leaves I use quad_tree[quad_tree['is_quad']== False]['length'].sum() and the number of nodes in the tree is 611. This is wrong, if the quadtree is recalculated then the result is right, meaning that the number of points in the leaves is 480631 and the size of the tree is has changed to what seems to be the proper tree.

This is not an issue to be critical or something like that, I can solve my problem by simply recalculating the quadtree. I just want to inform you about the behavior and thank you for the amazing tool that this library is.

harrism commented 3 years ago

Hi @msalazarcgeo , thanks for the issue. Any chance you can provide code to reproduce?

msalazarcgeo commented 3 years ago

I have made a jupyter-lab notebook reproducing the behaviour, I don't know if this is is good for you or you want me to create a pure python script reproducing.

msalazarcgeo commented 3 years ago

Here is a code that shows the main problem (the quadtree is not constant on the same set of points )


import numpy as np
import cudf 
import cuspatial
import shapely
import cuml
import geopandas
import pandas as pd

### Read a poligon 
poligon= pd.read_csv('./poligon_cdmx.csv') 
poligon.drop(columns= 'Unnamed: 0', inplace = True)

#The bounding box of the poligon previusly calculated
bb_coords = (-99.36492420394829, 19.048236663823484, -98.94030281205599, 19.59275727996437)
### Sent the poligon to the GPU 
poly_tuple= (cudf.Series(data = np.array([0]), dtype = np.int32),
             cudf.Series(data = np.array([0]), dtype = np.int32),
             cudf.DataFrame(poligon)
            )

#### Random points 
rand_x = np.random.random(600000) * (bb_coords[0]- bb_coords[2]) + bb_coords[2] 
rand_y = np.random.random(600000) * (bb_coords[1]- bb_coords[3]) + bb_coords[3] 

#### Dataframe with the points on the GPU
df_cu2 = cudf.DataFrame({'X': rand_x, 'Y': rand_y})

#### Get only the point in the polygon 
cu_res2 = cuspatial.point_in_polygon(
                    df_cu2['X'],
                    df_cu2['Y'],
                    poly_tuple[0],
                    poly_tuple[1],
                    poly_tuple[2]['x'],
                    poly_tuple[2]['y']

)
### Only the point in the poligon 
df_cu3 = df_cu2[cu_res2[0]]
#### 
print("Number of points to do the quadtree: ", df_cu3.shape[0])
### to get the quantree
x_size = bb_coords[2] - bb_coords[0]
y_size = bb_coords[3]- bb_coords[1]
max_depth = 8
scale_p = max(x_size, y_size) / (1 << max_depth )
min_size = 10 ### to determin if the quea is a leaf
print("Generate first quadtree")
key_1 , quad_1 = cuspatial.quadtree_on_points(
                        df_cu3['X'],
                        df_cu3['Y'],
                        bb_coords[0],
                        bb_coords[2],
                        bb_coords[1],
                        bb_coords[3],
                        scale = scale_p,
                        max_depth = max_depth,
                        min_size = min_size
)
print("The number of point in the leaves of the first quadtree: ", quad_1[quad_1['is_quad']== False]['length'].sum()) 

print("Redo the quadtree on the same set of points")
key_2 , quad_2 = cuspatial.quadtree_on_points(
                        df_cu3['X'],
                        df_cu3['Y'],
                        bb_coords[0],
                        bb_coords[2],
                        bb_coords[1],
                        bb_coords[3],
                        scale = scale_p,
                        max_depth = max_depth,
                        min_size = min_size
)
print("The number of point in the leaves of the second quadtree: ",quad_2[quad_2['is_quad']== False]['length'].sum())

#### 
print("Redo the quadtree on the same set of points using the same variables ")
key_2 , quad_2 = cuspatial.quadtree_on_points(
                        df_cu3['X'],
                        df_cu3['Y'],
                        bb_coords[0],
                        bb_coords[2],
                        bb_coords[1],
                        bb_coords[3],
                        scale = scale_p,
                        max_depth = max_depth,
                        min_size = min_size
)
print("The number of point in the leaves of the third quadtree: ",quad_2[quad_2['is_quad']== False]['length'].sum())

Due to the use of the random function the output will be different (I didn't use the "seed" function ) but it show the problem meaning that the first and second quad_2[quad_2['is_quad']== False]['length'].sum() are different.

My believe is that for some reason the quadtree is not pointing correctly to the memory address of the inner leaves.

zhangjianting commented 3 years ago

@msalazarcgeo Thanks for the bug report. Can you confirm which version you were using? we fixed a related bug for 0.17, which should be used for testing.

msalazarcgeo commented 3 years ago

@msalazarcgeo Thanks for the bug report. Can you confirm which version you were using? we fixed a related bug for 0.17, which should be used for testing.

@zhangjianting I will check on the 0.17 version to confirm the bug.

msalazarcgeo commented 3 years ago

@zhangjianting I have tested using a new environment with the 0.17 version. The result was better, meaning that the difference between the number of points in the leaves is closer between the quadtrees. But not the issue is not properly resolved, the number of points in the quadtrees should be the same for each.

github-actions[bot] commented 3 years ago

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.