Open msalazarcgeo opened 3 years ago
Hi @msalazarcgeo , thanks for the issue. Any chance you can provide code to reproduce?
I have made a jupyter-lab notebook reproducing the behaviour, I don't know if this is is good for you or you want me to create a pure python script reproducing.
Here is a code that shows the main problem (the quadtree is not constant on the same set of points )
import numpy as np
import cudf
import cuspatial
import shapely
import cuml
import geopandas
import pandas as pd
### Read a poligon
poligon= pd.read_csv('./poligon_cdmx.csv')
poligon.drop(columns= 'Unnamed: 0', inplace = True)
#The bounding box of the poligon previusly calculated
bb_coords = (-99.36492420394829, 19.048236663823484, -98.94030281205599, 19.59275727996437)
### Sent the poligon to the GPU
poly_tuple= (cudf.Series(data = np.array([0]), dtype = np.int32),
cudf.Series(data = np.array([0]), dtype = np.int32),
cudf.DataFrame(poligon)
)
#### Random points
rand_x = np.random.random(600000) * (bb_coords[0]- bb_coords[2]) + bb_coords[2]
rand_y = np.random.random(600000) * (bb_coords[1]- bb_coords[3]) + bb_coords[3]
#### Dataframe with the points on the GPU
df_cu2 = cudf.DataFrame({'X': rand_x, 'Y': rand_y})
#### Get only the point in the polygon
cu_res2 = cuspatial.point_in_polygon(
df_cu2['X'],
df_cu2['Y'],
poly_tuple[0],
poly_tuple[1],
poly_tuple[2]['x'],
poly_tuple[2]['y']
)
### Only the point in the poligon
df_cu3 = df_cu2[cu_res2[0]]
####
print("Number of points to do the quadtree: ", df_cu3.shape[0])
### to get the quantree
x_size = bb_coords[2] - bb_coords[0]
y_size = bb_coords[3]- bb_coords[1]
max_depth = 8
scale_p = max(x_size, y_size) / (1 << max_depth )
min_size = 10 ### to determin if the quea is a leaf
print("Generate first quadtree")
key_1 , quad_1 = cuspatial.quadtree_on_points(
df_cu3['X'],
df_cu3['Y'],
bb_coords[0],
bb_coords[2],
bb_coords[1],
bb_coords[3],
scale = scale_p,
max_depth = max_depth,
min_size = min_size
)
print("The number of point in the leaves of the first quadtree: ", quad_1[quad_1['is_quad']== False]['length'].sum())
print("Redo the quadtree on the same set of points")
key_2 , quad_2 = cuspatial.quadtree_on_points(
df_cu3['X'],
df_cu3['Y'],
bb_coords[0],
bb_coords[2],
bb_coords[1],
bb_coords[3],
scale = scale_p,
max_depth = max_depth,
min_size = min_size
)
print("The number of point in the leaves of the second quadtree: ",quad_2[quad_2['is_quad']== False]['length'].sum())
####
print("Redo the quadtree on the same set of points using the same variables ")
key_2 , quad_2 = cuspatial.quadtree_on_points(
df_cu3['X'],
df_cu3['Y'],
bb_coords[0],
bb_coords[2],
bb_coords[1],
bb_coords[3],
scale = scale_p,
max_depth = max_depth,
min_size = min_size
)
print("The number of point in the leaves of the third quadtree: ",quad_2[quad_2['is_quad']== False]['length'].sum())
Due to the use of the random function the output will be different (I didn't use the "seed" function ) but it show the problem meaning that the first and second quad_2[quad_2['is_quad']== False]['length'].sum()
are different.
My believe is that for some reason the quadtree is not pointing correctly to the memory address of the inner leaves.
@msalazarcgeo Thanks for the bug report. Can you confirm which version you were using? we fixed a related bug for 0.17, which should be used for testing.
@msalazarcgeo Thanks for the bug report. Can you confirm which version you were using? we fixed a related bug for 0.17, which should be used for testing.
@zhangjianting I will check on the 0.17 version to confirm the bug.
@zhangjianting I have tested using a new environment with the 0.17 version. The result was better, meaning that the difference between the number of points in the leaves is closer between the quadtrees. But not the issue is not properly resolved, the number of points in the quadtrees should be the same for each.
This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.
This issue has been labeled inactive-90d
due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
To create fast rasterization I wanted to use the quadtree_on_points, the data has 480631 points and a max_depht of 8, when I ran the function the resulting quadtree has 7720 points on the leaves, to calculate the number of points in all the leaves I use
quad_tree[quad_tree['is_quad']== False]['length'].sum()
and the number of nodes in the tree is 611. This is wrong, if the quadtree is recalculated then the result is right, meaning that the number of points in the leaves is 480631 and the size of the tree is has changed to what seems to be the proper tree.This is not an issue to be critical or something like that, I can solve my problem by simply recalculating the quadtree. I just want to inform you about the behavior and thank you for the amazing tool that this library is.