smathermather opened 4 months ago
Not sure whether lazy loading or chunking is easy, given that we need to sub-sample the data and sub-sampling is subject to edge effects, but I'll build it first in a simple fashion, such as:
def read_point_cloud(filename, reduce=10):
likely writing to an xarray, and then iterating from there.
Probably need to pre-filter with outlier removal, à la: https://pdal.io/en/2.6.3/stages/filters.outlier.html#filters-outlier
It looks like, at least for PDAL pipeline elements that support streaming, lazy access is possible: https://github.com/PDAL/python?tab=readme-ov-file#executing-streamable-pipelines
Adapted from the PDAL readme example, reading directly from a point cloud in WebODM (this would also work with a local file):
import pdal
data = "https://crankyserver.com/api/projects/325/tasks/dd01f4b5-5ef0-40d8-ad98-88084dd538b9/download/georeferenced_model.laz"
pipeline = pdal.Reader.las(filename=data).pipeline()
print(pipeline.execute())
arr = pipeline.arrays[0]
reddish = arr[arr["Red"] > 30]
print(len(reddish))
# Now use PDAL to clamp the reddish points to elevations 90 <= Z < 130
pipeline = pdal.Filter.range(limits="Z[90:130)").pipeline(reddish)
print(pipeline.execute())
clamped = pipeline.arrays[0]
# Write our reddish data to a LAS file and a TileDB array. For TileDB it is
# recommended to use Hilbert ordering by default with geospatial point cloud data,
# which requires specifying a domain extent. This can be determined automatically
# from a stats filter that computes statistics about each dimension (min, max, etc.).
pipeline = pdal.Writer.las(
filename="clamped.las",
offset_x="auto",
offset_y="auto",
offset_z="auto",
scale_x=0.01,
scale_y=0.01,
scale_z=0.01,
).pipeline(clamped)
pipeline |= pdal.Filter.stats() | pdal.Writer.tiledb(array_name="clamped")
print(pipeline.execute())
# Dump the TileDB array schema
import tiledb
with tiledb.open("clamped") as a:
print(a.schema)
Add function signature.
Support for COPC? https://copc.io/
Data organization of COPC is modeled after the EPT data format, but COPC clusters the storage of the octree as variably-chunked LAZ data in a single file. This allows the data to be consumed sequentially by any reader that can handle variably-chunked LAZ 1.4 (LASzip, for example), or as a spatial subset for readers that interpret the COPC hierarchy. More information about the differences between EPT data and COPC can be found below.
def _get_pc(pcfile, sampleradius=1, minz=-500, maxz=9000):
    """
    Extract point cloud to XArray while sampling using Poisson dart sampling,
    and clamping data to min and max range as desired. Will add ability to
    subset geographically as well.

    Parameters
    ----------
    pcfile : filename or web address (string)
    sampleradius : sample radius for Poisson "dart throwing" as implemented
        in PDAL's filters.sample
    minz : minimum z elevation extracted
    maxz : maximum z elevation extracted

    Returns
    -------
    xugrid DataArray (float, holding point cloud data)
    """
The one weird thing is that the bounding box is wrong in the processed data, but it otherwise works in QGIS and CloudCompare.
Possible function signature:
def read_point_cloud(filename, reduce=10, lazy=True):
Thoughts from hessel on possible lazy loading / chunking: https://stackoverflow.com/questions/47671676/reading-laz-to-dask-dataframe-using-delayed-loading