smathermather opened 4 months ago
Not sure whether lazy loading or chunking is easy, given that we need to sub-sample the data and sub-sampling is subject to edge effects, but I'll build it first in a simple fashion, such as:
def read_point_cloud(filename, reduce=10):
likely writing to an xarray, and then iterating from there.
Probably need to pre-filter with outlier removal, à la: https://pdal.io/en/2.6.3/stages/filters.outlier.html#filters-outlier
It looks like, at least for PDAL pipeline elements that support streaming, lazy access is possible: https://github.com/PDAL/python?tab=readme-ov-file#executing-streamable-pipelines
Adapted from the PDAL readme example, reading directly from a point cloud in WebODM (this would also work with a local file):
import pdal
data = "https://crankyserver.com/api/projects/325/tasks/dd01f4b5-5ef0-40d8-ad98-88084dd538b9/download/georeferenced_model.laz"
pipeline = pdal.Reader.las(filename=data).pipeline()
print(pipeline.execute())
arr = pipeline.arrays[0]
reddish = arr[arr["Red"] > 30]
print(len(reddish))
# Now use PDAL to clamp the reddish points to elevations 90 <= Z < 130
pipeline = pdal.Filter.range(limits="Z[90:130)").pipeline(reddish)
print(pipeline.execute())
clamped = pipeline.arrays[0]
# Write our reddish data to a LAS file and a TileDB array. For TileDB it is
# recommended to use Hilbert ordering by default with geospatial point cloud data,
# which requires specifying a domain extent. This can be determined automatically
# from a stats filter that computes statistics about each dimension (min, max, etc.).
pipeline = pdal.Writer.las(
filename="clamped.las",
offset_x="auto",
offset_y="auto",
offset_z="auto",
scale_x=0.01,
scale_y=0.01,
scale_z=0.01,
).pipeline(clamped)
pipeline |= pdal.Filter.stats() | pdal.Writer.tiledb(array_name="clamped")
print(pipeline.execute())
# Dump the TileDB array schema
import tiledb
with tiledb.open("clamped") as a:
print(a.schema)
Add function signature.
Support for COPC? https://copc.io/
Data organization of COPC is modeled after the EPT data format, but COPC clusters the storage of the octree as variably-chunked LAZ data in a single file. This allows the data to be consumed sequentially by any reader that can handle variably-chunked LAZ 1.4 (LASzip, for example), or as a spatial subset for readers that interpret the COPC hierarchy. More information about the differences between EPT data and COPC can be found below.
def _get_pc(pcfile, sampleradius=1, minz=-500, maxz=9000):
    """
    Extract point cloud to XArray while sampling using Poisson dart sampling,
    and clamping data to min and max range as desired. Will add ability to
    subset geographically as well.

    Parameters
    ----------
    pcfile : filename or web address (string)
    sampleradius : sample radius for Poisson "dart throwing" as implemented
        in PDAL's filters.sample
    minz : minimum z elevation extracted
    maxz : maximum z elevation extracted

    Returns
    -------
    xugrid DataArray (float, holding point cloud data)
    """
The one weird thing is that the bounding box is wrong in the processed data, but it otherwise works in QGIS and CloudCompare.
Possible function signature:
def read_point_cloud(filename, reduce=10, lazy=True):
Thoughts from hessel on possible lazy loading / chunking: https://stackoverflow.com/questions/47671676/reading-laz-to-dask-dataframe-using-delayed-loading