stac-extensions / pointcloud

Provides a way to describe point cloud datasets. The point clouds can come from either active or passive sensors, and data is frequently acquired using tools such as LiDAR or coincidence-matched imagery.
Apache License 2.0

Basic statistics for dimensions #5

Open wonder-sk opened 1 year ago

wonder-sk commented 1 year ago

It would be useful to have an optional attribute with basic stats defined by this extension:

For software that visualizes point clouds, this is quite important for initializing renderer settings. Without stats, one has to sample the data to extract them upon load. This would be especially useful when working with a collection of items, to quickly get aggregate stats for the whole collection rather than having to touch every asset of each individual item.
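
The aggregation described above can be sketched in a few lines. This is a minimal illustration, not part of the extension: it assumes each item exposes a list of per-dimension entries with `name`, `minimum`, and `maximum` under a `pc:statistics` property, and the helper name `aggregate_min_max` and the sample items are hypothetical.

```python
from typing import Dict, Iterable, List


def aggregate_min_max(items: Iterable[dict]) -> Dict[str, Dict[str, float]]:
    """Merge per-item min/max into collection-wide min/max per dimension.

    Assumes (for illustration) that each item stores its stats as a list of
    {"name", "minimum", "maximum"} objects under properties["pc:statistics"].
    """
    merged: Dict[str, Dict[str, float]] = {}
    for item in items:
        entries: List[dict] = item.get("properties", {}).get("pc:statistics", [])
        for entry in entries:
            name = entry["name"]
            if name not in merged:
                merged[name] = {"minimum": entry["minimum"], "maximum": entry["maximum"]}
            else:
                merged[name]["minimum"] = min(merged[name]["minimum"], entry["minimum"])
                merged[name]["maximum"] = max(merged[name]["maximum"], entry["maximum"])
    return merged


# Hypothetical items, only the fields needed for the sketch are present.
items = [
    {"properties": {"pc:statistics": [
        {"name": "Intensity", "minimum": 0, "maximum": 200},
    ]}},
    {"properties": {"pc:statistics": [
        {"name": "Intensity", "minimum": 5, "maximum": 12345},
        {"name": "GpsTime", "minimum": 123456.78, "maximum": 123999.99},
    ]}},
]

collection_stats = aggregate_min_max(items)
```

A renderer could then initialize its intensity ramp from `collection_stats["Intensity"]` without opening any point cloud asset.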

wonder-sk commented 1 year ago

A relevant discussion from some time ago, when COPC was being designed: https://github.com/copcio/copcio.github.io/issues/19 TL;DR: we could also add more detailed stats (mean, variance, histogram), but in the end those may not be needed by clients, or clients may have more specific needs that make the additional stats useless (e.g. picking a good bucket size for GpsTime can be tricky).

To kick off some discussion, I would propose a new pc:stats attribute with this kind of content:

{
  "Intensity" : {
    "minimum": 0,
    "maximum": 12345
  },
  "GpsTime": {
    "minimum": 123456.78,
    "maximum": 123999.99
  },
  "Classification": {
    "minimum": 0,
    "maximum": 7,
    "class-count": {
      "0": 1000,
      "1": 2000,
      "3": 4000,
      "7": 8000
    }
  },
  "ReturnNumber": {
    "minimum": 1,
    "maximum": 3,
    "class-count": {
      "1": 9000,
      "2": 4000,
      "3": 2000
    }
  },
  ...
}
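
As a rough sketch of how a producer might derive entries of this shape from raw dimension values: the function name `dimension_stats` and the sample values below are hypothetical, and only `minimum`, `maximum`, and `class-count` from the proposal are covered.

```python
import json
from collections import Counter
from typing import Sequence


def dimension_stats(values: Sequence[float], count_classes: bool = False) -> dict:
    """Build a stats entry for one dimension in the proposed shape.

    class-count keys are strings because JSON object keys must be strings.
    """
    stats = {"minimum": min(values), "maximum": max(values)}
    if count_classes:
        counts = Counter(values)
        stats["class-count"] = {str(code): counts[code] for code in sorted(counts)}
    return stats


# Made-up sample values standing in for a real point cloud's dimensions.
classification = [0, 1, 1, 3, 3, 3, 3, 7]
intensity = [12, 0, 480, 37]

result = {
    "Intensity": dimension_stats(intensity),
    "Classification": dimension_stats(classification, count_classes=True),
}
print(json.dumps(result, indent=2))
```

Counting classes only makes sense for categorical dimensions such as Classification or ReturnNumber, which is why it is opt-in here.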

wonder-sk commented 1 year ago

cc @hobu

wonder-sk commented 1 year ago

Oops, only now have I realized that there is already a Stats object defined in the extension :man_facepalming: It just does not include support for classes and their counts.

Other notes on the existing Stats object:

raelwaed commented 1 year ago

Great post @wonder-sk - I was planning a similar post just this week. My concern is the stats object is just a dump of PDAL information without considering the value to STAC - i.e. What do people want to search for?

Many of the example stats objects provide little value, e.g. ScanDirectionFlag, EdgeOfFlightLine, Classification, UserData, etc. And within those stats objects fields like count and position are questionable.

stdev and variance are just one square root away from each other - but I think they can be left as optional.

I was planning to add a pc:classification field as a [string] of Classifications so you knew what was in a point cloud, but prefer your proposal so you can quantify how much of a particular classification exists.

The number of returns is valuable, but we have a lot of metadata that gives more context to the returns themselves that I would like to capture - e.g. "First and Last" would mean we have just two returns and ignored all intermediate returns, or "4 Returns (1st, 2nd, 3rd, last)".

m-mohr commented 1 year ago

Maybe you can align with or use the Classification extension? https://github.com/stac-extensions/classification

hobu commented 1 year ago

My concern is the stats object is just a dump of PDAL information

Indeed this was the case, and the intention was to see if we could attract usage and attention to improve the extension. Maybe now is the time. I don't think we have found the stats particularly helpful for searching, but we haven't ditched the schema stuff. That said, I think the schema stuff would probably be better expressed in arrow or regular flatbuf for reusability in other contexts.

You can see an Item collection example we write for the USGS 3DEP lidar collection at https://usgs-lidar-stac.s3-us-west-2.amazonaws.com/ept/item_collection.json

If you visit https://viewer.copc.io you can also bring any of those in and view them by clicking on the USGS 3DEP LiDAR link and then double-clicking on any name that looks interesting, or by filtering with a simple regex.

mccarthyryanc commented 10 months ago

I like @wonder-sk's suggestions on updating the stats object. Since the fields are all optional, perhaps it is enough to add another optional class-count?

@m-mohr, I think that extension (correct me if I'm wrong, I just learned about it) describes all possible classes. In this case we just want to summarize the classes present in a single item. So if you were working with LAZ 1.4 data, you'd put the ASPRS Class definitions into the schema, not the statistics.

To simplify this for searching, I usually just want to know if a point cloud has any building-classified points (I don't really care how many points there are). So maybe modify @wonder-sk's suggestion into something like:

    "unique-classes": {
        "title": "unique list of classifications",
        "type": "array",
        "minItems": 1,
        "items": {
            "title": "point classifications present in pointcloud",
            "type": "integer"
        }
    }

And then add unique-classes as an optional field in the stats object?
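
A minimal sketch of how a client might use such a field for the "does it contain buildings?" question. It assumes `unique-classes` sits on the Classification entry of a `pc:statistics` list, which is not something the extension defines today; the helper `has_class` and the sample items are hypothetical. Class code 6 is Building in the ASPRS standard point classes.

```python
BUILDING = 6  # ASPRS standard point class code for "Building"


def has_class(item: dict, class_code: int) -> bool:
    """Return True if the item's Classification stats list the given class.

    Assumes the proposed, not yet standardized, "unique-classes" array lives
    on the stats entry whose name is "Classification".
    """
    for stats in item.get("properties", {}).get("pc:statistics", []):
        if stats.get("name") == "Classification":
            return class_code in stats.get("unique-classes", [])
    return False


# Hypothetical items carrying only the fields used by the sketch.
items = [
    {"id": "tile-a", "properties": {"pc:statistics": [
        {"name": "Classification", "unique-classes": [1, 2, 6]},
    ]}},
    {"id": "tile-b", "properties": {"pc:statistics": [
        {"name": "Classification", "unique-classes": [1, 2]},
    ]}},
    {"id": "tile-c", "properties": {}},
]

with_buildings = [item["id"] for item in items if has_class(item, BUILDING)]
```

Items without the field are treated as "unknown" and filtered out here; a real client might prefer to keep them and fall back to inspecting the data.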