Open wonder-sk opened 1 year ago
A relevant discussion from some time ago when COPC was being designed: https://github.com/copcio/copcio.github.io/issues/19 TL;DR: we could also add more detailed stats (mean, variance, histogram), but in the end the clients may not need them, or may have more specific needs that make the additional stats useless (e.g. picking a good bucket size for GpsTime can be tricky).
To kick off some discussion, I would propose a new pc:stats attribute with this kind of content:
{
"Intensity" : {
"minimum": 0,
"maximum": 12345
},
"GpsTime": {
"minimum": 123456.78,
"maximum": 123999.99
},
"Classification": {
"minimum": 0,
"maximum": 7,
"class-count": {
"0": 1000,
"1": 2000,
"3": 4000,
"7": 8000
}
},
"ReturnNumber": {
"minimum": 1,
"maximum": 3,
"class-count": {
"1": 9000,
"2": 4000,
"3": 2000
}
},
...
}
cc @hobu
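For illustration, here is a minimal sketch of how a writer could produce an object shaped like the proposal above. The function name `summarize_dimension`, the toy `points` data, and the choice of which dimensions get a `class-count` are assumptions made for this example; nothing here is defined by the extension.

```python
from collections import Counter


def summarize_dimension(values, with_class_count=False):
    """Return min/max (and optionally per-value counts) for one dimension."""
    stats = {"minimum": min(values), "maximum": max(values)}
    if with_class_count:
        counts = Counter(values)
        # JSON object keys must be strings, so the class codes are stringified.
        stats["class-count"] = {str(code): counts[code] for code in sorted(counts)}
    return stats


# Hypothetical toy data standing in for real point attributes.
points = {
    "Intensity": [0, 120, 12345, 87],
    "Classification": [0, 1, 1, 7],
}

# Assumption: only these dimensions are treated as categorical.
categorical = {"Classification", "ReturnNumber"}

pc_stats = {
    name: summarize_dimension(values, name in categorical)
    for name, values in points.items()
}
```

Continuous dimensions such as Intensity only get `minimum` and `maximum`, while categorical ones also get the per-value counts.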
Oops only now I have realized that there is already a Stats object defined in the extension :man_facepalming: It just does not include support for classes and their counts.
Other notes on the existing Stats object:
- stddev and variance, which are essentially the same thing - worth dropping one of those
- count I assume is the same for all dimensions and the same value as pc:count - probably not worth including it?
- position does not seem relevant to statistics at all
- average + stddev (or variance) - IMHO they are not that useful and could be dropped, but no problem to keep them either
Great post @wonder-sk - I was planning a similar post just this week. My concern is the stats object is just a dump of PDAL information without considering the value to STAC - i.e. What do people want to search for?
Many of the example stats objects provide little value, e.g. ScanDirectionFlag, EdgeOfFlightLine, Classification, UserData, etc. And within those stats objects fields like count and position are questionable.
stddev and variance are just one square root away from each other - but I think they can be left as optional.
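As a quick check of the "one square root away" point, the sketch below uses Python's standard `statistics` module on made-up intensity values (the data is hypothetical). A client that receives only one of the two fields can derive the other.

```python
import math
import statistics

# Hypothetical intensity samples, for illustration only.
intensity = [0, 87, 120, 12345]

variance = statistics.pvariance(intensity)  # population variance
stddev = statistics.pstdev(intensity)       # population standard deviation

# The standard deviation is the square root of the variance.
same = math.isclose(stddev, math.sqrt(variance))
```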
I was planning to add a pc:classification field as a [string] of Classifications so you knew what was in a point cloud, but prefer your proposal so you can quantify how much of a particular classification exists.
The number of returns is valuable, but we have a lot of metadata that gives more context to the returns themselves that I would like to capture - e.g. "First and Last" would mean we have just two returns and ignored all intermediate returns, or "4 Returns (1st, 2nd, 3rd, last)".
Maybe you can align with or use the Classification extension? https://github.com/stac-extensions/classification
> My concern is the stats object is just a dump of PDAL information
Indeed this was the case, and the intention was to see if we could attract usage and attention to improve the extension. Maybe now is the time. I don't think we have found the stats particularly helpful for searching, but we haven't ditched the schema stuff. That said, I think the schema stuff would probably be better expressed in arrow or regular flatbuf for reusability in other contexts.
You can see an Item collection example we write for the USGS 3DEP lidar collection at https://usgs-lidar-stac.s3-us-west-2.amazonaws.com/ept/item_collection.json
If you visit https://viewer.copc.io you can also bring any of those in and view them by clicking on the USGS 3DEP LiDAR link and then double-clicking on any name that looks interesting, or filtering by simple regex.
I like @wonder-sk's suggestions on updating the stats object. Since they are all optional, perhaps it is enough to add another optional class-count?
@m-mohr, I think that extension (correct me if I'm wrong, I just learned about it) describes all possible classes. In this case we just want to summarize the classes present in a single item. So if you were working with LAZ 1.4 data, you'd put the ASPRS Class definitions into the schema, not the statistics.
To simplify this for searching, I usually just want to know if a pointcloud has any building-classified points (I don't really care how many points there are). So maybe modify @wonder-sk's suggestion into something like:
"unique-classes": {
"title": "unique list of classifications",
"type": "array",
"minItems": 1,
"items": {
"title": "point classifications present in pointcloud",
"type": "integer"
}
}
And then add unique-classes as an optional field in the stats object?
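The two ideas are compatible: a unique-classes list can be derived from a class-count mapping, so a producer could emit either or both. A minimal sketch follows, assuming the class-count layout from the proposal earlier in this thread; the helper name `unique_classes` is made up for this example.

```python
def unique_classes(class_count):
    """Derive a sorted list of integer class codes from a class-count mapping."""
    return sorted(int(code) for code, n in class_count.items() if n > 0)


BUILDING = 6  # "Building" in the ASPRS LAS standard point classes

# Stats shaped like the class-count proposal (values copied from its example).
stats = {
    "Classification": {
        "minimum": 0,
        "maximum": 7,
        "class-count": {"0": 1000, "1": 2000, "3": 4000, "7": 8000},
    }
}

present = unique_classes(stats["Classification"]["class-count"])
has_buildings = BUILDING in present
```

For this example `present` is `[0, 1, 3, 7]`, so the "does it contain buildings?" check is false.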
It would be useful to have an optional attribute with basic stats defined by this extension:
For software that visualizes point clouds, this is quite important for initializing renderer settings. Without stats, one has to sample the data to extract them on load. This would be especially useful when working with a collection of items, to quickly get aggregate stats for the whole collection rather than having to touch every asset of the individual items.
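A rough sketch of the collection-level aggregation described above, assuming each item carries stats in the minimum / maximum / class-count shape proposed in this thread. The function name `merge_stats` and the two sample items are hypothetical.

```python
from collections import Counter


def merge_stats(items_stats):
    """Combine per-item stats into one collection-level summary."""
    merged = {}
    for stats in items_stats:
        for dim, s in stats.items():
            m = merged.setdefault(
                dim, {"minimum": s["minimum"], "maximum": s["maximum"]}
            )
            m["minimum"] = min(m["minimum"], s["minimum"])
            m["maximum"] = max(m["maximum"], s["maximum"])
            if "class-count" in s:
                # Per-class counts simply add up across items.
                counts = Counter(m.get("class-count", {}))
                counts.update(s["class-count"])
                m["class-count"] = dict(counts)
    return merged


# Hypothetical stats for two items of a collection.
item_a = {
    "Intensity": {"minimum": 0, "maximum": 500},
    "Classification": {"minimum": 1, "maximum": 2, "class-count": {"1": 10, "2": 5}},
}
item_b = {
    "Intensity": {"minimum": 20, "maximum": 12345},
    "Classification": {"minimum": 2, "maximum": 6, "class-count": {"2": 7, "6": 3}},
}

collection_stats = merge_stats([item_a, item_b])
```

Minimum, maximum and class counts combine exactly without opening any asset. An average could not be merged this way unless each item also carried its point count.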