Open adamsteer opened 7 years ago
Hello Adam. I see you relate the current idea to https://github.com/opengeospatial/testbed14-ideas/issues/26 and https://github.com/opengeospatial/testbed14-ideas/issues/25. Can you elaborate on how different infrastructure types such as Spark or HPC relate to a more general point cloud data service? What I can say is that it definitely relates to https://github.com/opengeospatial/testbed14-ideas/issues/24, and as such I could help derive specifications.
Hi Tom - sure. The short answer is that distributed processing engines and calling on HPC queues to derive user-defined products from our point cloud holdings are on our to-do list.
Longer:
Process distribution engines (e.g. Spark) and HPC access are critical to delivering data-heavy services. Since the WPS API is pretty generalised, it is plausible to run fairly heavy compute jobs if connections to HPC queues or distributed processing engines can be made (with appropriate security and accounting in place).
In the context of this ticket - we may want to use data from multiple sources to get a job done - for example, merging topobathymetric LiDAR, multibeam bathymetry and sUAS/photogrammetric data. We are going to need to deploy those jobs across multiple processing nodes or in an HPC queue, and ideally we consume as few cycles as possible wrangling data.
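To make the multi-source case concrete, here is a rough sketch of what a client request for such a merge job could look like, using the key-value-pair form of a WPS 1.0.0 Execute request. The endpoint URL, the process identifier `merge_point_sources`, and the input names `source` and `output_crs` are all made up for illustration; they are not part of any existing service.

```python
from urllib.parse import urlencode

def build_execute_url(endpoint, process_id, inputs):
    """Build a WPS 1.0.0 key-value-pair Execute URL.

    `inputs` is a list of (name, value) pairs; repeating a name passes
    several values for the same input, e.g. several source datasets.
    """
    # WPS 1.0.0 KVP encodes inputs as name=value pairs joined by ';'
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs)
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,
        "datainputs": data_inputs,
    })
    return f"{endpoint}?{query}"

# Hypothetical endpoint, process and dataset names, for illustration only
url = build_execute_url(
    "https://example.org/wps",
    "merge_point_sources",
    [
        ("source", "lidar_topobathy"),
        ("source", "multibeam"),
        ("source", "suas_photogrammetry"),
        ("output_crs", "EPSG:4326"),
    ],
)
print(url)
```

The request only names the datasets; where the job actually runs (processing nodes, HPC queue) would stay a server-side decision, which is the point of keeping the data wrangling off the client.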
Happy to hear your thoughts on specifications re #24 ...
I'm updating https://github.com/opengeospatial/testbed14-ideas/issues/24 to add more details on our use case. On the security-related concerns that you mention, please look at https://github.com/opengeospatial/testbed14-ideas/issues/3, proposed by OGC on Federated systems. You'll see I proposed to consider ESGF as a use case. If the idea makes its way to sponsors and is validated, then since your institution, NCI Australia, is part of ESGF, that would fit well in the picture you depict.
As you know, members of ESGF have distributed and heterogeneous infrastructure for processing and task distribution. Several sites have implemented queue-based task management (Celery, RabbitMQ) as well as a common API to call base services on different technologies. I think it is worthwhile to consider the Pub/Sub approach in https://github.com/opengeospatial/testbed14-ideas/issues/29 and see how well it fits.
For execution, there is also a proposal in https://github.com/opengeospatial/testbed14-ideas/issues/23 to use Docker containers in WPS workflows. Initial findings (not approved by OGC yet) in the TB13 Earth Observation Clouds (EOC) thread also consider Docker a valid mechanism to package applications. I am unsure how well LiDAR data and processes fit in ESGF's core mission of management and dissemination of climate model outputs and observational data (mostly satellite-based). Nevertheless, I think the overall challenges are compatible.
@adamsteer : this reference came up in TB-14 Machine Learning task. Ideas? https://www.gim-international.com/content/article/automatic-object-detection-in-point-clouds
@tomLandry plenty, and it’s something I’ve got on my future-me slate (inside the next 12-24 months).
This idea touches on the testbed-14 mindmap (http://www.opengeospatial.org/projects/initiatives/testbed14) branch about point cloud data processing. It also relates to #8, #9, #26, #25; and touches on activities in the OGC point cloud domain working group around how to best deliver an OGC point cloud data ecosystem.
A WPS-based point cloud data service is in prototype at NCI (pointclouds.nci.org.au). The use case here is to deliver a broader set of functionality than 'topography' or 'raw points' - i.e. keystone science products from our warehoused point data, whether it comes from LiDAR, photogrammetry, multibeam sonar or other sources (iPhones etc).
Our experience so far has helped to shape how we expect point data to arrive - what expectations we have for point data in order to deliver a useable service. This also helps to shape our ideas about what a pointcloud data service statement from OGC should look like - whether we should worry about how bytes are written to disk or more broad statements like 'an OGC-compliant point data set shall have attributes X,Y,Z,M, and Q'.
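As a rough illustration of the attribute-level style of statement, a compliance check could be as small as the sketch below. The attribute names X, Y, Z, M and Q are just the placeholders from the example sentence above, and the two sample datasets are invented; nothing here reflects an actual OGC requirement or the NCI service's code.

```python
# Placeholder attribute list, copied from the example statement above
REQUIRED_ATTRIBUTES = ("X", "Y", "Z", "M", "Q")

def missing_attributes(dataset_attributes, required=REQUIRED_ATTRIBUTES):
    """Return the required attribute names a dataset does not provide.

    Names are compared case-insensitively, since sources differ in how
    they label dimensions.
    """
    present = {name.upper() for name in dataset_attributes}
    return [name for name in required if name not in present]

# Invented attribute listings for two hypothetical datasets
lidar = ["X", "Y", "Z", "Intensity", "M", "Q"]
sonar = ["x", "y", "z", "backscatter"]

print(missing_attributes(lidar))  # []
print(missing_attributes(sonar))  # ['M', 'Q']
```

A check like this says nothing about how bytes are written to disk, which is the appeal of the broader kind of statement: any storage format passes as long as the attributes can be read out of it.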
Ongoing development will be necessarily pushed into containerising and scaling WPS processes, dealing with issues about multiple point data sources, and making a definitive statement about what attributes point datasets need to work well with each other.