Open smit1678 opened 7 years ago
@smit1678 here are a couple questions I have about this feature:
Could you describe the scenario in which you would index local imagery instead of imagery stored on S3? Are the files more likely to be on the same host / network attached storage, or served using an FTP server?
The GeoNode instance and API show WMS properties, but not the underlying image files or the ability to download them. This is significantly different from the OAM metadata specification and catalog indexing, and falls in line with serving TMS links. Would the GeoNode instance be able to provide underlying imagery, or should indexing GeoNode be treated like an external TMS (#45)?
Could you describe the scenario in which you would index local imagery instead of imagery stored on S3?
Organizations or field deployments that have limited or no bandwidth to upload to S3. It is more practical for them to keep the imagery local and run the catalog on a local network.
Are the files more likely to be on the same host / network attached storage, or served using an FTP server?
My guess is that it will be most likely on the same host / network attached storage since this would be easier to have running than setting up an FTP server.
The GeoNode instance and API show WMS properties, but not the underlying image files or the ability to download them. This is significantly different from the OAM metadata specification and catalog indexing, and falls in line with serving TMS links.
Agreed, we're not looking to change the specification or restrict the ability to download them.
Would the GeoNode instance be able to provide underlying imagery,
Yes, this would be the way we should be thinking about it.
@cgiovando do you have insight on geonode and download endpoints exposed through the API?
@smit1678 for local filesystem indexing, we will need to create HTTP URLs to the metadata and image files at indexing time. These URLs are used to download the files in the oam-browser.
We have two options:
We use Minio, which replicates the S3 API and allows us to use the same S3 service code (with some minimal adaptation) to index the files and create HTTP URLs. This requires the files to be added to the Minio server using the Minio browser interface or through S3 command-line tools.
We use a filesystem volume with an HTTP static file server such as Nginx that points to the files. We will need to write new service code for indexing these local files and creating the URLs. This requires the files to be in a folder with a predefined structure (similar to the s3 buckets).
What do you suggest? Would the Minio interface be a simpler way to upload, create, and delete imagery than a filesystem folder?
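To make the second option more concrete, here is a minimal sketch of the scanning step it would need. It assumes a hypothetical layout in which every image has a metadata sidecar named `<image file name>_meta.json` next to it, similar to how the buckets are organized; the function name and suffixes are illustrative and not taken from oam-catalog.

```python
from pathlib import Path

# Assumption: each image has a sidecar called "<image file name>_meta.json".
# Adjust both constants if the local folder structure differs.
META_SUFFIX = "_meta.json"
IMAGE_SUFFIXES = {".tif", ".tiff"}


def scan_local_store(root):
    """Return sorted (image, metadata) path pairs, relative to root.

    Images without a metadata sidecar are skipped because there is
    nothing to index for them.
    """
    root = Path(root)
    pairs = []
    for image in sorted(root.rglob("*")):
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        meta = image.with_name(image.name + META_SUFFIX)
        if meta.is_file():
            pairs.append((image.relative_to(root).as_posix(),
                          meta.relative_to(root).as_posix()))
    return pairs
```

The relative paths it returns are what a static file server such as Nginx would expose, so they can later be joined to the server's base URL.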
Thanks @kamicut. Since Minio is already solved (waiting on a PR from the SPC), you're saying that the best way to approach this is to use a filesystem volume with Nginx?
@mojodna Any lessons learned from the POSM/ODM imagery api?
@smit1678 if Minio can solve the case for local filesystem indexing and we're waiting on a PR, that would be a good approach instead of writing new service code because it includes a UI to manage the files as well as a server. We can then include the docker version of minio as part of the docker-compose file in oam-catalog.
Where do you think Minio comes up short here? Do we leave any users out, or are we failing to account for anything? Do we run into workflow issues or foresee any issues with how data needs to be managed locally if we now need to put everything into Minio? If we have an existing folder structure of thousands of images, does Minio work well for that use case?
@smit1678 Minio is structured like S3; to add data to it (for example a folder with thousands of images), you would perform the same operations as an S3 bucket:
If that's an acceptable way to manage the files, then minio gives us what we need. I think the benefit is that writing a script to upload to minio allows it to be ported easily to S3.
@kamicut I'm concerned about the overhead of installing Minio for every local store (e.g. shared drives) and the usability barrier of having to rely on S3 tools rather than just copying/moving files.
What would be required for the catalog to be able to index local files directly without having to set up services like Nginx or Minio?
@cgiovando do you have insight on geonode and download endpoints exposed through the API?
Requests for GeoTIFF files can be done via WCS with something like:
http://52.64.9.136/geoserver/wcs?crs=EPSG%3A4326&service=WCS&format=GeoTIFF&request=GetCoverage&height=550&width=735&version=1.0.0&BBox=-159.84358961184%2C-159.71633320416%2C-21.285185100579998%2C-21.189742794820003&Coverage=geonode%3Ageonode_ck_rarotonga
I realized that the above call doesn't return imagery at its native resolution, but it should be possible by tweaking WCS parameters. Here's a thread with some ideas. Maybe @afabiani from the GeoServer team can help with the correct endpoint? ;)
WCS requires a GetCapabilities handshake to know some parameters to pass through the actual data request. @kamicut could this be part of the routine catalog harvesting of sources and stored on the OAM side?
GeoNode/GeoServer also supports WMTS and TMS endpoints, so those should be automatically indexed for links and display in the OAM browser.
Metadata would be a bit more challenging, as none is required in GeoNode by default, but ideally if it's available, ISO elements in GeoNode should be mapped to the OIN metadata profile.
Performance and availability of GeoNodes could also be an issue and make resources less available in OAM. Some caching (of metadata and thumbnails/TMS) could be used, or the user could be allowed to choose whether to include potentially slow sources in each search/browse session.
@cgiovando as far as I can tell, we need an HTTP server between the files and the OAM browser; this is because the files need to be mounted as a Docker volume to be indexed, and they will have different relative paths inside and outside Docker (so the path will be wrong when indexed). We don't need this with S3 because there is a public URL to every file.
As for the code to index the files locally, what is needed is to overload the methods in https://github.com/hotosm/oam-catalog/blob/develop/services/s3.js so that they work with local files.
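The path mismatch described above can be handled by never storing the in-container path and translating it to a URL at indexing time instead. This is only a sketch: `public_url`, the mount root, and the base URL are hypothetical names and values, not part of `services/s3.js`.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def public_url(container_path, mount_root, base_url):
    """Translate a path seen inside the container into a fetchable URL.

    mount_root is where the volume is mounted inside the container and
    base_url is where the HTTP server exposes the same folder (both are
    deployment-specific assumptions). Raises ValueError if the path is
    not under mount_root.
    """
    relative = PurePosixPath(container_path).relative_to(mount_root)
    return base_url.rstrip("/") + "/" + quote(relative.as_posix())
```

For example, `public_url("/data/imagery/a/scene.tif", "/data/imagery", "http://localhost:8080/imagery")` yields a URL under the static server rather than a container path.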
Sorry for jumping into the discussion; I'm also not sure I have fully understood the context. However, you may want to consider installing this extension to GeoServer:
http://docs.geoserver.org/stable/en/user/community/wps-download/index.html
This is a module for the WPS (Web Processing Service) which basically allows you to schedule and store locally (or stream out) raster and vector data along with SLDs in a zip format, either synchronously or asynchronously.
The request also allows you to specify parameters for the desired projection, resolution and/or area (it is able to cut or intersect the raw data given a geometry).
The extension can also store the results on an FTP server, or notify users via email with a unique identifier and an HTTP endpoint to download the data when it is ready.
The WPS protocol allows you to follow the progress status asynchronously too.
Following up on the comments of @cgiovando, a few comments on the W*S requests.
First of all, the version of GeoServer you are using also supports WCS 2.0.1, which is much better at handling this kind of request.
Using WCS 1.1.1 to get the file at the native resolution is not trivial. First of all you need to perform a DescribeCoverage request in order to get the raster properties.
Next you can perform the GetCoverage request to download the file in GeoTIFF format:
POST Request to ->http://52.64.9.136/geoserver/ows?service=wcs&version=1.1.1
<?xml version="1.0" encoding="UTF-8"?>
<GetCoverage version="1.1.1" service="WCS" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.opengis.net/wcs/1.1.1" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:gml="http://www.opengis.net/gml" xmlns:ogc="http://www.opengis.net/ogc" xsi:schemaLocation="http://www.opengis.net/wcs/1.1.1 http://schemas.opengis.net/wcs/1.1.1/wcsAll.xsd">
<ows:Identifier>geonode:geonode_ck_rarotonga</ows:Identifier>
<DomainSubset>
<ows:BoundingBox crs="urn:ogc:def:crs:EPSG::4326">
<ows:LowerCorner>-21.285185100579998 -159.84358961184</ows:LowerCorner>
<ows:UpperCorner>-21.189742794820003 -159.71633320416</ows:UpperCorner>
</ows:BoundingBox>
</DomainSubset>
<Output store="true" format="image/tiff">
<GridCRS>
<GridBaseCRS>urn:ogc:def:crs:EPSG::4326</GridBaseCRS>
<GridType>urn:ogc:def:method:WCS:1.1:2dGridIn2dCrs</GridType>
<GridOrigin>-159.84225712844625 -21.190928863359453</GridOrigin>
<GridOffsets>1.6974310748298416E-4 0.0 0.0 -1.6952671890909312E-4</GridOffsets>
<GridCS>urn:ogc:def:cs:OGC:0.0:Grid2dSquareCS</GridCS>
</GridCRS>
</Output>
</GetCoverage>
By specifying the Output parameter "store=true" you will get a response like this:
<?xml version="1.0" encoding="UTF-8"?>
<wcs:Coverages xmlns:wcs="http://www.opengis.net/wcs/1.1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ogc="http://www.opengis.net/ogc" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:gml="http://www.opengis.net/gml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://52.64.9.136/geoserver/schemas/wcs/1.1.1/wcsCoverages.xsd">
<wcs:Coverage>
<ows:Title>CK_Rarotonga_Satellite_Image_2009</ows:Title>
<ows:Abstract>Generated from GeoTIFF</ows:Abstract>
<ows:Identifier>geonode_ck_rarotonga</ows:Identifier>
<ows:Reference xlink:href="http://52.64.9.136/geoserver/temp/wcs/geonode_ck_rarotonga_18978746138562724.tif"/>
</wcs:Coverage>
</wcs:Coverages>
The Reference link lets you download the TIFF over HTTP.
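As an illustration of how a harvester could pick up that link, the sketch below parses a `wcs:Coverages` response like the one above with Python's standard library and returns the `xlink:href` of each `ows:Reference`. The function name is made up for this example; the namespaces are the ones declared in the response.

```python
import xml.etree.ElementTree as ET

# Namespaces as declared in the WCS 1.1.1 response shown above.
NS = {
    "wcs": "http://www.opengis.net/wcs/1.1.1",
    "ows": "http://www.opengis.net/ows/1.1",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def stored_coverage_urls(response_xml):
    """Map each coverage identifier to the URL in its ows:Reference."""
    root = ET.fromstring(response_xml)
    urls = {}
    for coverage in root.findall("wcs:Coverage", NS):
        identifier = coverage.findtext("ows:Identifier", namespaces=NS)
        reference = coverage.find("ows:Reference", NS)
        if identifier and reference is not None and XLINK_HREF in reference.attrib:
            urls[identifier] = reference.attrib[XLINK_HREF]
    return urls
```

Note that the referenced file lives under a `temp` path on the server, so it is probably safer to download it soon after the request than to index the link itself.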
Using WCS 2.0.1 (DescribeCoverage here) is much easier. If you don't need to do subsetting, a straight HTTP GET request is sufficient to stream out the coverage at its native resolution:
It is also possible to get it in a different format (e.g. PNG)
Notice the coverageId=geonode__geonode_ck_rarotonga
but this is not advisable if you need to generate a thumbnail, since you will get the full-resolution image. In this case you can either specify a lower size/resolution on WCS
Notice the SCALEFACTOR=0.3
or use the WMS GetMap Request
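For reference, a WCS 2.0.1 GetCoverage GET request of the kind described above can be assembled as in this sketch. The helper name and the endpoint passed to it are assumptions; `coverageId`, `format`, and `SCALEFACTOR` are the key-value parameters mentioned in this comment.

```python
from urllib.parse import urlencode


def wcs2_get_coverage_url(endpoint, coverage_id, fmt="image/tiff", scale_factor=None):
    """Build a WCS 2.0.1 GetCoverage KVP URL for a whole coverage.

    With scale_factor=None the server is asked for the coverage as is;
    a value such as 0.3 asks for a reduced-size version, which is more
    suitable for thumbnails.
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        ("format", fmt),
    ]
    if scale_factor is not None:
        params.append(("SCALEFACTOR", str(scale_factor)))
    return endpoint + "?" + urlencode(params)
```

Calling it with `fmt="image/png"` covers the PNG case, and the double underscore in the coverage ID passes through unescaped.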
WMTS and TMS services are provided through GeoServer's embedded GeoWebCache (GWC):
http://52.64.9.136/geoserver/gwc
The WMTS request (a single tile) can be obtained like this:
The tilematrixset and tilematrix values can be obtained from the WMTS GetCapabilities request here.
The same tile through the TMS protocol can be obtained like below:
More info here http://leafletjs.com/examples/wms/wms.html
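One detail worth keeping in mind when asking for "the same tile" through both protocols: WMTS counts tile rows from the top of the tile matrix, while TMS counts y from the bottom. A small sketch of the conversion follows; the function name is made up here, and the matrix height is assumed to be read from the `MatrixHeight` of the relevant tile matrix in the WMTS GetCapabilities response.

```python
def wmts_row_to_tms_y(tile_row, matrix_height):
    """Convert a WMTS TileRow (counted from the top) to a TMS y (counted from the bottom).

    matrix_height is the number of tile rows at that zoom level.
    The column (TileCol / x) is the same in both schemes.
    """
    if not 0 <= tile_row < matrix_height:
        raise ValueError("tile_row is outside the tile matrix")
    return matrix_height - 1 - tile_row
```

Requesting a TMS tile with an unconverted WMTS row addresses a different tile, which may fall outside the layer's extent.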
Thanks @afabiani - very useful information
Using the WCS 2.0.1 (DescribeCoverage here) is much easier. If you don't need to do subsetting a straight HTTP GET request is sufficient to stream out the coverage at its native resolution:
This request does not return the full-resolution image. Is any parameter missing?
But if that works, then, as discussed today, one option for OAM indexing is to have a small worker that makes hourly/daily GetCoverage requests to feed endpoints (GeoTIFF, WMTS/TMS, and WMS for thumbnails) to the OAM catalog.
Hi @cgiovando, as far as I can see the downloaded TIFF is at native resolution, or at least is consistent with the resolution declared in DescribeCoverage (see the comparison between the gdalinfo output and DescribeCoverage below).
Files: C:\Users\Dell\Downloads\geonode__geonode_ck_rarotonga (1).tif
Size is 735, 550
Coordinate System is:
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4326"]]
Origin = (-159.842342000000000,-21.190844100000000)
Pixel Size = (0.000169743107483,-0.000169526718909)
Metadata:
AREA_OR_POINT=Area
TIFFTAG_RESOLUTIONUNIT=1 (unitless)
TIFFTAG_XRESOLUTION=1
TIFFTAG_YRESOLUTION=1
Image Structure Metadata:
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left (-159.8423420, -21.1908441) (159d50'32.43"W, 21d11'27.04"S)
Lower Left (-159.8423420, -21.2840838) (159d50'32.43"W, 21d17' 2.70"S)
Upper Right (-159.7175808, -21.1908441) (159d43' 3.29"W, 21d11'27.04"S)
Lower Right (-159.7175808, -21.2840838) (159d43' 3.29"W, 21d17' 2.70"S)
Center (-159.7799614, -21.2374639) (159d46'47.86"W, 21d14'14.87"S)
Band 1 Block=735x8 Type=Byte, ColorInterp=Red
NoData Value=0
Band 2 Block=735x8 Type=Byte, ColorInterp=Green
NoData Value=0
Band 3 Block=735x8 Type=Byte, ColorInterp=Blue
NoData Value=0
<?xml version="1.0" encoding="UTF-8"?><wcs:CoverageDescriptions xmlns:wcs="http://www.opengis.net/wcs/2.0" xmlns:ows="http://www.opengis.net/ows/2.0" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:gmlcov="http://www.opengis.net/gmlcov/1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:swe="http://www.opengis.net/swe/2.0" xmlns:wcsgs="http://www.geoserver.org/wcsgs/2.0" xsi:schemaLocation=" http://www.opengis.net/wcs/2.0 http://schemas.opengis.net/wcs/2.0/wcsDescribeCoverage.xsd">
<wcs:CoverageDescription gml:id="geonode__geonode_ck_rarotonga">
<gml:boundedBy>
<gml:Envelope srsName="http://www.opengis.net/def/crs/EPSG/0/4326" axisLabels="Lat Long" uomLabels="Deg Deg" srsDimension="2">
<gml:lowerCorner>-21.2840837954 -159.842342</gml:lowerCorner>
<gml:upperCorner>-21.1908441 -159.717580816</gml:upperCorner>
</gml:Envelope>
</gml:boundedBy>
<wcs:CoverageId>geonode__geonode_ck_rarotonga</wcs:CoverageId>
<gml:coverageFunction>
<gml:GridFunction>
<gml:sequenceRule axisOrder="+2 +1">Linear</gml:sequenceRule>
<gml:startPoint>0 0</gml:startPoint>
</gml:GridFunction>
</gml:coverageFunction>
<gmlcov:metadata>
<gmlcov:Extension/>
</gmlcov:metadata>
<gml:domainSet>
<gml:RectifiedGrid gml:id="grid00__geonode__geonode_ck_rarotonga" dimension="2">
<gml:limits>
<gml:GridEnvelope>
<gml:low>0 0</gml:low>
<gml:high>734 549</gml:high>
</gml:GridEnvelope>
</gml:limits>
<gml:axisLabels>i j</gml:axisLabels>
<gml:origin>
<gml:Point gml:id="p00_geonode__geonode_ck_rarotonga" srsName="http://www.opengis.net/def/crs/EPSG/0/4326">
<gml:pos>-21.190928863359453 -159.84225712844625</gml:pos>
</gml:Point>
</gml:origin>
<gml:offsetVector srsName="http://www.opengis.net/def/crs/EPSG/0/4326">0.0 1.6974310748298416E-4</gml:offsetVector>
<gml:offsetVector srsName="http://www.opengis.net/def/crs/EPSG/0/4326">-1.6952671890909312E-4 0.0</gml:offsetVector>
</gml:RectifiedGrid>
</gml:domainSet>
<gmlcov:rangeType>
<swe:DataRecord>
<swe:field name="RED_BAND">
<swe:Quantity>
<swe:description>RED_BAND</swe:description>
<swe:nilValues>
<swe:NilValues>
<swe:nilValue reason="http://www.opengis.net/def/nil/OGC/0/unknown">0.0</swe:nilValue>
</swe:NilValues>
</swe:nilValues>
<swe:uom code="W.m-2.Sr-1"/>
<swe:constraint>
<swe:AllowedValues>
<swe:interval>0.0 0.0</swe:interval>
</swe:AllowedValues>
</swe:constraint>
</swe:Quantity>
</swe:field>
<swe:field name="GREEN_BAND">
<swe:Quantity>
<swe:description>GREEN_BAND</swe:description>
<swe:nilValues>
<swe:NilValues>
<swe:nilValue reason="http://www.opengis.net/def/nil/OGC/0/unknown">0.0</swe:nilValue>
</swe:NilValues>
</swe:nilValues>
<swe:uom code="W.m-2.Sr-1"/>
<swe:constraint>
<swe:AllowedValues>
<swe:interval>0.0 0.0</swe:interval>
</swe:AllowedValues>
</swe:constraint>
</swe:Quantity>
</swe:field>
<swe:field name="BLUE_BAND">
<swe:Quantity>
<swe:description>BLUE_BAND</swe:description>
<swe:nilValues>
<swe:NilValues>
<swe:nilValue reason="http://www.opengis.net/def/nil/OGC/0/unknown">0.0</swe:nilValue>
</swe:NilValues>
</swe:nilValues>
<swe:uom code="W.m-2.Sr-1"/>
<swe:constraint>
<swe:AllowedValues>
<swe:interval>0.0 0.0</swe:interval>
</swe:AllowedValues>
</swe:constraint>
</swe:Quantity>
</swe:field>
</swe:DataRecord>
</gmlcov:rangeType>
<wcs:ServiceParameters>
<wcs:CoverageSubtype>RectifiedGridCoverage</wcs:CoverageSubtype>
<wcs:nativeFormat>image/tiff</wcs:nativeFormat>
</wcs:ServiceParameters>
</wcs:CoverageDescription>
</wcs:CoverageDescriptions>
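The comparison can also be done programmatically. The sketch below pulls the grid size and pixel size out of a WCS 2.0 DescribeCoverage document so they can be checked against the `Size is` and `Pixel Size` lines that gdalinfo prints; the function name is illustrative, and it assumes a single `gml:RectifiedGrid` with axis-aligned offset vectors, as in the response above.

```python
import xml.etree.ElementTree as ET

GML = "{http://www.opengis.net/gml/3.2}"


def grid_summary(describe_coverage_xml):
    """Return (width, height, pixel_size_x, pixel_size_y) from a RectifiedGrid."""
    root = ET.fromstring(describe_coverage_xml)
    grid = root.find(f".//{GML}RectifiedGrid")
    low = [int(v) for v in grid.findtext(f".//{GML}low").split()]
    high = [int(v) for v in grid.findtext(f".//{GML}high").split()]
    # Grid limits are inclusive, so the size along each axis is high - low + 1.
    width, height = (h - l + 1 for l, h in zip(low, high))
    # Each offset vector has one non-zero component: the step along that grid axis.
    steps = []
    for vector in grid.findall(f"{GML}offsetVector"):
        steps.append(next(float(v) for v in vector.text.split() if float(v) != 0.0))
    return width, height, steps[0], steps[1]
```

Applied to the document above, this gives a 735 x 550 grid with steps of about 0.000169743 and -0.000169527 degrees, matching the gdalinfo output. The two origins differ by half a pixel in each direction, which is what you would expect if DescribeCoverage reports the center of the first pixel while gdalinfo reports its outer corner.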
About the TMS, I currently don't know exactly why it is returning an empty image; I'm afraid it depends on the embedded GWC and the old version of GeoServer. We might need to investigate further by raising the log level and inspecting the debug messages.
@mojodna Any lessons learned from the POSM/ODM imagery api?
Not specifically relevant to this.
@afabiani Thanks for the comments earlier and insight into Geonode endpoints.
To follow up and update here: we split the local file indexing out into a PR, https://github.com/hotosm/oam-catalog/pull/93, which has been completed. I edited the title of this ticket to highlight the focus of the conversation above, which is indexing GeoNode. I'll leave this ticket open, since I don't think it is resolved how GeoNode integration would happen; this seems dependent on how the GeoNode instance is set up and run.
From OAM core work, look to develop the oam-catalog's ability to index from a local filestore and a GeoNode -- which serves as an example of a JSON-based file storage location.
Sample imagery: http://52.64.9.136/search/?keywords__slug__in=imagery&limit=100&offset=0. API endpoint: http://52.64.9.136/api/base/?keywords__slug__in=imagery&limit=100&offset=0.