hotosm / OpenAerialMap

OpenAerialMap is an open service to provide access to a commons of openly licensed imagery and map layer services.
https://openaerialmap.org/

Allow imagery to be indexed from geonode #53

Open smit1678 opened 7 years ago

smit1678 commented 7 years ago

From OAM core work, look to develop the oam-catalog's ability to index from a local filestore and a GeoNode -- which serves as an example of a JSON-based file storage location.

Sample imagery: http://52.64.9.136/search/?keywords__slug__in=imagery&limit=100&offset=0. API endpoint: http://52.64.9.136/api/base/?keywords__slug__in=imagery&limit=100&offset=0.
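A minimal sketch of how a harvester might page through that API endpoint. The query string mirrors the one above; the response shape (`objects` for the records, `meta.next` for the following page) is an assumption about the GeoNode API and should be checked against the actual instance. `fetch_json` is a hypothetical callable supplied by the caller (for example, a thin wrapper around an HTTP client).

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode, urljoin


def first_page_url(base_url: str, keyword: str = "imagery", limit: int = 100) -> str:
    """Build the first-page URL using the same query parameters as the issue."""
    query = urlencode({"keywords__slug__in": keyword, "limit": limit, "offset": 0})
    return f"{base_url.rstrip('/')}/api/base/?{query}"


def iter_layers(
    base_url: str,
    fetch_json: Callable[[str], dict],
    keyword: str = "imagery",
    limit: int = 100,
) -> Iterator[dict]:
    """Yield every record, following pagination links until none is left."""
    url: Optional[str] = first_page_url(base_url, keyword, limit)
    while url:
        page = fetch_json(url)
        # ASSUMPTION: records are listed under "objects".
        yield from page.get("objects", [])
        # ASSUMPTION: the next page is given as a (possibly relative) link in meta.next.
        next_link = page.get("meta", {}).get("next")
        url = urljoin(url, next_link) if next_link else None
```

Injecting `fetch_json` keeps the paging logic independent of the HTTP library and lets it be exercised with canned responses.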

kamicut commented 7 years ago

@smit1678 here are a couple questions I have about this feature:

smit1678 commented 7 years ago

Could you describe the scenario in which you would index local imagery instead of imagery stored on S3?

Organizations or field deployments that have limited or no bandwidth to upload to S3. It is more practical to keep the imagery locally and run the catalog on a local network.

Are the files more likely to be on the same host / network attached storage, or served using an FTP server?

My guess is that it will be most likely on the same host / network attached storage since this would be easier to have running than setting up an FTP server.

The GeoNode instance and API show WMS properties, but not the underlying image files or the ability to download them. This is significantly different from the OAM metadata specification and catalog indexing, and falls in line with serving TMS links.

Agreed, we're not looking to change the specification or restrict the ability to download them.

Would the GeoNode instance be able to provide underlying imagery,

Yes, this would be the way we should be thinking about it.

@cgiovando do you have insight on geonode and download endpoints exposed through the API?

kamicut commented 7 years ago

@smit1678 for local filesystem indexing, we will need to create HTTP URLs to the metadata and image files at indexing time. These URLs are used to download the files in the oam-browser.

We have two options:

  1. We use Minio, which replicates the S3 API and allows us to use the same s3 service code (with some minimal adaptation) to index the files and create HTTP URLs. This requires the files to be added to the minio server using the minio browser interface or through s3 command line tools.

  2. We use a filesystem volume with an HTTP static file server such as Nginx that points to the files. We will need to write new service code for indexing these local files and creating the URLs. This requires the files to be in a folder with a predefined structure (similar to the s3 buckets).
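As a rough illustration of option 2, the sketch below walks a folder and derives the URL a static file server would expose for each image. The base URL, the folder layout and the set of file suffixes are assumptions made for the example, not part of oam-catalog.

```python
from pathlib import Path
from typing import Dict
from urllib.parse import quote

# ASSUMPTION: only GeoTIFFs are indexed; adjust to whatever the catalog expects.
IMAGE_SUFFIXES = {".tif", ".tiff"}


def index_local_images(root: Path, public_base_url: str) -> Dict[str, str]:
    """Map each image path (relative to root) to its URL on the static server."""
    base = public_base_url.rstrip("/")
    urls: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # Use the path relative to the served folder, never the absolute
            # path, so the URL stays valid wherever the folder is mounted.
            relative = path.relative_to(root).as_posix()
            urls[relative] = f"{base}/{quote(relative)}"
    return urls
```

Because the URLs are built from paths relative to the served folder, the result does not depend on where that folder is mounted on the indexing host.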

What do you suggest? Would the minio interface be simpler to upload, create, delete imagery, or a file system folder?

smit1678 commented 7 years ago

Thanks @kamicut. Since Minio is already solved (waiting on a PR from the SPC), are you saying that the best approach is to use a filesystem volume with Nginx?

@mojodna Any lessons learned from the POSM/ODM imagery api?

kamicut commented 7 years ago

@smit1678 if Minio can solve the case for local filesystem indexing and we're waiting on a PR, that would be a good approach instead of writing new service code because it includes a UI to manage the files as well as a server. We can then include the docker version of minio as part of the docker-compose file in oam-catalog.

smit1678 commented 7 years ago

Where do you think Minio comes up short here? Do we leave any users out, or are we failing to account for anything? Do we run into workflow issues or foresee any issues with how data needs to be managed locally if we now need to put everything into Minio? If we have an existing folder structure of thousands of images, does Minio work well for that use case?

kamicut commented 7 years ago

@smit1678 Minio is structured like S3; to add data to it (for example a folder with thousands of images), you would perform the same operations as an S3 bucket:

  1. Create the metadata files for each image
  2. Use s3cmd or aws-cli in a script to upload the data to a bucket in minio
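A sketch of what such an upload script could look like, assuming aws-cli is installed and the metadata files already sit next to the images. The bucket name and endpoint are placeholders; `--endpoint-url` is the aws-cli option that points the client at a non-AWS S3-compatible server. The function only builds the commands, so they can be reviewed before anything is run.

```python
from pathlib import Path
from typing import List


def build_upload_commands(folder: Path, bucket: str, endpoint_url: str) -> List[List[str]]:
    """Return one `aws s3 cp` command per file, keeping the folder structure as keys."""
    commands: List[List[str]] = []
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            key = path.relative_to(folder).as_posix()
            commands.append([
                "aws", "s3", "cp", str(path), f"s3://{bucket}/{key}",
                # Pointing at the local server; omit this pair to target S3 itself.
                "--endpoint-url", endpoint_url,
            ])
    return commands
```

Each command can then be executed with `subprocess.run(cmd, check=True)`. Porting the script to S3 amounts to dropping the `--endpoint-url` pair.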

If that's an acceptable way to manage the files, then minio gives us what we need. I think the benefit is that writing a script to upload to minio allows it to be ported easily to S3.

cgiovando commented 7 years ago

@kamicut I'm concerned about the overhead of installing Minio for every local store (e.g. shared drives) and the usability barrier of having to rely on S3 tools rather than just copying/moving files.

What would be required for the catalog to be able to index local files directly without having to set up services like Nginx or Minio?

cgiovando commented 7 years ago

@cgiovando do you have insight on geonode and download endpoints exposed through the API?

Requests for GeoTIFF files can be done via WCS with something like:

http://52.64.9.136/geoserver/wcs?crs=EPSG%3A4326&service=WCS&format=GeoTIFF&request=GetCoverage&height=550&width=735&version=1.0.0&BBox=-159.84358961184%2C-159.71633320416%2C-21.285185100579998%2C-21.189742794820003&Coverage=geonode%3Ageonode_ck_rarotonga

I realized that the above call doesn't return imagery at its native resolution, but it should be possible by tweaking WCS parameters. Here's a thread with some ideas. Maybe @afabiani from the GeoServer team can help with the correct endpoint? ;)

WCS requires a GetCapabilities handshake to know some parameters to pass through the actual data request. @kamicut could this be part of the routine catalog harvesting of sources and stored on the OAM side?

GeoNode/GeoServer also support WMTS and TMS endpoints, so those should be automatically indexed for links and display in the OAM browser.

Metadata would be a bit more challenging, as none is required in GeoNode by default, but ideally if it's available, ISO elements in GeoNode should be mapped to the OIN metadata profile.

Performance and availability of GeoNodes could also be an issue, and make resources less available in OAM. Some caching (of metadata and thumbnails/TMS) could be used, or the user could be allowed to choose whether to include potentially slow sources in each search/browse session.

kamicut commented 7 years ago

@cgiovando as far as I can tell, we need an HTTP server between the files and the OAM browser; this is because the files need to be mounted as a docker volume to be indexed, and they will have different relative paths inside and outside docker (so the path will be wrong when indexed). We don't need this with S3 because there is a public URL to every file.

As for the code to index the files locally, what is needed is to overload the methods in https://github.com/hotosm/oam-catalog/blob/develop/services/s3.js so that they work with local files.

afabiani commented 7 years ago

Sorry for jumping into the discussion; I'm also not sure I have fully understood the context. However, you may want to consider installing this extension to GeoServer:

http://docs.geoserver.org/stable/en/user/community/wps-download/index.html

This is a module for the WPS (Web Processing Service) which basically allows you to schedule and store locally (or stream out) raster and vector data along with SLDs in a zip format, either synchronously or asynchronously.

The request also allows you to specify parameters for the desired projection, resolution and/or area (it is able to cut or intersect the raw data given a geometry).

The extension can also store the results on an FTP server, or notify users via email with a unique identifier and an HTTP endpoint to download the data when ready.

The WPS protocol allows you to follow the progress status asynchronously too.

afabiani commented 7 years ago

Following up on @cgiovando's comments, a few notes on the W*S requests.

First of all, the version of GeoServer you are using also supports WCS 2.0.1, which is much better at handling this kind of request.

Using WCS 1.1.1 to get the file at the native resolution is not trivial. First of all, you need to perform a DescribeCoverage request in order to get the raster properties:

http://52.64.9.136/geoserver/ows?service=wcs&version=1.1.1&request=DescribeCoverage&identifiers=geonode%3Ageonode_ck_rarotonga

Next you can perform the GetCoverage one to download the file as GeoTIFF format:

POST request to: http://52.64.9.136/geoserver/ows?service=wcs&version=1.1.1

<?xml version="1.0" encoding="UTF-8"?>
<GetCoverage version="1.1.1" service="WCS" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.opengis.net/wcs/1.1.1" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:gml="http://www.opengis.net/gml" xmlns:ogc="http://www.opengis.net/ogc" xsi:schemaLocation="http://www.opengis.net/wcs/1.1.1 http://schemas.opengis.net/wcs/1.1.1/wcsAll.xsd">
  <ows:Identifier>geonode:geonode_ck_rarotonga</ows:Identifier>
  <DomainSubset>
    <ows:BoundingBox crs="urn:ogc:def:crs:EPSG::4326">
      <ows:LowerCorner>-21.285185100579998 -159.84358961184</ows:LowerCorner>
      <ows:UpperCorner>-21.189742794820003 -159.71633320416</ows:UpperCorner>
    </ows:BoundingBox>
  </DomainSubset>
  <Output store="true" format="image/tiff">
    <GridCRS>
      <GridBaseCRS>urn:ogc:def:crs:EPSG::4326</GridBaseCRS>
      <GridType>urn:ogc:def:method:WCS:1.1:2dGridIn2dCrs</GridType>
      <GridOrigin>-159.84225712844625 -21.190928863359453</GridOrigin>
      <GridOffsets>1.6974310748298416E-4 0.0 0.0 -1.6952671890909312E-4</GridOffsets>
      <GridCS>urn:ogc:def:cs:OGC:0.0:Grid2dSquareCS</GridCS>
    </GridCRS>
  </Output>
</GetCoverage>

By specifying the Output parameter "store=true" you will get a response like this:

<?xml version="1.0" encoding="UTF-8"?>
<wcs:Coverages xmlns:wcs="http://www.opengis.net/wcs/1.1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ogc="http://www.opengis.net/ogc" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:gml="http://www.opengis.net/gml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://52.64.9.136/geoserver/schemas/wcs/1.1.1/wcsCoverages.xsd">
   <wcs:Coverage>
      <ows:Title>CK_Rarotonga_Satellite_Image_2009</ows:Title>
      <ows:Abstract>Generated from GeoTIFF</ows:Abstract>
      <ows:Identifier>geonode_ck_rarotonga</ows:Identifier>
      <ows:Reference xlink:href="http://52.64.9.136/geoserver/temp/wcs/geonode_ck_rarotonga_18978746138562724.tif"/>
   </wcs:Coverage>
</wcs:Coverages>

The Reference link allows you to download the TIFF over HTTP.
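A small sketch of how a client could pull that download link out of the response. The sample is a trimmed copy of the response shown above; the function name is made up for the example.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespaces as declared in the WCS 1.1.1 response above.
NS = {
    "wcs": "http://www.opengis.net/wcs/1.1.1",
    "ows": "http://www.opengis.net/ows/1.1",
    "xlink": "http://www.w3.org/1999/xlink",
}

SAMPLE_RESPONSE = """\
<wcs:Coverages xmlns:wcs="http://www.opengis.net/wcs/1.1.1"
               xmlns:ows="http://www.opengis.net/ows/1.1"
               xmlns:xlink="http://www.w3.org/1999/xlink">
  <wcs:Coverage>
    <ows:Identifier>geonode_ck_rarotonga</ows:Identifier>
    <ows:Reference xlink:href="http://52.64.9.136/geoserver/temp/wcs/geonode_ck_rarotonga_18978746138562724.tif"/>
  </wcs:Coverage>
</wcs:Coverages>
"""


def stored_coverage_urls(response_xml: str) -> List[str]:
    """Return the xlink:href of every ows:Reference in a stored GetCoverage response."""
    root = ET.fromstring(response_xml)
    href = f"{{{NS['xlink']}}}href"
    return [
        ref.attrib[href]
        for ref in root.findall("wcs:Coverage/ows:Reference", NS)
        if href in ref.attrib
    ]


print(stored_coverage_urls(SAMPLE_RESPONSE))
```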

Using WCS 2.0.1 (DescribeCoverage here) is much easier. If you don't need to do subsetting, a straight HTTP GET request is sufficient to stream out the coverage at its native resolution:

http://52.64.9.136/geoserver/ows?service=WCS&version=2.0.1&request=GetCoverage&coverageId=geonode__geonode_ck_rarotonga&format=image/tiff

It is also possible to get it in a different format (e.g. PNG):

http://52.64.9.136/geoserver/ows?service=WCS&version=2.0.1&request=GetCoverage&coverageId=geonode__geonode_ck_rarotonga&format=image/png

Notice the coverageId=geonode__geonode_ck_rarotonga: the colon in the layer name geonode:geonode_ck_rarotonga is replaced by a double underscore.

However, this is not advisable if you need to generate a thumbnail, since you will get the full-resolution image. In this case you can either specify a lower size/resolution on WCS

http://52.64.9.136/geoserver/ows?service=WCS&version=2.0.1&request=GetCoverage&coverageId=geonode__geonode_ck_rarotonga&format=image/png&SCALEFACTOR=0.3

Notice the SCALEFACTOR=0.3

or use the WMS GetMap Request

http://52.64.9.136/geoserver/ows?service=WMS&version=1.3.0&request=GetMap&layers=geonode:geonode_ck_rarotonga&format=image/png&width=100&height=100&bbox=-159.84358961184,-21.285185100579998,-159.71633320416,-21.189742794820003
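The WCS 2.0.1 GetCoverage URLs above differ only in the format and the optional SCALEFACTOR, so a harvester could assemble them from the layer name. This is a sketch using only the parameters that appear in those URLs; the function names are invented for the example.

```python
from typing import Optional
from urllib.parse import urlencode


def wcs_coverage_id(layer_name: str) -> str:
    """Turn 'workspace:layer' into the 'workspace__layer' form seen in the URLs above."""
    return layer_name.replace(":", "__")


def get_coverage_url(
    ows_url: str,
    layer_name: str,
    fmt: str = "image/tiff",
    scale_factor: Optional[float] = None,
) -> str:
    """Build a WCS 2.0.1 GetCoverage GET request for a whole coverage."""
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "GetCoverage",
        "coverageId": wcs_coverage_id(layer_name),
        "format": fmt,
    }
    if scale_factor is not None:
        # Downscale for thumbnails instead of streaming the full image.
        params["SCALEFACTOR"] = scale_factor
    # Keep "/" unescaped so the format reads image/tiff, as in the examples.
    return f"{ows_url}?{urlencode(params, safe='/')}"
```

For example, `get_coverage_url("http://52.64.9.136/geoserver/ows", "geonode:geonode_ck_rarotonga", "image/png", 0.3)` reproduces the scaled PNG request shown earlier.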

WMTS and TMS Services are provided through embedded GeoWebCache (GWC) of GeoServer

http://52.64.9.136/geoserver/gwc

The WMTS request (a single tile) can be obtained like this:

http://52.64.9.136/geoserver/gwc/service/wmts?service=WMTS&version=1.0.0&request=GetTile&layer=geonode:geonode_ck_rarotonga&format=image/jpeg&tilematrixset=EPSG:4326&tilematrix=EPSG:4326:12&TILEROW=2531&TILECOL=459

The tilematrixset and tilematrix values can be obtained from the WMTS GetCapabilities request here.

The same tile through the TMS protocol can be obtained like below:

http://52.64.9.136/geoserver/gwc/service/tms/1.0.0/geonode%3Ageonode_ck_rarotonga@EPSG%3A4326@jpeg/12/2531/433.jpg
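For reference, WMTS counts tile rows from the top of the grid while TMS counts them from the bottom, so the two addresses of one tile are related by a row flip. The sketch below assumes a gridset with a single tile row at level 0 that doubles at every level (as in a global 2x1 EPSG:4326 grid); this has not been checked against the gridset definition on this server.

```python
from typing import Tuple


def wmts_to_tms(tile_row: int, tile_col: int, zoom: int, rows_at_zoom0: int = 1) -> Tuple[int, int]:
    """Convert a WMTS (TILEROW, TILECOL) pair to TMS (x, y) at the same zoom level."""
    # ASSUMPTION: the number of tile rows doubles with each zoom level.
    rows = rows_at_zoom0 * 2 ** zoom
    x = tile_col                 # columns are counted the same way in both schemes
    y = rows - 1 - tile_row      # rows are flipped: top-origin to bottom-origin
    return x, y


print(wmts_to_tms(tile_row=2531, tile_col=459, zoom=12))
```

Under these assumptions the WMTS tile above maps to x=459, y=1564, which differs from the 12/2531/433 path in the TMS URL, so the two may be worth comparing against the server.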

More info here http://leafletjs.com/examples/wms/wms.html

cgiovando commented 7 years ago

Thanks @afabiani - very useful information

Using the WCS 2.0.1 (DescribeCoverage here) is much easier. If you don't need to do subsetting a straight HTTP GET request is sufficient to stream out the coverage at its native resolution:

http://52.64.9.136/geoserver/ows?service=WCS&version=2.0.1&request=GetCoverage&coverageId=geonode__geonode_ck_rarotonga&format=image/tiff

This request does not return the full-resolution image. Is any parameter missing?

But if that works, then, as discussed today, one option for OAM indexing is to have a small worker for hourly/daily GetCoverage requests to feed endpoints (GeoTIFF, WMTS/TMS, and WMS for thumbnail) to the OAM catalog.

afabiani commented 7 years ago

Hi @cgiovando, as far as I can see the downloaded TIFF is at native resolution, or at least it is consistent with the resolution declared by DescribeCoverage (see the comparison between the gdalinfo output and the DescribeCoverage response below).


Files: C:\Users\Dell\Downloads\geonode__geonode_ck_rarotonga (1).tif
Size is 735, 550
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-159.842342000000000,-21.190844100000000)
Pixel Size = (0.000169743107483,-0.000169526718909)
Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_RESOLUTIONUNIT=1 (unitless)
  TIFFTAG_XRESOLUTION=1
  TIFFTAG_YRESOLUTION=1
Image Structure Metadata:
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (-159.8423420, -21.1908441) (159d50'32.43"W, 21d11'27.04"S)
Lower Left  (-159.8423420, -21.2840838) (159d50'32.43"W, 21d17' 2.70"S)
Upper Right (-159.7175808, -21.1908441) (159d43' 3.29"W, 21d11'27.04"S)
Lower Right (-159.7175808, -21.2840838) (159d43' 3.29"W, 21d17' 2.70"S)
Center      (-159.7799614, -21.2374639) (159d46'47.86"W, 21d14'14.87"S)
Band 1 Block=735x8 Type=Byte, ColorInterp=Red
  NoData Value=0
Band 2 Block=735x8 Type=Byte, ColorInterp=Green
  NoData Value=0
Band 3 Block=735x8 Type=Byte, ColorInterp=Blue
  NoData Value=0


<?xml version="1.0" encoding="UTF-8"?><wcs:CoverageDescriptions xmlns:wcs="http://www.opengis.net/wcs/2.0" xmlns:ows="http://www.opengis.net/ows/2.0" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:gmlcov="http://www.opengis.net/gmlcov/1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:swe="http://www.opengis.net/swe/2.0" xmlns:wcsgs="http://www.geoserver.org/wcsgs/2.0" xsi:schemaLocation=" http://www.opengis.net/wcs/2.0 http://schemas.opengis.net/wcs/2.0/wcsDescribeCoverage.xsd">
  <wcs:CoverageDescription gml:id="geonode__geonode_ck_rarotonga">
    <gml:boundedBy>
      <gml:Envelope srsName="http://www.opengis.net/def/crs/EPSG/0/4326" axisLabels="Lat Long" uomLabels="Deg Deg" srsDimension="2">
        <gml:lowerCorner>-21.2840837954 -159.842342</gml:lowerCorner>
        <gml:upperCorner>-21.1908441 -159.717580816</gml:upperCorner>
      </gml:Envelope>
    </gml:boundedBy>
    <wcs:CoverageId>geonode__geonode_ck_rarotonga</wcs:CoverageId>
    <gml:coverageFunction>
      <gml:GridFunction>
        <gml:sequenceRule axisOrder="+2 +1">Linear</gml:sequenceRule>
        <gml:startPoint>0 0</gml:startPoint>
      </gml:GridFunction>
    </gml:coverageFunction>
    <gmlcov:metadata>
      <gmlcov:Extension/>
    </gmlcov:metadata>
    <gml:domainSet>
      <gml:RectifiedGrid gml:id="grid00__geonode__geonode_ck_rarotonga" dimension="2">
        <gml:limits>
          <gml:GridEnvelope>
            <gml:low>0 0</gml:low>
            <gml:high>734 549</gml:high>
          </gml:GridEnvelope>
        </gml:limits>
        <gml:axisLabels>i j</gml:axisLabels>
        <gml:origin>
          <gml:Point gml:id="p00_geonode__geonode_ck_rarotonga" srsName="http://www.opengis.net/def/crs/EPSG/0/4326">
            <gml:pos>-21.190928863359453 -159.84225712844625</gml:pos>
          </gml:Point>
        </gml:origin>
        <gml:offsetVector srsName="http://www.opengis.net/def/crs/EPSG/0/4326">0.0 1.6974310748298416E-4</gml:offsetVector>
        <gml:offsetVector srsName="http://www.opengis.net/def/crs/EPSG/0/4326">-1.6952671890909312E-4 0.0</gml:offsetVector>
      </gml:RectifiedGrid>
    </gml:domainSet>
    <gmlcov:rangeType>
      <swe:DataRecord>
        <swe:field name="RED_BAND">
          <swe:Quantity>
            <swe:description>RED_BAND</swe:description>
            <swe:nilValues>
              <swe:NilValues>
                <swe:nilValue reason="http://www.opengis.net/def/nil/OGC/0/unknown">0.0</swe:nilValue>
              </swe:NilValues>
            </swe:nilValues>
            <swe:uom code="W.m-2.Sr-1"/>
            <swe:constraint>
              <swe:AllowedValues>
                <swe:interval>0.0 0.0</swe:interval>
              </swe:AllowedValues>
            </swe:constraint>
          </swe:Quantity>
        </swe:field>
        <swe:field name="GREEN_BAND">
          <swe:Quantity>
            <swe:description>GREEN_BAND</swe:description>
            <swe:nilValues>
              <swe:NilValues>
                <swe:nilValue reason="http://www.opengis.net/def/nil/OGC/0/unknown">0.0</swe:nilValue>
              </swe:NilValues>
            </swe:nilValues>
            <swe:uom code="W.m-2.Sr-1"/>
            <swe:constraint>
              <swe:AllowedValues>
                <swe:interval>0.0 0.0</swe:interval>
              </swe:AllowedValues>
            </swe:constraint>
          </swe:Quantity>
        </swe:field>
        <swe:field name="BLUE_BAND">
          <swe:Quantity>
            <swe:description>BLUE_BAND</swe:description>
            <swe:nilValues>
              <swe:NilValues>
                <swe:nilValue reason="http://www.opengis.net/def/nil/OGC/0/unknown">0.0</swe:nilValue>
              </swe:NilValues>
            </swe:nilValues>
            <swe:uom code="W.m-2.Sr-1"/>
            <swe:constraint>
              <swe:AllowedValues>
                <swe:interval>0.0 0.0</swe:interval>
              </swe:AllowedValues>
            </swe:constraint>
          </swe:Quantity>
        </swe:field>
      </swe:DataRecord>
    </gmlcov:rangeType>
    <wcs:ServiceParameters>
      <wcs:CoverageSubtype>RectifiedGridCoverage</wcs:CoverageSubtype>
      <wcs:nativeFormat>image/tiff</wcs:nativeFormat>
    </wcs:ServiceParameters>
  </wcs:CoverageDescription>
</wcs:CoverageDescriptions>
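The comparison can also be checked numerically. The sketch below takes the grid limits, origin and offset vectors from the DescribeCoverage response and derives the raster size and upper-left corner that gdalinfo reports. It assumes the grid origin in DescribeCoverage refers to the center of the first pixel, whereas gdalinfo reports the outer corner.

```python
import math
from typing import Tuple

# Values copied from the DescribeCoverage response above (axis order: lat, lon).
GRID_LOW = (0, 0)
GRID_HIGH = (734, 549)
CENTER_LAT, CENTER_LON = -21.190928863359453, -159.84225712844625
PIXEL_LON = 1.6974310748298416e-4    # offset vector along longitude
PIXEL_LAT = -1.6952671890909312e-4   # offset vector along latitude (north-up)


def raster_size(low: Tuple[int, int], high: Tuple[int, int]) -> Tuple[int, int]:
    """Grid limits are inclusive, so the size is high - low + 1 on each axis."""
    return high[0] - low[0] + 1, high[1] - low[1] + 1


def corner_origin(center_lon: float, center_lat: float,
                  pixel_lon: float, pixel_lat: float) -> Tuple[float, float]:
    """Shift a pixel-center origin by half a pixel to get the outer corner."""
    return center_lon - pixel_lon / 2, center_lat - pixel_lat / 2


width, height = raster_size(GRID_LOW, GRID_HIGH)
lon, lat = corner_origin(CENTER_LON, CENTER_LAT, PIXEL_LON, PIXEL_LAT)

# gdalinfo: Size is 735, 550 and Origin = (-159.842342, -21.1908441)
assert (width, height) == (735, 550)
assert math.isclose(lon, -159.842342, abs_tol=1e-9)
assert math.isclose(lat, -21.1908441, abs_tol=1e-9)
```

The size, origin and pixel size all agree, which supports the reading that the downloaded file matches the grid declared by DescribeCoverage.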

About the TMS, I currently don't know exactly why it is returning an empty image; I'm afraid it depends on the embedded GWC and the old version of GeoServer. We might need to investigate further by raising the log level and inspecting the debug messages.

mojodna commented 7 years ago

@mojodna Any lessons learned from the POSM/ODM imagery api?

Not specifically relevant to this.

smit1678 commented 7 years ago

@afabiani Thanks for the comments earlier and insight into Geonode endpoints.

To follow up and update here, we split the local file indexing out into a PR, https://github.com/hotosm/oam-catalog/pull/93, which has been completed. I edited the title of this ticket to highlight the focus of the conversation above, which is indexing GeoNode. I'll leave this ticket open since I don't think it is resolved how GeoNode integration would happen; this seems dependent on how the GeoNode is set up and run.