Closed chrisbennight closed 8 years ago
See #380. Constructing the Avro schema from the SimpleFeatureType is required in both tickets.
The only difference is in the serialization details: (1) the feature collection concept, and (2) storing the feature definitions, Avro schema, and index in a separate meta-data file.
Keep in mind that this ticket covers a round trip: GeoWave 'export' -> files in HDFS -> 'import' to GeoWave.
Importing to GeoWave should be thought of in general terms. This means the DataStore is not assumed to be Accumulo. The DataStore of choice could be configured using SPI or some other property. One way to deal with this is to adapt the GeoWaveOutputFormat (which ingests into GeoWave). I think this adaptation can be a separate ticket.
The key thing to keep in mind is that the file contents should be GeoWave version agnostic. Class names and binary-encoded serialized images should therefore not appear in the files, which is why Avro is a good choice. The schema describing the data at rest in the file system may differ from the schema describing the data to be imported (attributes added, removed, renamed, etc.). To support this idea, here is a story.
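To make the "schema at rest vs. schema to import" point concrete, here is a minimal sketch that compares two Avro-style record schemas and reports which attributes were added or removed. The record name `Track`, the field names, and the helper `schema_diff` are all hypothetical and are not part of GeoWave or the Avro library; only the JSON layout follows the Avro record schema format.

```python
import json

# Hypothetical schema of the data as exported by the older version.
EXPORTED_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Track",
  "fields": [
    {"name": "geometry", "type": "bytes"},
    {"name": "name", "type": "string"}
  ]
}
""")

# Hypothetical schema expected on import: one optional attribute was added.
TARGET_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Track",
  "fields": [
    {"name": "geometry", "type": "bytes"},
    {"name": "name", "type": "string"},
    {"name": "speed", "type": ["null", "double"], "default": null}
  ]
}
""")


def field_names(schema):
    """Return the field names of a record schema, in declaration order."""
    return [field["name"] for field in schema["fields"]]


def schema_diff(exported, target):
    """Report attributes present in only one of the two schemas."""
    old, new = set(field_names(exported)), set(field_names(target))
    return {"added": sorted(new - old), "removed": sorted(old - new)}


print(schema_diff(EXPORTED_SCHEMA, TARGET_SCHEMA))
```

A comparison by name alone cannot tell a rename from a remove plus an add. Avro field aliases are one way a rename could be declared, but whether to rely on them here is an open design choice.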
A user wants to add a new feature attribute to feature data stored in GeoWave. In addition, the user wants to use the latest GeoWave version. However, the feature data adapter has changed, adding a new attribute to its serialized image. The user runs the export tool, using the older version of GeoWave, to export their existing data along with the meta-data definitions. The user defines the schema for the new feature type. The user provides a transformation function that takes features of the prior version and fills in the new attribute of the new version of the data. The GeoWave team has conveniently provided a transformation function that converts the feature data adapter meta-data to the new version of the adapter, providing a value for the attribute added to the feature data adapter. Armed with these transformations, the user can run the import tool, using the newer version of GeoWave.
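The user-supplied transformation in this story could look roughly like the sketch below. Features are modelled as plain dictionaries purely for illustration; the attribute name `speed` and the functions `add_speed` and `upgrade_all` are assumptions, not an existing GeoWave interface.

```python
def add_speed(feature):
    """Return a copy of an old-version feature with the new attribute filled in.

    The new attribute is set to None because older data carries no value for it;
    a real transformation might derive it from existing attributes instead.
    """
    upgraded = dict(feature)  # copy so the exported record is left untouched
    upgraded["speed"] = None
    return upgraded


def upgrade_all(features, transform):
    """Apply a per-feature transformation to every exported feature."""
    return [transform(feature) for feature in features]


exported = [
    {"geometry": b"\x00", "name": "a"},
    {"geometry": b"\x01", "name": "b"},
]
upgraded = upgrade_all(exported, add_speed)
```

Keeping the transformation a pure function of one feature means the same function could be applied feature by feature during import, without holding the whole data set in memory.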
Some initial design artifacts can be:
(1) The structure for the meta-data file. Is this Avro as well? If so, is it best to create Avro representations for data adapters and indices?
(2) The components of the import/ingest portion of the process. For example, the specification of the destination feature and the transformation function to be applied. A transformation takes an object (e.g. a SimpleFeature) from the stored format and transforms it into another object (attributes added/removed, etc.). Recall there are two types of transformation: data and meta-data (adapters, indices, etc.).
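As a starting point for item (2), the import side could be sketched as a small specification object that bundles the destination feature type with the two kinds of transformation. Everything here (`ImportSpec`, `run_import`, the `type_name` and `attributes` keys, and the dictionary representation of features and meta-data) is hypothetical and only meant to show how the pieces might fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Feature = Dict[str, object]
Metadata = Dict[str, object]


@dataclass
class ImportSpec:
    """Hypothetical description of one import run."""
    destination_type: str                                # name of the destination feature type
    data_transform: Callable[[Feature], Feature]         # stored feature -> destination feature
    metadata_transform: Callable[[Metadata], Metadata]   # stored adapter/index info -> new form


def run_import(metadata: Metadata, features: Iterable[Feature],
               spec: ImportSpec) -> Tuple[Metadata, List[Feature]]:
    """Apply the meta-data transformation once and the data transformation per feature."""
    new_metadata = dict(spec.metadata_transform(metadata))
    new_metadata["type_name"] = spec.destination_type
    new_features = [spec.data_transform(feature) for feature in features]
    return new_metadata, new_features


spec = ImportSpec(
    destination_type="TrackV2",
    data_transform=lambda f: {**f, "speed": None},
    metadata_transform=lambda m: {**m, "attributes": m["attributes"] + ["speed"]},
)
metadata, features = run_import(
    {"type_name": "Track", "attributes": ["geometry", "name"]},
    [{"geometry": b"\x00", "name": "a"}],
    spec,
)
```

In a real tool the result would be handed to whichever DataStore is configured rather than returned in memory; that part is left out here.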
This is a parent issue for #684, #686, and #687.
This capability might be part of GeoWave, or might be rolled out as a separate project.
The base need is to provide the ability to export a GeoWave dataset (or a subset of a dataset) to a single file, and to import that same file back into GeoWave.
The use case here is two-fold: