Here's a first proposal. It is relatively vague, so questions on underspecified
details are appreciated!
It should be possible to export datasets or dataverses:
"export" "dataset" QualifiedName "using" AdapterName Configuration
exports
- the data of a dataset in a format that's compatible with the load
statement and
- the AQL statements needed to re-create the dataset and its indexes
"export" "dataverse" "using" AdapterName Configuration
exports
- all datasets of a dataverse and
- the AQL statements needed to re-create all datasets, indexes, and
functions
The adapters should accept the same configuration options as those used for
loading and for external tables.
The "path" should point to a directory that contains separate files for
- the AQL statements (create.aql) and
- each dataset (<dataset-name>.<format-suffix>)
and an output format should be specified for HDFS.
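To make this concrete, here is a sketch of a possible round trip under this
proposal (the dataset name, directory, and parameter values are made up for
illustration, and the export statement is proposed syntax, not implemented AQL):

  export dataset TinySocial.FacebookUsers
  using localfs (("path"="localhost:///tmp/export"),("format"="adm"));

The directory /tmp/export would then contain create.aql and FacebookUsers.adm,
and the data file should be loadable again with the existing load statement:

  load dataset TinySocial.FacebookUsers
  using localfs (("path"="localhost:///tmp/export/FacebookUsers.adm"),("format"="adm"));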
Initial questions:
1) Is the ADM format good enough to be round-trippable?
2) Do we need a more efficient binary export format?
3) How do we handle the evolution of, e.g., the ADM format?
Do we introduce a version identifier?
Original comment by westm...@gmail.com
on 5 Aug 2013 at 7:00
Here's a second proposal (extended with feedback from Mike):
It should be possible to export datasets or dataverses:
ExportArtifact ::= "sample"? "data"
| "schema"
| "configuration"
ExportArtifacts ::= ExportArtifact ("," ExportArtifact)*
"export" "dataset" QualifiedName ("with" ExportArtifacts)?
"using" AdapterName Configuration
should export a dataset and
"export" "dataverse" Identifier? ("with" ExportArtifacts)?
"using" AdapterName Configuration
should export all datasets of a dataverse. If no dataverse Identifier is given,
the current default dataverse should be exported.
When exporting datasets, it should be possible to export the full data, a sample
of the data, the AQL statements needed to re-create the dataset and its
indexes, or the system configuration. When exporting a dataverse with the
schemas, the AQL statements needed to re-create the functions in the dataverse
should be exported as well. If no artifacts to export are given, the full data
should be exported.
The data of a dataset should be exported in a format that's compatible with the
load statement.
The adapters should accept the same configuration options as those used for
loading and for external tables. The "path" should point to a directory that
contains separate files for
- the AQL statements (create.aql),
- each dataset (<dataset-name>.<format-suffix>),
- the cluster configuration, and
- the system configuration.
For export to HDFS, an output format should be specified as well.
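A hypothetical instance of this second proposal, exporting a sample of the data
plus the schemas of a whole dataverse to HDFS (the "hdfs" and "path" keys mirror
the existing HDFS adapter configuration; "output-format" and all values are
assumptions for illustration):

  export dataverse TinySocial with sample data, schema
  using hdfs (("hdfs"="hdfs://127.0.0.1:9000"),
              ("path"="/tmp/export/TinySocial"),
              ("output-format"="adm"));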
Question:
Should we export the configuration with a dataset/dataverse or should this be a
separate export? (The scope of the configuration seems to be a bit bigger than
the scope of the datasets/dataverses.)
Original comment by westm...@gmail.com
on 14 Aug 2013 at 8:43
Looks good - except where is the "path" that you are mentioning at the
end? (Is my memory mis-firing or is there a puzzle piece missing in the
syntax? :-))
Agreed about the config scope - but - seems okay like this, i.e., this
is a way to ask for that info to go along for the ride with an export of
something else. If you ONLY want to export the config, I guess you
could just export the current default DV with that as the only option as
the least-characters way to get that info - which seems like it works. :)
Original comment by dtab...@gmail.com
on 14 Aug 2013 at 7:33
The "path" is one of the parameters that the current adapter configurations
get. So it is in the syntax, but not in a way that is visible to the human eye.
I agree that it would have been really helpful to write that down :)
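In other words, under the proposed grammar the "path" hides inside the
Configuration term, e.g. (values hypothetical):

  ... using localfs (("path"="localhost:///tmp/export"),("format"="adm"))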
Original comment by westm...@gmail.com
on 15 Aug 2013 at 2:29
We should replace "with" with "including".
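With that change, the productions from the second proposal would read:

  "export" "dataset" QualifiedName ("including" ExportArtifacts)?
      "using" AdapterName Configuration
  "export" "dataverse" Identifier? ("including" ExportArtifacts)?
      "using" AdapterName Configuration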
Original comment by westm...@gmail.com
on 16 Aug 2013 at 9:42
status: no work has happened here
Original comment by westm...@gmail.com
on 30 Jul 2014 at 4:32
Original issue reported on code.google.com by
vinay...@gmail.com
on 2 Aug 2013 at 7:07