tspurway / hustle

A column oriented, embarrassingly distributed relational event database.

Ability to save query results in DDFS #11

Closed tspurway closed 10 years ago

tspurway commented 10 years ago

We should be able to set 'save=tablename' and have the results of a query saved back out to DDFS. It would have to use the 'hustle_output_stream' much like 'nest=True'.

ncloudioj commented 10 years ago

This feature is partially available now, but it simply dumps the table returned by "select(..., nest=True)". The table can then be restored with Table.loads(dumps).

Additionally, we can store this serialization into DDFS or a local LMDB if necessary.

ncloudioj commented 10 years ago

Add a "tag" option to select(), which would be used as the table name for the nested query's result. It is then up to the user to decide whether to save the query result to DDFS.

>> ret = select(department.id, department.name, where=department, nest=True, tag="depart_id_name")
>> ret.tag()

If the user wants to save it, the saved table's name is the "tag" specified in the select() call. Hustle takes care of potential name conflicts. If "tag" is not specified, a random name is assigned, following the previous naming convention for nested tables.
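
A rough sketch of the naming rule described above, for illustration only. This is not Hustle's code: the helper name `resolve_tag`, the `sub_` prefix for random names, and the numeric-suffix policy for conflicts are all assumptions, since the thread does not say how conflicts are actually resolved.

```python
import uuid


def resolve_tag(existing_tags, tag=None):
    """Pick a table name for a nested query result (hypothetical helper).

    existing_tags: set of names already in use.
    tag: the name requested by the user, or None.
    """
    if tag is None:
        # No tag given: generate random names until one is unused.
        # The "sub_" prefix is an assumption, not Hustle's convention.
        while True:
            candidate = "sub_" + uuid.uuid4().hex[:8]
            if candidate not in existing_tags:
                return candidate

    if tag not in existing_tags:
        # Requested name is free: use it as-is.
        return tag

    # Name conflict: append a numeric suffix until the name is unique.
    # (Assumed policy; the real behaviour may differ.)
    n = 1
    while f"{tag}_{n}" in existing_tags:
        n += 1
    return f"{tag}_{n}"


if __name__ == "__main__":
    in_use = {"depart_id_name"}
    print(resolve_tag(set(), "depart_id_name"))
    print(resolve_tag(in_use, "depart_id_name"))
    print(resolve_tag(in_use))
```

With this policy, a requested tag is kept unchanged whenever it is free, so the name the user passes to select() is normally the name the saved table ends up with.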