master -> node -> master protocol for file blocks and query results

jhihn commented 10 years ago

After uploading a file to the master, the user is able to set a few more parameters. The file table (UI) will show a color status for each file:
Red = blocks are missing from the nodes
Yellow = a complete set of blocks exists
Green = the dupe (fill) factor is satisfied (file table)
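
A rough sketch of how the status color could be derived. It assumes blocks are numbered from 0 and that the dupe (fill) factor means the minimum number of nodes holding each block; `file_status`, `placements` and `dupe_factor` are made-up names, not anything that exists in the code yet.

```python
from enum import Enum


class FileStatus(Enum):
    RED = "red"        # at least one block is on no node
    YELLOW = "yellow"  # every block is on at least one node
    GREEN = "green"    # every block is on at least dupe_factor nodes


def file_status(block_count, placements, dupe_factor):
    """placements maps block number -> set of node ids holding that block (hypothetical shape)."""
    copies = [len(placements.get(block, ())) for block in range(block_count)]
    if any(count == 0 for count in copies):
        return FileStatus.RED
    if all(count >= dupe_factor for count in copies):
        return FileStatus.GREEN
    return FileStatus.YELLOW
```

With a dupe factor of 1 a complete set of blocks goes straight to green, which seems like the right reading of the three states.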

Initially the status will be red. A "fix" button on the row in the table (UI) will generate and execute the work list: identify which nodes need which blocks added, as well as which nodes need stray blocks removed. It will then execute those instructions, doing the removes first.
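
The work list could be built as a plain set difference per node. This is only a sketch: `build_work_list` and the `desired`/`actual` mappings (node id -> set of block numbers) are assumed shapes, and how the desired placement gets chosen is left out.

```python
def build_work_list(desired, actual):
    """Return (operation, node, block) tuples, with all removes ahead of all adds."""
    removes, adds = [], []
    for node in sorted(set(desired) | set(actual)):
        want = desired.get(node, set())
        have = actual.get(node, set())
        # Stray blocks: present on the node but not wanted there.
        removes += [("remove", node, block) for block in sorted(have - want)]
        # Missing blocks: wanted on the node but not present.
        adds += [("add", node, block) for block in sorted(want - have)]
    return removes + adds
```

Putting the removes first in the returned list means the executor can just walk it in order.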

The transfer protocol to the nodes for this block operation should be a general form post with:
The create table statements for file and fileblock
The file info (file table (db))
The block info (file block (db))
The row data copied straight from the master file (File)

The node will execute the create table statements for the file record, the file block record, and the data. The node will bulk insert the data. The node will create the databasenode table and populate it with its information. The database file will be named as --.sqlite3

It is possible that on Linux there is a symlink where the GUIDs are replaced with better names, provided there aren't duplicate file names on the node. This is for debugging purposes only and not anything the system uses.
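
Node side, the load could look roughly like this with Python's built-in `sqlite3` module. The table names `file`, `fileblock` and `data`, the single `node_id` column on `databasenode`, and the `load_block` signature are all placeholders for illustration; the real schema comes from the create table statements the master posts.

```python
import sqlite3


def load_block(conn, create_statements, file_row, block_row, data_rows, node_id):
    """Create the tables sent by the master and bulk insert the block's rows."""
    with conn:  # commit on success, roll back on error
        for statement in create_statements:
            conn.execute(statement)
        conn.execute("INSERT INTO file VALUES (?, ?)", file_row)
        conn.execute("INSERT INTO fileblock VALUES (?, ?)", block_row)
        if data_rows:
            # One placeholder per column of the (assumed) data table.
            placeholders = ", ".join("?" * len(data_rows[0]))
            conn.executemany(f"INSERT INTO data VALUES ({placeholders})", data_rows)
        # The node records its own identity; the column is an assumption.
        conn.execute("CREATE TABLE databasenode (node_id TEXT)")
        conn.execute("INSERT INTO databasenode VALUES (?)", (node_id,))
```

`executemany` is what does the bulk insert here; the hard-coded two-column inserts for `file` and `fileblock` only match the toy schema assumed above.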

After a result set is generated from a query, the result set is transferred to the master as a SQLite database, even if the row count is zero. The file will be named the same as the block file on the node, but with a hyphen and the query identifier (GUID?) added before the file extension.

The master can start collating results as soon as a result file is complete. The results are collated into a file named the same as the node results, but with "all" substituted for the block number. After a block result file has been integrated, it is removed unless the master is in debug mode. The master must keep track of which blocks are in the file. The result is not complete until all blocks are in.
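
One way the master could collate is to attach each block result file to the "all" database and copy the rows over. This is a sketch under assumptions: every result file is taken to hold a single table called `result` (a made-up name), and `ResultCollator` is hypothetical.

```python
import os
import sqlite3


class ResultCollator:
    """Merge per-block result files into one database and track pending blocks."""

    def __init__(self, all_path, expected_blocks, debug=False):
        self.all_path = all_path
        self.pending = set(expected_blocks)
        self.debug = debug

    def integrate(self, block_number, result_path, table="result"):
        conn = sqlite3.connect(self.all_path)
        try:
            conn.execute("ATTACH DATABASE ? AS part", (result_path,))
            # Copy the schema on first use; WHERE 0 selects no rows.
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS main.{table} AS "
                f"SELECT * FROM part.{table} WHERE 0"
            )
            conn.execute(f"INSERT INTO main.{table} SELECT * FROM part.{table}")
            conn.commit()
            conn.execute("DETACH DATABASE part")
        finally:
            conn.close()
        self.pending.discard(block_number)
        if not self.debug:
            os.remove(result_path)  # keep block files only in debug mode

    @property
    def complete(self):
        return not self.pending
```

Because zero-row results are still sent as databases, an empty block file integrates the same way as any other and still counts toward completion.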

Missing blocks are not handled for now. However, when they are, the server will just reissue the query. When this happens, the occurrence is logged.

Comments?

jhihn commented 10 years ago

GitHub mangled the file specs. I'm going to modify them slightly too. The file specs are:
Fileuid-blocknumber-nodeid.sqlite3
Fileuid-blocknumber-nodeid-queryid.sqlite3
Fileuid-ALL-queryid.sqlite3
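
For illustration, the three specs map onto small helpers like these. The function names are made up, and the ids are treated as opaque strings.

```python
def block_file_name(file_uid, block_number, node_id):
    """Block database stored on a node."""
    return f"{file_uid}-{block_number}-{node_id}.sqlite3"


def result_file_name(file_uid, block_number, node_id, query_id):
    """Per-block query result sent back to the master."""
    return f"{file_uid}-{block_number}-{node_id}-{query_id}.sqlite3"


def collated_file_name(file_uid, query_id):
    """Collated result on the master, with ALL in place of the block number."""
    return f"{file_uid}-ALL-{query_id}.sqlite3"
```

One thing to watch: if the ids are GUIDs in the usual hyphenated form, the names cannot be split back into parts on "-" unambiguously, so a hyphen-free form (for example `uuid.uuid4().hex`) might be easier to work with.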

luggage66 commented 10 years ago

Sounds like someone secretly wants to rewrite hadoop.

jhihn commented 10 years ago

No, not secretly. On Nov 3, 2013 10:27 AM, "Donald Mull Jr." notifications@github.com wrote:

Sounds like someone secretly wants to rewrite hadoop.

luggage66 / thermite

master -> node -> master protocol for file blocks and query results #5