uccross / skyhookdm-ceph-cls

Skyhook Data Management: Storage and management of tabular data in Ceph.
https://www.skyhookdm.com
GNU Lesser General Public License v2.1

Implement Rados reads in CLS #44

Open jlefevre opened 4 years ago

jlefevre commented 4 years ago

Based on feedback from Red Hat, instead of a specialized copy_from() (our original approach), implement the ability to invoke a RADOS read() from within the CLS context. This is more general and already has some support within Red Hat. It will be a significant coding effort and should be driven by good use cases.

Idea: from within a CLS context, read from a remote object, with the ability to invoke a method directly on that remote object (e.g., any registered CLS method, such as one that transforms data) and return the method's output to the calling object.
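To make the idea concrete, here is a minimal in-memory sketch of the call shape. Everything in it is hypothetical: `ObjectStore`, `remote_read`, and `pull_transformed` are illustration names, not Ceph or CLS APIs, and a `std::map` stands in for the remote objects and the method registry.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <map>
#include <string>

// Hypothetical model only; none of these names exist in Ceph.
using Bytes = std::string;
using Method = std::function<Bytes(const Bytes&)>;

struct ObjectStore {
    std::map<std::string, Bytes> objects;   // object id -> stored data
    std::map<std::string, Method> methods;  // registered method name -> handler

    // Read object `oid`. If `method` is non-empty, run that registered method
    // on the object's data and return the method's output instead of the raw
    // bytes. Returns 0 on success or -ENOENT if the object or method is missing.
    int remote_read(const std::string& oid, const std::string& method,
                    Bytes* out) const {
        assert(out != nullptr);
        auto obj = objects.find(oid);
        if (obj == objects.end()) return -ENOENT;
        if (method.empty()) {               // plain read, no method invoked
            *out = obj->second;
            return 0;
        }
        auto m = methods.find(method);
        if (m == methods.end()) return -ENOENT;
        *out = m->second(obj->second);      // output goes back to the caller
        return 0;
    }
};

// Stands in for code running in the caller object's context: fetch the
// (optionally transformed) data from `src` and store it in `dst`.
int pull_transformed(ObjectStore& store, const std::string& src,
                     const std::string& dst, const std::string& method) {
    Bytes data;
    int rc = store.remote_read(src, method, &data);
    if (rc < 0) return rc;
    store.objects[dst] = data;
    return 0;
}
```

The sketch is synchronous for brevity and only shows the interface: the caller names a source object and, optionally, a registered method, and receives that method's output rather than the raw object data.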

Considerations: this must be asynchronous (due to PG blocking) and must work for all object types, including replicated objects (more straightforward) and erasure-coded objects (more complicated).
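One way to picture the asynchronous requirement is a request queue with completion callbacks: the caller issues the read, returns immediately, and is notified later. The sketch below is a single-threaded model under that assumption. `AsyncReader`, `read_async`, and `poll` are hypothetical names, and `poll()` merely stands in for a reply arriving from the remote object; it does not reflect how Ceph schedules operations.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of a non-blocking read; not Ceph's implementation.
using Completion = std::function<void(int rc, const std::string& data)>;

class AsyncReader {
    struct Request {
        std::string oid;
        Completion on_done;
    };

public:
    explicit AsyncReader(std::map<std::string, std::string> objects)
        : objects_(std::move(objects)) {}

    // Queue the request and return immediately, so the caller never waits
    // for the remote data while it is being serviced.
    void read_async(const std::string& oid, Completion on_done) {
        assert(on_done);  // a completion callback is required
        pending_.push_back(Request{oid, std::move(on_done)});
    }

    // Deliver every queued completion and return how many were delivered.
    std::size_t poll() {
        std::size_t delivered = 0;
        while (!pending_.empty()) {
            Request req = std::move(pending_.front());
            pending_.pop_front();
            auto it = objects_.find(req.oid);
            if (it == objects_.end()) {
                req.on_done(-ENOENT, std::string());
            } else {
                req.on_done(0, it->second);
            }
            ++delivered;
        }
        return delivered;
    }

private:
    std::map<std::string, std::string> objects_;
    std::deque<Request> pending_;
};
```

In this model the work that depends on the remote data lives in the completion callback, which is the main structural change compared with a blocking read.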

Motivation

Skyhook (1): Data originally written to one object in row format is transposed to column format and written to another object. Normally this could be done outside Ceph in a client, but here we prefer to avoid sending data back and forth between client and storage and instead do it directly within storage.

Skyhook (2): A step in a scatter-gather of data from multiple source objects into a single target object, all within the storage layer.

General (1): Decrypt or encrypt data from a source object to a target object.

General (2): Transcode data from a source object to a target object, such as downscaling the resolution of video data.
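As an illustration of the Skyhook (1) use case, the row-to-column transposition itself can be sketched as below. This is a simplified stand-in that treats every value as a string; `Row`, `Column`, and `rows_to_columns` are hypothetical names, and Skyhook's actual on-disk formats are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using Row = std::vector<std::string>;     // one record, one value per field
using Column = std::vector<std::string>;  // one field, one value per record

// Convert row-oriented records into one vector per column.
// Throws std::invalid_argument if the rows do not all have the same width.
std::vector<Column> rows_to_columns(const std::vector<Row>& rows) {
    if (rows.empty()) return {};
    const std::size_t width = rows.front().size();
    std::vector<Column> columns(width);
    for (Column& col : columns) col.reserve(rows.size());
    for (const Row& row : rows) {
        if (row.size() != width) throw std::invalid_argument("ragged row");
        for (std::size_t c = 0; c < width; ++c) columns[c].push_back(row[c]);
    }
    // Each column now holds exactly one value per input row.
    assert(columns.empty() || columns.front().size() == rows.size());
    return columns;
}
```

With the proposed remote read, a transform like this could run where the source object is stored, so that only the transposed output moves to the target object.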