nbren12 / dask.targeted

Enabling on-disk and other persistence mechanisms in dask

Implement multiple dispatch API for targeted #3

Open nbren12 opened 6 years ago

nbren12 commented 6 years ago

The main issue here is that targeted is evaluated during graph construction, so it can inspect what kind of dask object it is passed. However, if it is passed a Delayed object, it has no way of knowing what the output type will be.

On the other hand, writer is executed at task execution, so it should know what the output type is. Unfortunately, reader will not know the output type, since it only has access to the target object. The best solution, then, is probably to create a Reader/Writer object that knows how to read and write a given type of data from and to a variety of luigi targets. Then I can dispatch targeted based on the following rules:

  1. targeted(dask_collection, target) is routed to a default reader_writer for the given type of dask collection.
  2. targeted(delayed, target, reader_writer). Delayed objects must specify a reader_writer explicitly, because their output type is not known at graph construction.
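
A minimal sketch of these two rules, using functools.singledispatch from the standard library. Everything here is an assumption made for illustration: the ReaderWriter class, the ArrayLike and DelayedLike stand-ins (used in place of the real dask array and Delayed classes so the snippet runs without dask), and the tuple that targeted returns. It only shows how the dispatch could be routed, not how the package actually does it.

```python
from functools import singledispatch


class ReaderWriter:
    """Hypothetical object that knows how to store one kind of data in a target."""

    def __init__(self, name):
        self.name = name


# Stand-ins for the dask array and Delayed classes (assumption: a real
# version would register the actual dask types instead).
class ArrayLike:
    pass


class DelayedLike:
    pass


# Hypothetical default used for rule 1.
DEFAULT_ARRAY_RW = ReaderWriter("array-default")


@singledispatch
def targeted(obj, target, reader_writer=None):
    # Fallback for types that have no rule registered.
    raise TypeError(f"no targeted() rule for {type(obj).__name__}")


@targeted.register(ArrayLike)
def _(obj, target, reader_writer=None):
    # Rule 1: a collection falls back to the default reader/writer for its type.
    return obj, target, reader_writer or DEFAULT_ARRAY_RW


@targeted.register(DelayedLike)
def _(obj, target, reader_writer=None):
    # Rule 2: the output type is unknown here, so a reader/writer is required.
    if reader_writer is None:
        raise TypeError("Delayed objects need an explicit reader_writer")
    return obj, target, reader_writer
```

With this layout, supporting another collection type only means registering one more function and choosing its default reader/writer.
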
nbren12 commented 6 years ago

To make this work, I will probably need a way to derive a unique name for dask arrays from the target. Then, for each type of target, I can register a multiple-dispatch function that generates the unique name.
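
One way this could look, as a rough sketch: a dispatched target_name function that turns a target into a deterministic name. The function name, the LocalTargetLike stand-in, and the choice of hashing the path are all assumptions for illustration; the only thing assumed about a file-based target is that it has a path attribute.

```python
import hashlib
from functools import singledispatch


class LocalTargetLike:
    """Stand-in for a file-based luigi target; only ``path`` is assumed."""

    def __init__(self, path):
        self.path = path


@singledispatch
def target_name(target):
    # Fallback for target types that have no naming rule registered.
    raise TypeError(f"no naming rule for {type(target).__name__}")


@target_name.register(LocalTargetLike)
def _(target):
    # Hash the path so the same target always maps to the same name.
    digest = hashlib.sha1(target.path.encode("utf-8")).hexdigest()[:16]
    return f"targeted-{digest}"
```

Because the name depends only on the target, two graphs built against the same target would share the name, while different targets get different names.
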

nbren12 commented 6 years ago

If I make this operate at the level of keys rather than collections, it should work better.
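
A rough sketch of what working at the key level might mean, assuming the graph is a plain dict that maps keys to task tuples of the form (function, *args). The target_key function, the dict used as a stand-in for a target, the "-raw" key suffix, and the tiny simple_get evaluator are all hypothetical and exist only so the snippet runs without dask; string keys are assumed for brevity.

```python
from functools import partial
from operator import add


def _write(store, key, value):
    # Save the computed value, then pass it through unchanged.
    store[key] = value
    return value


def target_key(graph, key, store):
    """Return a copy of ``graph`` in which ``key`` is backed by ``store``."""
    new = dict(graph)
    if key in store:
        # Target exists at graph construction: replace the task with a read.
        new[key] = (partial(store.__getitem__, key),)
    else:
        # Otherwise move the original task aside and write its result.
        raw_key = f"{key}-raw"
        new[raw_key] = graph[key]
        new[key] = (partial(_write, store, key), raw_key)
    return new


def simple_get(graph, key):
    """Tiny evaluator for the dict-of-tasks format, only to run the sketch."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        resolved = [
            simple_get(graph, a) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        return func(*resolved)
    return task


store = {}
graph = {"x": (add, 1, 2)}
first = simple_get(target_key(graph, "x", store), "x")   # computes and writes
second = simple_get(target_key(graph, "x", store), "x")  # reads from store
```

Nothing in this version needs to know which collection the key came from, which is the appeal of working on keys.
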