qarnot / qarnot-sdk-python

Python SDK to use Qarnot's computing service
Apache License 2.0

QarnotCluster for Dask? #6

Open remiadon opened 4 years ago

remiadon commented 4 years ago

I see Qarnot computing instances as potential candidates for data science use cases.

For now, I consider this example the typical workflow for doing machine learning with your framework. It is great, but it requires the code to be wrapped in a single Python file, plus some prior Docker knowledge to run it in an embarrassingly parallel fashion. I, on the other hand, would like to run computations in a notebook environment using Qarnot's computing power.

For these more interactive workflows, I think Dask is a good candidate.

Even though Dask is designed to build complex task graphs, some workflows in data science (and ML) are embarrassingly parallel, examples being:
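As one illustration of what "embarrassingly parallel" means here, the sketch below scores a small hyperparameter grid where every task is independent of the others. It uses only the standard library's `concurrent.futures` so that it runs anywhere; the `evaluate` function, its made-up scoring formula, and the parameter names `alpha` and `depth` are all hypothetical stand-ins for real model training. Dask's distributed `Client` offers similarly shaped `submit` and `map` methods, so the same pattern should carry over to a cluster, although that is an assumption about how one would port it rather than something tested on Qarnot.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def evaluate(params):
    """Toy stand-in for training and scoring one model configuration."""
    alpha, depth = params
    # Hypothetical score: a made-up function whose minimum is at alpha=0.1, depth=4.
    return (alpha - 0.1) ** 2 + (depth - 4) ** 2


def grid_search(alphas, depths, max_workers=4):
    """Score every (alpha, depth) pair independently and return the best one."""
    grid = list(product(alphas, depths))
    # Each task depends only on its own parameters, so tasks can run in any
    # order and on any worker without communicating with each other.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(evaluate, grid))
    # Lower score is better; pair each score with its parameters and take the min.
    best_score, best_params = min(zip(scores, grid))
    return best_params, best_score


best_params, best_score = grid_search([0.01, 0.1, 1.0], [2, 4, 8])
print(best_params, best_score)
```

Because no task needs the result of another, adding workers shortens the wall-clock time almost linearly, which is why this kind of workload suits a pool of remote compute instances.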

Dask contributors have already built dask-cloudprovider, a library that facilitates deployment on AWS and Azure. Providing a QarnotCluster would, I think, make the following workflow possible:

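>>> import dask.dataframe as dd
>>> from dask.distributed import Client
>>> cluster = QarnotCluster('<YOUR_API_TOKEN>')
>>> client = Client(cluster)  # send the computations below to the cluster
>>> df = dd.read_csv('path-to-qarnot-bucket/*.csv')
>>> # ... some regular Dask workflow ...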

I know that you rely heavily on Docker for now; I am just asking whether this is a possible direction for the future.

Cheers,

ClemPi commented 4 years ago

Hello Rémi,

This is something we are working on. We are finalizing a sample with Dask and will keep you posted.

Regards