I see Qarnot computing instances as potential candidates for data science use cases.
For now I consider this example the typical workflow for machine learning with your framework.
It is great, but it requires the code to be wrapped in a single Python file, and some prior Docker knowledge to run it in an embarrassingly parallel fashion.
On the other hand, I would like to run some computations in a notebook environment using Qarnot computing power.
For these more interactive workflows, I think Dask is a good candidate.
Even though Dask is designed to build complex task graphs, some workflows in data science (and ML) are embarrassingly parallel, for example:
regular data engineering / BI workflows on dataframes
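To make the embarrassingly parallel case concrete, here is a minimal sketch using `dask.delayed`. The `score` function is a hypothetical stand-in for one independent unit of work (say, fitting one model), and the local threaded scheduler only stands in for the remote workers I have in mind.

```python
import dask
from dask import delayed


def score(param):
    """Hypothetical stand-in for one independent unit of work."""
    return param * param


# Each task depends only on its own input, so there are no edges
# between tasks in the graph: this is the embarrassingly parallel case.
tasks = [delayed(score)(p) for p in range(8)]

# Run locally on threads; with a distributed cluster, a Client would
# take the place of this local scheduler.
results = dask.compute(*tasks, scheduler="threads")
print(results)
```

The point is that the user code stays the same whether the tasks run on a laptop or on a cluster; only the scheduler changes.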
Dask contributors have already worked on dask-cloudprovider, a library that facilitates deployment on AWS and Azure. Providing a QarnotCluster would, I think, make the following workflow possible:
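>>> import dask.dataframe as dd
>>> from dask.distributed import Client
>>> cluster = QarnotCluster('<YOUR_API_TOKEN>')  # hypothetical class, does not exist yet
>>> client = Client(cluster)  # send subsequent Dask computations to the cluster
>>> df = dd.read_csv('path-to-qarnot-bucket/*.csv')
>>> # ... some regular Dask workflow ...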
I know that for now you rely a lot on Docker; I am just asking whether this is a possible direction for the future.
Cheers,