nils-braun / dask-sql-k8s-deployment

Example for deploying dask-sql (with Dask and Apache Hue) on k8s
MIT License

dask-sql pod not running #3

Open romainr opened 3 years ago

romainr commented 3 years ago

Just wondering if you saw the same issue before:

```
NAME                                  READY   STATUS    RESTARTS   AGE
dask-sql-fd948f7d9-pxwdm              0/1     Running   1          13m
```

```
+ '[' '' ']'
+ '[' -e /opt/app/environment.yml ']'
no environment.yml
EXTRA_CONDA_PACKAGES environment variable found.  Installing.
+ echo 'no environment.yml'
+ '[' 's3fs=0.5.1 dask-xgboost=0.1.11 xgboost=0.90 aiobotocore=1.1.2 botocore=1.17.44 -c conda-forge' ']'
+ echo 'EXTRA_CONDA_PACKAGES environment variable found.  Installing.'
+ /opt/conda/bin/conda install -y s3fs=0.5.1 dask-xgboost=0.1.11 xgboost=0.90 aiobotocore=1.1.2 botocore=1.17.44 -c conda-forge
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
```
nils-braun commented 3 years ago

Has it been killed or is it just taking very long? Because the conda install will unfortunately take some time...
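One way to take the slow conda install out of pod startup entirely is to bake the extra packages into the image at build time instead of passing them via `EXTRA_CONDA_PACKAGES`. A minimal sketch (the base image name is an assumption; any conda-enabled Dask image would do):

```dockerfile
# Hypothetical Dockerfile: pre-install the extra packages at build time
# so the liveness probe does not race a conda install at pod startup.
# Base image is an assumption.
FROM daskdev/dask:latest

RUN /opt/conda/bin/conda install -y -c conda-forge \
        s3fs=0.5.1 dask-xgboost=0.1.11 xgboost=0.90 \
        aiobotocore=1.1.2 botocore=1.17.44 \
    && /opt/conda/bin/conda clean -afy
```

The trade-off is a larger, version-pinned image in exchange for pods that become ready in seconds rather than minutes.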

romainr commented 3 years ago

It is taking pretty long, 10+ minutes (and the pod gets restarted, possibly for that reason). I will keep it running overnight and see.

nils-braun commented 3 years ago

Ah, I see. Could you check the reason it was restarted, maybe with `kubectl describe`? If it really takes that long, you might want to increase the liveness check threshold. Another possible reason is that the pod uses too much memory (in that case you would need to increase the memory limit). I should probably make these configurable via the helm chart...
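For reference, relaxing the probe could look like this in the container spec (the field names follow the standard Kubernetes probe schema; the probed port and the concrete timing values are assumptions):

```yaml
# Sketch of a more lenient liveness probe for the dask-sql container,
# giving the startup-time conda install room to finish before the
# first check can fail the pod. Port and timings are assumptions.
livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 600   # allow up to 10 min of startup work
  periodSeconds: 30
  failureThreshold: 5
```

`kubectl describe pod <name>` would also show the restart reason in the events section (e.g. a failed liveness probe vs. an OOMKill).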

romainr commented 3 years ago

Ha, it worked this time! Maybe it was just slow (and good idea, I would bump the probe settings if it comes back).

[screenshot]

https://demo.gethue.com/hue/editor/?type=10

I see the table is not detected in the right assist (probably because of the hyphen `-` in its name). We will also definitely need the Task Server at some point to avoid the timeout on longer executions.
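On the hyphen issue: identifiers containing `-` generally have to be quoted in SQL, otherwise the parser reads the name as a subtraction. A sketch of the workaround (assuming a presto-style dialect and a hypothetical table name):

```sql
-- Quote the hyphenated identifier so it is parsed as one name
SELECT * FROM "my-table" LIMIT 10;
```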

Good progress!

For the record, the log trace:

```
+ '[' '' ']'
+ '[' -e /opt/app/environment.yml ']'
+ echo 'no environment.yml'
+ '[' 's3fs=0.5.1 dask-xgboost=0.1.11 xgboost=0.90 aiobotocore=1.1.2 botocore=1.17.44 -c conda-forge' ']'
+ echo 'EXTRA_CONDA_PACKAGES environment variable found.  Installing.'
+ /opt/conda/bin/conda install -y s3fs=0.5.1 dask-xgboost=0.1.11 xgboost=0.90 aiobotocore=1.1.2 botocore=1.17.44 -c conda-forge
no environment.yml
EXTRA_CONDA_PACKAGES environment variable found.  Installing.
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs:
    - aiobotocore=1.1.2
    - botocore=1.17.44
    - dask-xgboost=0.1.11
    - s3fs=0.5.1
    - xgboost=0.90

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _py-xgboost-mutex-2.0      |            cpu_0           8 KB  conda-forge
    aiobotocore-1.1.2          |             py_0          39 KB  conda-forge
    aiohttp-3.7.3              |   py38h25fe258_0         619 KB  conda-forge
...
```
nils-braun commented 3 years ago

Concerning the hue installation: good point. I think it would make sense to move to the official helm chart of hue for better feature support. We might want to do this once the dask-sql integration in hue is released. What do you think?

(PS: I have added a small configuration option for the initial delay in the check, see #4)

romainr commented 3 years ago

:+1: Decoupling would indeed be simpler in the long run. It should already work as of today from the master branch (which is the source of the latest Docker image) until the 4.9 release in Jan/Feb 2021.

Basically, one would normally just need to fill in the Dask SQL interpreter (with type `dasksql`) in the values.
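For reference, the interpreter entry would presumably go into Hue's configuration along these lines (a sketch based on Hue's standard `[[interpreters]]` section; the connector interface, host, and port shown are assumptions):

```ini
[notebook]
  [[interpreters]]
    [[[dasksql]]]
      name=Dask SQL
      # dask-sql exposes a presto-compatible endpoint;
      # the interface and URL below are assumptions
      interface=sqlalchemy
      options='{"url": "presto://dask-sql:8080/catalog/default"}'
```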