tjcuddihy closed this issue 4 years ago
Kedro aside, there are a couple of things you can do to make sure your environments match between the terminal and the notebook. I am not familiar with the new pandas.CSVDataSet, as I am only just starting with 0.15.8 myself. We have struggled to get package installs right through our notebooks, so I make sure everyone on my team uses their own environment, created from the terminal.
Note that the file browser on the left-hand side of a SageMaker notebook is really mounted at ~/SageMaker.
source activate python3
# may also be - conda activate python3
# unrelated: on Windows it was - activate python3
cd ~/SageMaker/testing/notebooks # this appears to be where your project is
kedro install
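To confirm whether the notebook kernel and the terminal really share an environment, one option is to run the same short check in both places and compare the output. This is only a sketch: `describe_environment` is a hypothetical helper, and `kedro.extras.datasets.pandas` is my assumption about where pandas.CSVDataSet lives in 0.15.8.

```python
import importlib
import sys


def describe_environment(module_names):
    """Report the running interpreter and whether each module imports."""
    report = {"executable": sys.executable, "prefix": sys.prefix, "modules": {}}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            # Record the reason instead of failing, so reports stay comparable.
            report["modules"][name] = "import failed: {}".format(exc)
        else:
            # Fall back to a plain marker when the module has no __version__.
            report["modules"][name] = getattr(module, "__version__", "importable")
    return report


if __name__ == "__main__":
    # Module names are examples; the kedro.extras path is an assumption.
    result = describe_environment(
        ["kedro", "pandas", "kedro.extras.datasets.pandas"]
    )
    for key, value in result.items():
        print(key, value)
```

If the executable or prefix differs between the terminal and a notebook cell, packages installed in one will not necessarily be visible in the other.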
For conda environments to show up in the notebook dropdown selection you will need ipykernel installed. See the docs.
conda create -n testing python=3.6
pip install ipykernel
# I typically don't have to go this far, but installing ipykernel is recommended by the docs
ipykernel install --user
cd ~/SageMaker/testing/notebooks # this appears to be where your project is
kedro install
Do note that if you shut down your SageMaker notebook you will lose your packages and environments by default.
I also noticed that your two environments have different pandas versions. I have no idea whether that changes things, but it might be a simple fix.
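A quick way to spot this kind of drift is to diff the two pip freeze outputs rather than reading them side by side. The sketch below is illustrative only: `parse_freeze` and `diff_freezes` are hypothetical helpers, and the sample pins are a handful copied from the two listings in this issue.

```python
def parse_freeze(text):
    """Map lower-cased package name to version from `pip freeze` output."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" not in line or line.startswith("#"):
            continue  # skip blanks, comments and entries without a == pin
        name, version = line.split("==", 1)
        versions[name.lower()] = version
    return versions


def diff_freezes(first, second):
    """Return {package: (version_in_first, version_in_second)} where they differ."""
    a, b = parse_freeze(first), parse_freeze(second)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# A few pins copied from the two listings in this issue.
env_one = "fsspec==0.6.3\nkedro==0.15.8\npandas==0.25.3\ns3fs==0.4.0"
env_two = "fsspec==0.7.1\nkedro==0.15.8\npandas==0.24.2\ns3fs==0.4.2"

for package, (left, right) in diff_freezes(env_one, env_two).items():
    print(package, left, right)
```

A package present in only one listing shows up with None on the other side, which also helps catch missing dependencies.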
Your second idea worked @WaylonWalker. I slightly adapted it as it didn't work straight up:
conda create --yes --name kedroenv python=3.6 ipykernel
source activate kedroenv
python -m ipykernel install --user --name kedroenv --display-name "Kedro py3.6"
cd ~/SageMaker
kedro new # Name testing and example pipeline
cd testing/
kedro run
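To check that the kernel registered by the commands above points at the intended interpreter, the kernel.json files can be inspected directly. This is a rough sketch under assumptions: `list_kernels` is a hypothetical helper, and `~/.local/share/jupyter/kernels` is where I would expect `--user` kernels to land on Linux, so adjust the path if your setup differs.

```python
import json
from pathlib import Path


def list_kernels(kernels_dir):
    """Return {kernel name: (display name, interpreter)} for one kernels directory."""
    root = Path(kernels_dir).expanduser()
    if not root.is_dir():
        return {}  # nothing registered at this location
    kernels = {}
    for spec_file in sorted(root.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        # argv[0] is the interpreter the kernel is launched with.
        argv = spec.get("argv") or [None]
        kernels[spec_file.parent.name] = (spec.get("display_name"), argv[0])
    return kernels


if __name__ == "__main__":
    # Assumed location of --user kernels on Linux.
    found = list_kernels("~/.local/share/jupyter/kernels")
    for name, (display_name, interpreter) in found.items():
        print(name, display_name, interpreter)
```

If the interpreter listed for kedroenv is not inside the kedroenv environment, the notebook will import packages from somewhere else.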
With a reasonable solution, I'll call this issue closed. Massive thank you @WaylonWalker for pointing me in the right direction.
Cheers, Tom
@tjcuddihy We're working with the AWS team to produce a knowledge document on using Kedro and Sagemaker. Would we be able to talk to you about how you used them together?
I'd be keen on learning more about how to make Sagemaker play nicely with kedro so I can still access everything I need from my kedro context. @yetudada I have an alpha version of a kedro plugin that plays nicely with sagemaker and allows you to run processing jobs.
@uwaisiqbal then you might be interested in this knowledge article that was just published on AWS: https://aws.amazon.com/blogs/opensource/using-kedro-pipelines-to-train-amazon-sagemaker-models/ 🚀
Description
The conda environment for python3.6 in notebooks cannot find pandas.CSVDataSet.
Context
I'm wanting to use sagemaker as my development environment. However, I cannot get kedro to run as expected in both the notebooks (for exploration and node development) and the terminal (for running pipelines).
Steps to Reproduce
Terminal success:
1. pip install kedro in the terminal
2. kedro new
2a. testing for name
2b. y for example project
3. cd testing; kedro run => Success!
Notebook fail:
1. conda_python3 notebook in testing/notebooks/
2. !pip install kedro in a notebook
3. context.catalog.list()
Expected Result
The notebook should print:
Actual Result
Full trace.
Investigations so far
CSVLocalDataSet
Upon changing the yaml type for iris.csv from pandas.CSVDataSet to CSVLocalDataSet, we get success on both the terminal and the notebook. However, this is not my desired outcome; the transition to using pandas.CSVDataSet makes it easier, for me at least, to use both S3 and local datasets.
pip install kedro output from notebook
pip install kedro output from terminal
Your Environment
Include as many relevant details about the environment in which you experienced the bug:
kedro -V
python -V
PRETTY_NAME="Amazon Linux AMI 2018.03"
ID_LIKE="rhel fedora"
PRETTY_NAME="Amazon Linux AMI 2018.03"
ID_LIKE="rhel fedora"
pip freeze
arrow==0.15.5
asn1crypto==1.3.0
attrs==19.3.0
autovizwidget==0.12.9
awscli==1.18.27
azure-common==1.1.25
azure-storage-blob==1.5.0
azure-storage-common==1.4.2
azure-storage-file==1.4.0
azure-storage-queue==1.4.0
backcall==0.1.0
bcrypt==3.1.7
binaryornot==0.4.4
bleach==3.1.0
boto3==1.12.27
botocore==1.15.27
cached-property==1.5.1
cachetools==4.0.0
certifi==2019.11.28
cffi==1.14.0
chardet==3.0.4
click==7.1.1
colorama==0.4.3
cookiecutter==1.7.0
cryptography==2.8
decorator==4.4.2
defusedxml==0.6.0
docker==4.2.0
docker-compose==1.25.4
dockerpty==0.4.1
docopt==0.6.2
docutils==0.15.2
entrypoints==0.3
environment-kernels==1.1.1
fsspec==0.6.3
future==0.18.2
gitdb==4.0.2
GitPython==3.1.0
google-api-core==1.16.0
google-auth==1.12.0
google-auth-oauthlib==0.4.1
google-cloud-bigquery==1.24.0
google-cloud-core==1.3.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
hdijupyterutils==0.12.9
idna==2.9
importlib-metadata==1.5.0
ipykernel==5.1.4
ipython==7.13.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.16.0
Jinja2==2.11.1
jinja2-time==0.2.0
jmespath==0.9.4
json5==0.9.3
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.0.0
jupyter-console==6.1.0
jupyter-core==4.6.1
jupyterlab==1.2.7
jupyterlab-git==0.9.0
jupyterlab-server==1.0.7
kedro==0.15.8
MarkupSafe==1.1.1
mistune==0.8.4
mock==3.0.5
nb-conda==2.2.1
nb-conda-kernels==2.2.3
nbconvert==5.6.1
nbdime==2.0.0
nbexamples==0.0.0
nbformat==5.0.4
nbserverproxy==0.3.2
nose==1.3.7
notebook==5.7.8
numexpr==2.7.1
numpy==1.18.1
oauthlib==3.1.0
packaging==20.3
pandas==0.25.3
pandas-gbq==0.13.1
pandocfilters==1.4.2
paramiko==2.7.1
parso==0.6.2
pexpect==4.8.0
pickleshare==0.7.5
pid==3.0.0
pip-tools==4.5.1
plotly==4.5.4
poyo==0.5.0
prometheus-client==0.7.1
prompt-toolkit==3.0.3
protobuf==3.11.3
protobuf3-to-dict==0.1.5
psutil==5.7.0
psycopg2==2.8.4
ptyprocess==0.6.0
py4j==0.10.7
pyarrow==0.16.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pydata-google-auth==0.3.0
pygal==2.4.0
Pygments==2.6.1
pykerberos==1.1.14
PyNaCl==1.3.0
pyOpenSSL==19.1.0
pyparsing==2.4.6
pyrsistent==0.15.7
PySocks==1.7.1
pyspark==2.3.2
python-dateutil==2.8.1
python-json-logger==0.1.11
pytz==2019.3
PyYAML==5.3.1
pyzmq==18.1.1
qtconsole==4.7.1
QtPy==1.9.0
requests==2.23.0
requests-kerberos==0.12.0
requests-oauthlib==1.3.0
retrying==1.3.3
rsa==3.4.2
s3fs==0.4.0
s3transfer==0.3.3
sagemaker==1.51.4
sagemaker-experiments==0.1.10
sagemaker-nbi-agent==1.0
sagemaker-pyspark==1.2.8
scipy==1.4.1
Send2Trash==1.5.0
six==1.14.0
smdebug-rulesconfig==0.1.2
smmap==3.0.1
sparkmagic==0.15.0
SQLAlchemy==1.3.15
tables==3.5.2
terminado==0.8.3
testpath==0.4.4
texttable==1.6.2
toposort==1.5
tornado==6.0.4
traitlets==4.3.3
urllib3==1.22
wcwidth==0.1.8
webencodings==0.5.1
websocket-client==0.57.0
whichcraft==0.6.1
widgetsnbextension==3.5.1
xlrd==1.2.0
XlsxWriter==1.2.8
zipp==2.2.0
anaconda-client==1.6.14
anaconda-project==0.8.2
anyconfig==0.9.10
arrow==0.15.5
asn1crypto==0.24.0
astroid==1.6.3
astropy==3.0.2
attrs==18.1.0
Automat==0.3.0
autovizwidget==0.15.0
awscli==1.18.27
azure-common==1.1.25
azure-storage-blob==1.5.0
azure-storage-common==1.4.2
azure-storage-file==1.4.0
azure-storage-queue==1.4.0
Babel==2.5.3
backcall==0.1.0
backports.shutil-get-terminal-size==1.0.0
bcrypt==3.1.7
beautifulsoup4==4.6.0
binaryornot==0.4.4
bitarray==0.8.1
bkcharts==0.2
blaze==0.11.3
bleach==2.1.3
bokeh==1.0.4
boto==2.48.0
boto3==1.12.27
botocore==1.15.27
Bottleneck==1.2.1
cached-property==1.5.1
cachetools==4.0.0
certifi==2019.11.28
cffi==1.11.5
characteristic==14.3.0
chardet==3.0.4
click==6.7
cloudpickle==0.5.3
clyent==1.2.2
colorama==0.3.9
contextlib2==0.5.5
cookiecutter==1.7.0
cryptography==2.8
cycler==0.10.0
Cython==0.28.4
cytoolz==0.9.0.1
dask==0.17.5
datashape==0.5.4
decorator==4.3.0
defusedxml==0.6.0
distributed==1.21.8
docker==4.2.0
docker-compose==1.25.4
dockerpty==0.4.1
docopt==0.6.2
docutils==0.14
entrypoints==0.2.3
enum34==1.1.9
environment-kernels==1.1.1
et-xmlfile==1.0.1
fastcache==1.0.2
filelock==3.0.4
Flask==1.0.2
Flask-Cors==3.0.4
fsspec==0.7.1
future==0.18.2
gevent==1.3.0
glob2==0.6
gmpy2==2.0.8
google-api-core==1.16.0
google-auth==1.12.0
google-auth-oauthlib==0.4.1
google-cloud-bigquery==1.24.0
google-cloud-core==1.3.0
google-resumable-media==0.5.0
googleapis-common-protos==1.51.0
greenlet==0.4.13
h5py==2.8.0
hdijupyterutils==0.15.0
heapdict==1.0.0
html5lib==1.0.1
idna==2.6
imageio==2.3.0
imagesize==1.0.0
importlib-metadata==1.5.0
ipykernel==4.8.2
ipyparallel==6.2.2
ipython==6.4.0
ipython-genutils==0.2.0
ipywidgets==7.4.0
isort==4.3.4
itsdangerous==0.24
jdcal==1.4
jedi==0.12.0
Jinja2==2.10
jinja2-time==0.2.0
jmespath==0.9.4
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
jupyterlab==0.32.1
jupyterlab-launcher==0.10.5
kedro==0.15.8
kiwisolver==1.0.1
lazy-object-proxy==1.3.1
llvmlite==0.23.1
locket==0.2.0
lxml==4.2.1
MarkupSafe==1.0
matplotlib==3.0.3
mccabe==0.6.1
mistune==0.8.3
mkl-fft==1.0.0
mkl-random==1.0.1
mock==4.0.1
more-itertools==4.1.0
mpmath==1.0.0
msgpack==0.6.0
msgpack-python==0.5.6
multipledispatch==0.5.0
nb-conda==2.2.1
nb-conda-kernels==2.2.2
nbconvert==5.4.1
nbformat==4.4.0
networkx==2.1
nltk==3.3
nose==1.3.7
notebook==5.5.0
numba==0.38.0
numexpr==2.6.5
numpy==1.14.3
numpydoc==0.8.0
oauthlib==3.1.0
odo==0.5.1
olefile==0.45.1
opencv-python==3.4.2.17
openpyxl==2.5.3
packaging==20.1
pandas==0.24.2
pandas-gbq==0.13.1
pandocfilters==1.4.2
paramiko==2.7.1
parso==0.2.0
partd==0.3.8
path.py==11.0.1
pathlib2==2.3.2
patsy==0.5.0
pep8==1.7.1
pexpect==4.5.0
pickleshare==0.7.4
Pillow==5.1.0
pip-tools==4.5.1
pkginfo==1.4.2
plotly==4.5.2
pluggy==0.6.0
ply==3.11
poyo==0.5.0
prompt-toolkit==1.0.15
protobuf==3.6.1
protobuf3-to-dict==0.1.5
psutil==5.4.5
psycopg2==2.7.5
ptyprocess==0.5.2
py==1.5.3
py4j==0.10.7
pyarrow==0.16.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.4.0
pycosat==0.6.3
pycparser==2.18
pycrypto==2.6.1
pycurl==7.43.0.1
pydata-google-auth==0.3.0
pyflakes==1.6.0
pygal==2.4.0
Pygments==2.2.0
pykerberos==1.2.1
pylint==1.8.4
PyNaCl==1.3.0
pyodbc==4.0.23
pyOpenSSL==18.0.0
pyparsing==2.2.0
PySocks==1.6.8
pyspark==2.3.2
pytest==3.5.1
pytest-arraydiff==0.2
pytest-astropy==0.3.0
pytest-doctestplus==0.1.3
pytest-openfiles==0.3.0
pytest-remotedata==0.2.1
python-dateutil==2.7.3
python-json-logger==0.1.11
pytz==2018.4
PyWavelets==0.5.2
PyYAML==5.3.1
pyzmq==17.0.0
QtAwesome==0.4.4
qtconsole==4.3.1
QtPy==1.4.1
requests==2.20.0
requests-kerberos==0.12.0
requests-oauthlib==1.3.0
retrying==1.3.3
rope==0.10.7
rsa==3.4.2
ruamel-yaml==0.15.35
s3fs==0.4.2
s3transfer==0.3.3
sagemaker==1.51.4
sagemaker-pyspark==1.2.8
scikit-image==0.13.1
scikit-learn==0.20.3
scipy==1.1.0
seaborn==0.8.1
Send2Trash==1.5.0
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.11.0
smdebug-rulesconfig==0.1.2
snowballstemmer==1.2.1
sortedcollections==0.6.1
sortedcontainers==1.5.10
sparkmagic==0.12.5
Sphinx==1.7.4
sphinxcontrib-websupport==1.0.1
spyder==3.2.8
SQLAlchemy==1.2.11
statsmodels==0.9.0
sympy==1.1.1
tables==3.5.2
TBB==0.1
tblib==1.3.2
terminado==0.8.1
testpath==0.3.1
texttable==1.6.2
toolz==0.9.0
toposort==1.5
tornado==5.0.2
traitlets==4.3.2
typing==3.6.4
unicodecsv==0.14.1
urllib3==1.23
wcwidth==0.1.7
webencodings==0.5.1
websocket-client==0.57.0
Werkzeug==0.14.1
whichcraft==0.6.1
widgetsnbextension==3.4.2
wrapt==1.10.11
xlrd==1.1.0
XlsxWriter==1.0.4
xlwt==1.3.0
zict==0.1.3
zipp==3.0.0