Open DingoEatingFuzz opened 5 years ago
My only concern with SageMaker is that it seems to be geared towards a machine learning workflow / mindset. I think that's a great strategic goal - TensorFlow is eating the world - but I'm not sure how well that fits the tactical situation. It's definitely accessible from R / RStudio, so it wouldn't lock R programmers out.
Holding off until the needs for SageMaker are clarified.
@danieldn @DingoEatingFuzz did we decide whether to use SageMaker for people who need cloud access? The other cloud resource we recommended was Google Colaboratory notebooks, which are free.
There is an open PR to introduce the SageMaker infrastructure: https://github.com/hackoregon/hackoregon-aws-infrastructure/pull/61
This will make it easy for us to provision notebook instances, but I still want to be conservative about when we do that, since it can be costly.
If someone is working with private data or large datasets and cannot do the work locally, they should request a notebook instance.
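For illustration only, here is a rough sketch of what provisioning a notebook instance on request could look like with boto3. It is not taken from the PR above, which may well use a different mechanism. The helper names (`build_notebook_request`, `provision`, `stop`), the `requested-by` tag, and the small default instance type are all assumptions chosen to keep costs low; `create_notebook_instance` and `stop_notebook_instance` are the real SageMaker client calls.

```python
from typing import Any, Dict


def build_notebook_request(
    name: str,
    role_arn: str,
    instance_type: str = "ml.t3.medium",  # assumed default: a small, cheap type
    volume_gb: int = 5,
    requested_by: str = "unknown",
) -> Dict[str, Any]:
    """Assemble keyword arguments for SageMaker's create_notebook_instance.

    Hypothetical helper: it only builds the request, so it can be reviewed
    (or rejected) before anything billable is created.
    """
    if volume_gb < 5:
        raise ValueError("SageMaker notebook volumes must be at least 5 GB")
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        "VolumeSizeInGB": volume_gb,
        # Tag the instance so costs can be traced back to the requester.
        "Tags": [{"Key": "requested-by", "Value": requested_by}],
    }


def provision(request: Dict[str, Any]) -> str:
    """Create the notebook instance and return its ARN (needs AWS credentials)."""
    import boto3  # imported lazily so building a request needs no AWS setup

    client = boto3.client("sagemaker")
    response = client.create_notebook_instance(**request)
    return response["NotebookInstanceArn"]


def stop(name: str) -> None:
    """Stop a running instance so it no longer accrues compute charges."""
    import boto3

    boto3.client("sagemaker").stop_notebook_instance(NotebookInstanceName=name)


if __name__ == "__main__":
    # Placeholder role ARN; substitute a real execution role before provisioning.
    request = build_notebook_request(
        "private-data-analysis",
        "arn:aws:iam::123456789012:role/example-sagemaker-role",
        requested_by="some-volunteer",
    )
    print(request)
```

Keeping request building separate from the actual API call means a request can be inspected before it costs anything, and `stop` gives a cheap way to park an instance that is not in use.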
For large datasets, improved collaboration, and link sharing, data science should be done ✨ In The Cloud ✨
Since most of our data science is done in the Python ecosystem, Jupyter Notebooks are the most obvious technology choice. R and RStudio come in as a close second.
Ideally, we self-host this so we can take advantage of lower-latency data center locality, compared to open tools where data would have to be transferred over arbitrary distances and unknown network conditions.
Ideal solution
Other tools to look at
TODO