paygaphack / mentors-repo

A place for pre-June GenderPayGapHack Mentor prep!! :gem::sparkles:
MIT License

Computational environment for gender paygap hack #7

Open Winterflower opened 6 years ago

Winterflower commented 6 years ago

Hey lovely people! Just wanted to kick off a discussion about the environment that attendees should (or could) use for working on the gender paygap hack challenges. I wrote up a comparison of the different choices we have here. By far the easiest and most reliable of these choices is Google's Colaboratory, which provides free compute and a Jupyter-like environment for anyone with a Google account. The resulting notebooks can also easily be saved to GitHub. Have we decided on our preferred method for distributing data to attendees? If we don't need any custom databases or other data stores that require a custom environment, should we go with Colaboratory?

MarckK commented 6 years ago

Thank you, Camilla, for your write-up and for moving the conversation forward on this. Colaboratory looks like a great choice for us. As for it restricting users to Python: others might have stronger objections, but I'm OK with it, since those who know how to write Scala or R are likely to be proficient in Python too. Python might not be their first choice, but it wouldn't be a blocker for them.

MarckK commented 6 years ago

My understanding is that our preferred choice is to put the datasets up on GitHub in CSV or JSON form, so we don't need custom databases. Our emphasis is on making the workflow on the day as smooth and accessible as possible.
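For anyone wondering what that would look like from inside a Colaboratory notebook, here is a minimal sketch using only the Python standard library. It assumes the files end up in a public GitHub repo and are served from `raw.githubusercontent.com`. The helper names (`raw_github_url`, `parse_dataset`, `load_dataset`) and the owner/repo/path values in the usage note are hypothetical placeholders, not something we have agreed on.

```python
import csv
import io
import json
from urllib.request import urlopen


def raw_github_url(owner, repo, branch, path):
    """Build the raw-content URL for a file in a public GitHub repo."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


def parse_dataset(text, fmt):
    """Parse CSV text into a list of dicts, or JSON text into Python objects."""
    if fmt == "csv":
        # Each row becomes a dict keyed by the header line; values stay strings.
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        return json.loads(text)
    raise ValueError(f"unsupported format: {fmt}")


def load_dataset(url):
    """Download a dataset and parse it based on its file extension."""
    fmt = url.rsplit(".", 1)[-1].lower()
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    return parse_dataset(text, fmt)
```

In a notebook cell, an attendee would then call something like `load_dataset(raw_github_url("example-org", "example-repo", "main", "data/example.csv"))`, with the placeholders swapped for wherever the data really lives. People who prefer pandas can pass the same URL straight to `pandas.read_csv`.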

Thank you, Camilla, for researching potential computational environments for the data hack and their pros and cons!

SandrineP commented 6 years ago

Thanks a lot, Camilla! Re. data storage: we won't have SQL/NoSQL databases, only CSV files. We were discussing storing them on Dropbox (there will be one dataset that we can't share publicly, plus storage-size reasons), but we can probably use Google Drive as well. Re. environment: Colaboratory looks like a good choice. Not everyone will use Python, but we don't really have the time or capacity to create a custom JupyterHub/BinderHub deployment... I'll give it a try this afternoon!