Write instructions on how to download datasets and how to install Jupyter

okfn-brasil / serenata-notebooks

Notebooks from Operação Serenata de Amor | ** This repository does not receive frequent updates **

MIT License


Write instructions on how to download datasets and how to install Jupyter #7

Open rodolfo-viana opened 5 years ago

rodolfo-viana commented 5 years ago

What is the problem?

As this repo is designed to assist both collaborators with experience in Python and newcomers, it is advisable to have a version of Serenata's datasets that does not need Docker.

How can this be addressed?

Create a script to download, clean and translate datasets without Docker. Perhaps we could use Serenata's old way of doing it -- the version in which Docker was not required.

Who could help with this issue?

I can develop this, but I believe it is already done if we get that old version of the script.

Labels

Enhancement.

cuducos commented 5 years ago

I think this would be quite simple. Usually beginners struggle to get Jupyter installed (we could recommend miniconda or anaconda to make this easier). The part about the datasets is straightforward though:

We need a requirements.txt with serenata-toolbox>=15.1.0
Then $ pip install -r requirements.txt
And finally $ serenata-toolbox will download the files to the data/ directory
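For newcomers, it may help to confirm that the install step worked before running the download. The following is a minimal sketch, not part of serenata-toolbox: the helper names `parse_version`, `meets_minimum` and `toolbox_status` are hypothetical, and the only things taken from the steps above are the package name and the `15.1.0` minimum. It relies only on the standard library's `importlib.metadata`.

```python
from importlib import metadata
from itertools import takewhile

MINIMUM = "15.1.0"  # minimum version from the requirements.txt line above


def parse_version(text):
    """Turn '15.1.0' into (15, 1, 0), ignoring suffixes such as 'rc1'."""
    numbers = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:  # pad '15.2' to (15, 2, 0)
        numbers.append(0)
    return tuple(numbers)


def meets_minimum(installed, minimum=MINIMUM):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def toolbox_status(package="serenata-toolbox"):
    """Describe whether the package is installed and recent enough."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed; run: pip install -r requirements.txt"
    if meets_minimum(installed):
        return f"{package} {installed} is installed"
    return f"{package} {installed} is older than {MINIMUM}"


if __name__ == "__main__":
    print(toolbox_status())
```

Running the file prints a one-line status, which could go into the instructions as a quick check between the `pip install` step and the download step.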

rodolfo-viana commented 5 years ago

You're right, @cuducos. The part about datasets is straightforward. I will just write an .md file to explain to newcomers how it works.

About anaconda or miniconda: it is easier, for sure, but it comes with a lot of libraries we do not need. And there are some issues when using conda and then trying to upgrade Jupyter and other libraries with pip, for instance. (In my own experience I found it troublesome.) For educational purposes, I believe the best thing is to write instructions on how to install Python, Jupyter and the libraries instead of taking the shortcut of Anaconda.

I can write these instructions. :)

rodolfo-viana commented 5 years ago

I wrote this: #8. But it is not complete, I guess. I mean, I go through the process from the start, but do not go any further. That's because I believe we should keep in mind that we will probably get a lot of newcomers from different backgrounds. So this is the first part, to get things running. Then I will write instructions on how to use git and share work with us. And then, how to download fresh datasets. Is that ok for you guys? Suggestions are welcome. :)

jtemporal commented 5 years ago

We could use this setup script as basis for doing the downloads.

rodolfo-viana commented 5 years ago

I guess we could, especially for the latest files on the SdA server. It would help a lot btw. But to extract up-to-date files from Câmara and other servers I believe I must rewrite some lines of the original code. A couple of days ago I tried some of the serenata-toolbox scripts and found some issues running them on Windows. So I think I will add some try/except lines to the code to get it running smoothly on Windows and upload the new version here. What do you guys think?
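The Windows issues are not described in this thread, so the following is only a sketch of the general try/except idea, not a change to any serenata-toolbox code. The helper `safe_step` and the step functions passed to it are hypothetical names used for illustration.

```python
def safe_step(label, step, *args):
    """Run one step; report an OS-level failure instead of stopping.

    Returns the step's result, or None when an OSError was caught.
    """
    try:
        return step(*args)
    except OSError as error:
        # PermissionError and FileNotFoundError are subclasses of OSError.
        print(f"{label} failed: {error}")
        return None


if __name__ == "__main__":
    def failing_step():
        # Hypothetical stand-in for a step that breaks on one platform.
        raise PermissionError("file is locked by another process")

    safe_step("cleanup", failing_step)
```

Catching only `OSError` keeps programming mistakes such as `TypeError` visible, and the printed message is the kind of detail that is useful when reporting the problem.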

cuducos commented 5 years ago

I think the way to handle it is to report these errors as issues in the serenata-toolbox repo and then we work on them over there ; )