jreades opened this issue 7 years ago
Should there be a global data directory (so datasets can be reused across atoms), or is data potentially replicated in each atom?
How do we ensure that someone isn't left hanging if they're offline? (We can embed simple data sets in the GitHub repo, and use the PySAL example data sets for more advanced tutorials.)
We need to remember to copy the data used into the final 'compiled' notebook directory, but then we also need to think about the size of the data sets.
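The copy-into-the-compiled-directory step could be scripted so it isn't forgotten, with a size check built in. A minimal sketch (the directory layout, function name, and size budget are all assumptions, not existing project conventions):

```python
# Hypothetical helper: copy an atom's data files into the compiled
# notebook directory, flagging anything over a size budget for review.
import shutil
from pathlib import Path

def copy_atom_data(src_dir, dest_dir, max_mb=50):
    """Copy every file under src_dir into dest_dir (preserving layout);
    return the names of files larger than max_mb megabytes."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    oversized = []
    for f in src.rglob("*"):
        if f.is_file():
            target = dest / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)
            if f.stat().st_size > max_mb * 1024 * 1024:
                oversized.append(f.name)
    return oversized
```

Anything returned in the oversized list could then be reviewed (or swapped for a smaller sample) before the compiled notebooks are committed.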
I was wondering whether the best approach is to use the data sets distributed with PySAL. Alternatively, the README for each atom could state: "Tutorials in this folder should use one of the following two data sets: A, B." We don't need to let everyone bring their own data to the party, as that isn't the purpose...
Enforce data set consistency within atoms -- so we should use one data set for all of the atoms in one group and then, say, a different data set for the ML atoms.
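If we go the "each atom's README names its allowed data sets" route, the rule could even be checked mechanically. A rough sketch of such a check (the allowed-list contents, file extensions, and function name are all hypothetical, not existing project tooling) that scans a notebook's JSON for data-file references:

```python
# Hypothetical consistency check: flag any data file referenced in a
# notebook that is not on the atom's allowed list.
import json
import re

# Assumed per-atom allow-lists; real names would come from each README.
ALLOWED = {"ml": {"dataset_a.csv", "dataset_b.csv"}}

def check_notebook(nb_json, atom):
    """Return data files referenced in the notebook JSON string
    that are not in the atom's allowed set."""
    nb = json.loads(nb_json)
    used = set()
    for cell in nb.get("cells", []):
        source = "".join(cell.get("source", []))
        # Look for common data-file extensions in cell source.
        used |= set(re.findall(r"[\w\-]+\.(?:csv|shp|geojson)", source))
    return sorted(used - ALLOWED[atom])
```

Run as part of the "compile" step, a non-empty return value would mean a tutorial has brought its own data to the party.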