russelljjarvis opened 6 years ago
Regarding the issue of where to store the data this notebook generates (GitHub is for source code, not data): I recently read https://www.nature.com/articles/sdata201618. Direct quote: "Apparently in response to this, we see the emergence of numerous general-purpose data repositories" ... "globally-scoped repositories such as Dataverse, FigShare (http://figshare.com), Dryad, Mendeley Data (https://data.mendeley.com/), Zenodo (http://zenodo.org/), DataHub (http://datahub.io), DANS (http://www.dans.knaw.nl/), and EUDAT."
Dataverse and DataHub both sound like they might be API-driven.
I arrived at the OSF (Open Science Framework) for data storage, since it appears to be built by a team related to the one behind RRIDs, SciCrunch, etc. It also exposes an API.
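Since the OSF API is a JSON:API-style REST interface rooted at https://api.osf.io/v2/, a minimal sketch of addressing a project ("node") and pulling titles out of a response might look like the following. The node id and the sample payload here are hypothetical; only the `data`/`attributes` nesting is taken from the JSON:API convention the OSF uses:

```python
import json

OSF_API = "https://api.osf.io/v2"

def node_url(node_id):
    # Build the JSON:API endpoint for a project ("node") on the OSF.
    return f"{OSF_API}/nodes/{node_id}/"

def titles(payload):
    # OSF responses follow the JSON:API convention: records live under
    # "data", with their fields nested under "attributes".
    data = payload["data"]
    if isinstance(data, dict):  # single-resource response
        data = [data]
    return [record["attributes"]["title"] for record in data]

# Hypothetical response body, abbreviated to just the fields used above.
sample = json.loads(
    '{"data": {"id": "abcde", "attributes": {"title": "Demo project"}}}'
)
```

A real client would fetch `node_url(...)` with an HTTP library and pass the decoded JSON to `titles`.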
Also related to this notebook: I have recently learned the dask bag idiom, which can be applied to collections (including pandas-style collections). I applied it here in a different context, but I believe it will serve us better in the long run.
```python
import dask.bag as db
import pandas as pd

# Pair each search-list entry with its index so workers can report provenance.
sl = list(enumerate(t_analysis.searchList))

# Partition the work into 8 chunks; dask schedules them across the workers.
b = db.from_sequence(sl, npartitions=8)

# Map the analysis function over the bag and gather the results.
obj_arrs = list(b.map(t_analysis.iter_over).compute())

# Aggregate the per-item results into a single DataFrame on the controller.
df = pd.DataFrame(data=obj_arrs)
df
```
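The same pattern can be shown end to end with a self-contained toy function in place of `t_analysis.iter_over` (the `features` function and the input values are invented for illustration; this assumes dask and pandas are installed):

```python
import dask.bag as db
import pandas as pd

def features(item):
    # Stand-in for a per-item analysis: return a dict, one row per item.
    i, val = item
    return {"index": i, "value": val, "square": val ** 2}

# Index-value pairs, as in the snippet above.
sl = list(enumerate([1.0, 2.0, 3.0, 4.0]))

# Partition, map in parallel, then gather on the controller.
b = db.from_sequence(sl, npartitions=2)
rows = b.map(features).compute()

# One aggregated DataFrame of all worker results.
df = pd.DataFrame(rows)
```

Each worker returns plain dicts, so only small, serializable results travel back to the controller before the DataFrame is built.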
Although the sciunit score objects are not stored directly inside the data-transport container objects, thanks to dask bags a dask/pandas DataFrame of the scores is retrieved from the workers and aggregated on the controller.
I made this graphically improved IPython notebook, designed to illustrate the efficiency of the neuronunit optimization: https://github.com/russelljjarvis/neuronunit/blob/results/neuronunit/unit_test/test_ga_versus_grid.ipynb I hope it is not too late for the next release.