tinybio / dl-biodb

Repo for work surrounding downloadable biological databases

I had some thoughts about this #1

Open crashfrog opened 7 years ago

crashfrog commented 7 years ago

Nothing too congealed, as of yet, but I think a powerful tool for users of database-dependent tools, developers of those tools, and curators of those databases might have features as follows:

  1. management of downloaded/installed databases via a system daemon, akin perhaps to the Docker daemon, to promote a consistent interface into the management and retrieval of databases;

  2. the ability for the daemon to report useful information about the version and change history of a database, and to restore a database to an earlier version on demand so that earlier analyses can be fully repeated;

  3. APIs and language bindings in Java, Python, Perl, and C (stuff like Protocol Buffers makes this somewhat easier), so that developers can have their bioinformatics tools and pipelines interrogate the daemon (if necessary) to resolve references to the databases they need;

  4. distributed storage of databases, perhaps via IPFS, to prevent traffic and bandwidth bottlenecks across cluster environments and elsewhere, with the daemon perhaps able to make intelligent decisions about which databases are hosted locally and which are pulled from the distributed network on demand;

  5. a secure, content-based addressing system, so that the same system can distribute both open and closed databases and the integrity of the data can be assured.
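
To make points 2 and 5 a bit more concrete, here is a minimal Python sketch of content-based addressing combined with a per-database version history. Everything in it is an assumption for illustration: `VersionedStore`, `commit`, `versions`, and `checkout` are hypothetical names, SHA-256 stands in for whatever hash a real system would pick, and an in-memory dict stands in for the daemon's actual storage.

```python
import hashlib
from dataclasses import dataclass, field


def content_address(data: bytes) -> str:
    """Derive an address from the content itself (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class VersionedStore:
    """Hypothetical in-memory stand-in for the daemon's storage."""
    blobs: dict = field(default_factory=dict)    # address -> raw bytes
    history: dict = field(default_factory=dict)  # database name -> ordered addresses

    def commit(self, name: str, data: bytes) -> str:
        """Store a new version of a database and record it in its history."""
        addr = content_address(data)
        self.blobs[addr] = data
        self.history.setdefault(name, []).append(addr)
        return addr

    def versions(self, name: str) -> list:
        """Return the addresses of all recorded versions, oldest first."""
        return list(self.history.get(name, []))

    def checkout(self, name: str, version: int = -1) -> bytes:
        """Return a given version (latest by default), verifying integrity."""
        addr = self.history[name][version]
        data = self.blobs[addr]
        if content_address(data) != addr:
            raise ValueError(f"integrity check failed for {name}@{addr}")
        return data
```

Because the address is derived from the bytes, restoring an earlier version is just looking up an older address, and re-hashing on checkout is enough to detect corruption or tampering.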

Right now I imagine a system that's a bit like a mash-up of Git and the user experience of Docker, but for big databases instead of containers, maybe running on top of IPFS to handle distribution.
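
As a rough illustration of the "hosted locally versus pulled on demand" idea, a resolver could check a local cache first and only fall back to the distributed network on a miss. This is only a sketch under stated assumptions: `Resolver` and `fetch_remote` are hypothetical names, and the remote fetch is passed in as a plain callable rather than being tied to any real IPFS client API.

```python
from typing import Callable, Dict


class Resolver:
    """Hypothetical daemon-side resolver: local cache first, network second."""

    def __init__(self, fetch_remote: Callable[[str], bytes]):
        self.local: Dict[str, bytes] = {}   # content address -> locally held data
        self.fetch_remote = fetch_remote    # assumed hook into the distributed network
        self.remote_fetches = 0             # counts how often the network was used

    def resolve(self, address: str) -> bytes:
        """Return the data for an address, fetching and caching it if needed."""
        if address in self.local:
            return self.local[address]
        data = self.fetch_remote(address)
        self.remote_fetches += 1
        self.local[address] = data          # keep a local copy for later requests
        return data
```

A smarter daemon would also decide what to evict or pre-fetch; this sketch simply caches everything it has ever been asked for.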

crashfrog commented 7 years ago

I propose the working name 'beryl' for this project.