praeducer commented 2 years ago

Help establish best-practice software patterns for computation authors. Goal: user documentation of features and of computation development that authors will actually find useful. It should also explore how things are currently used and identify areas to test.

Sub-Tasks

[ ] How to containerize a computation in Docker based on standard TReNDS Center best practices and templates
[ ] Authors become better Python and functional programmers
[ ] Authors learn to manage their environment, packages, and code better
[ ] Authors know how to create good code repos
[ ] How to document computations well
[ ] Establish code styling standards
[ ] Create a loose framework for testing code, i.e. a simple way to do this. Provide guidelines for testing code as a proof of concept without the simulator. This can help drastically with debugging, because bugs in the Python code alone can be debugged separately from issues with running in Docker, creating proper inputspecs, etc.
[ ] How to containerize a computation in Singularity
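
For the testing item above, a minimal sketch of what "testing without the simulator" could look like: keep each step a plain function that takes and returns dicts, then call it directly from Python. The function names, the `values` key, and the `input`/`state`/`output` dict shapes are assumptions for illustration, not the COINSTAC API.

```python
import json


def local_mean(args):
    """Compute a site-level sum and count from a list of numbers.

    The {"input": ..., "state": ...} and {"output": ...} shapes are
    assumed for illustration, not taken from a COINSTAC specification.
    """
    values = args["input"]["values"]
    return {"output": {"total": sum(values), "count": len(values)}}


def remote_mean(site_outputs):
    """Combine per-site sums and counts into one global mean."""
    total = sum(site["output"]["total"] for site in site_outputs.values())
    count = sum(site["output"]["count"] for site in site_outputs.values())
    return {"output": {"mean": total / count}, "success": True}


def run_local_check():
    """Exercise both steps in plain Python, with no Docker or simulator."""
    site_a = local_mean({"input": {"values": [1.0, 2.0, 3.0]}, "state": {}})
    site_b = local_mean({"input": {"values": [5.0]}, "state": {}})
    result = remote_mean({"site_a": site_a, "site_b": site_b})
    # Round-trip through JSON to catch non-serializable outputs early.
    return json.loads(json.dumps(result))


if __name__ == "__main__":
    print(run_local_check())
```

If a check like this passes but the computation still fails in Docker or the simulator, the problem is more likely in the container or inputspec than in the Python logic.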

Reference Code and Documentation

https://github.com/trendscenter/coinstac-enigma-sans/tree/pans

https://github.com/trendscenter/coinstac/tree/master/packages/coinstac-utilities/coinstac-python/coinstac

hvgazula commented 2 years ago

💯 would love to see this.

praeducer commented 2 years ago

Awesome @hvgazula! It's officially on the product roadmap now. ;D

Any thoughts on more specific asks for whoever tackles this Issue? Any additional constraints or guidance is welcome.

praeducer commented 2 years ago

Draft per @spanta28: Learn how to package your code, build a Docker image, and use the existing COINSTAC Python libraries to work with the COINSTAC framework, including an example: https://github.com/trendscenter/coinstac-computation. To develop or contribute to the algorithms, check out our Distributed Neural Network implementation on COINSTAC, which is already integrated with the UI: https://github.com/trendscenter/dinunet_implementations_gpu

praeducer commented 2 years ago

Rules of Thumb

By @bbradt

First rule of thumb is 1 repository per computation. In the past I had multiple repositories for shared steps common between computations (group ICA, for example). I think this just added to the confusion, so I am going to maintain 1 repo per computation for simplicity and clarity. Aashis did a similar thing for dinunet, so it's been a common problem.
Second rule of thumb is adhering to PEP 8 style standards. We can talk about another standard if that's preferred, but this is what BrainForge is using, so we should probably just stay consistent. This is easy to maintain in IDEs like VSCode, where you can install autopep8 directly into your Python environment and automatically keep code up to standard.
Third rule of thumb is distinct and full documentation. PEP 8 already calls for docstrings in code and points to PEP 257 for guidelines on structuring them, but we also need to maintain up-to-date READMEs with full running instructions, tested data, etc.
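
As an illustration of the second and third rules, a minimal sketch of a PEP 8 styled function with a docstring, plus a small helper that lists public functions missing one. The `Args:`/`Returns:` layout is one common docstring convention picked here as an assumption, and both function names are hypothetical.

```python
import inspect


def normalize(values):
    """Scale a list of numbers so they sum to 1.

    Args:
        values: Non-empty list of non-negative numbers with a positive sum.

    Returns:
        A new list of floats with the same length as ``values``.
    """
    total = sum(values)
    return [v / total for v in values]


def undocumented(namespace):
    """Return sorted names of public functions in ``namespace`` without a docstring."""
    return sorted(
        name
        for name, obj in namespace.items()
        if inspect.isfunction(obj)
        and not name.startswith("_")
        and not inspect.getdoc(obj)
    )
```

Calling `undocumented(vars(some_module))` in a test is one lightweight way to keep the docstring rule from drifting.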

praeducer commented 2 years ago

This is a gold mine of learning resources and best practices for data management and open science! https://www.repro4everyone.org/resources

praeducer commented 1 year ago

Ideas from session held today:

[ ] Standardize args, args library, args schema
[ ] Style standards: PEP 8 is another good standard.
[ ] Enforce BIDS
[ ] Docs template

We need to enforce standards for how computations interface with COINSTAC as well as for how inputs to computations are structured. Inputspec is a start on this. Keys are bespoke per computation. Are they intuitive or documented?
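
One way an args schema could work, as a minimal stdlib-only sketch: each computation declares its keys once, and a shared helper validates the raw inputs and fills defaults. The schema layout, the key names (`covariates`, `ridge_penalty`), and `validate_args` are all hypothetical and are not part of inputspec or any COINSTAC library.

```python
# Hypothetical schema: one entry per input key the computation accepts.
SCHEMA = {
    "covariates": {"type": list, "required": True},
    "ridge_penalty": {"type": float, "required": False, "default": 0.0},
}


def validate_args(raw, schema):
    """Return a cleaned copy of ``raw`` or raise ValueError listing all problems."""
    errors = []
    cleaned = {}
    for key, rule in schema.items():
        if key not in raw:
            if rule.get("required", False):
                errors.append(f"missing required key: {key}")
            elif "default" in rule:
                cleaned[key] = rule["default"]
            continue
        value = raw[key]
        if not isinstance(value, rule["type"]):
            expected = rule["type"].__name__
            actual = type(value).__name__
            errors.append(f"{key}: expected {expected}, got {actual}")
            continue
        cleaned[key] = value
    # Reject keys the schema does not know about, so typos surface early.
    for key in raw:
        if key not in schema:
            errors.append(f"unknown key: {key}")
    if errors:
        raise ValueError("; ".join(errors))
    return cleaned
```

Because the schema is plain data, the same dict could also be rendered into the computation's documentation, which would address the "are keys documented?" question.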

Inputspec was made after the pipeline was built. The simulator was a second-class citizen. Additional complexity was put on computation authors.

It needs to be easy to load data without having to reinvent the wheel each time, similar to libs like nifty and ssl. There could be more libs to integrate here to make things easier, such as pybids. It would help if we could enforce BIDS on Brainforge; it is a great neuroimaging standard. https://bids-standard.github.io/pybids/. OpenNeuro has some good examples.
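
To show why a fixed naming standard helps with loading, here is a minimal stdlib-only sketch that splits a BIDS-style file name into its key-value entities, suffix, and extension. It is a stand-in for illustration only; pybids covers this properly, and `parse_bids_name` is a hypothetical helper, not a pybids function.

```python
from pathlib import PurePosixPath


def parse_bids_name(path):
    """Split a BIDS-style file name into entities, suffix, and extension.

    Example name: sub-01_ses-02_T1w.nii.gz
    """
    name = PurePosixPath(path).name
    # Everything after the first dot is treated as the extension.
    stem, dot, extension = name.partition(".")
    # The last underscore-separated part is the suffix; the rest are entities.
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        if not sep or not key or not value:
            raise ValueError(f"not a BIDS key-value entity: {pair!r}")
        entities[key] = value
    return {"entities": entities, "suffix": suffix, "extension": dot + extension}
```

With names parsed this way, a computation can select inputs by subject, session, or suffix instead of relying on per-computation path conventions.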

We still need some standardization around covariates. Is there anything in the machine learning ecosystem or its libraries that could help standardize parameters or covariates better?

BIDS can help us standardize on data sets too. It is becoming more popular, and lots of apps and libs are built around it. What do we do with data that is not in a BIDS format?

Solve for BIDS and neuroimaging first. Focus on strong core features.

Can we also standardize around particular data sets? So get really good at analyzing and processing one data set, then make it more flexible around other data sets. In general, simplify our work by sticking to things like similar data formats and structures.

Fixed directory structures and data structures whenever possible.

Are there tools like DVC we can leverage here? https://dvc.org

praeducer commented 1 year ago

@bbradt We'd like you to own this task. Do you have any questions that would help make it clearer?

trendscenter / coinstac

Document best practices and design patterns for computation authors #1334
