BOF: How to come up with computational best practices and drive their adoption

pvtodorov commented 5 years ago

Scientific software users come from very diverse educational backgrounds. I suspect that the range of individuals’ experience will become even broader as more scientists are pushed to learn to code out of necessity. As a result, the code written in research settings ranges from complex software systems to single scripts that analyze data.

In this session I would like to discuss how we can develop a set of computational best practices within our centers. These practices should:

Provide a standard overview of tools and infrastructure to new arrivals during on-boarding
Help us keep track of code and data
Facilitate sharing of code and data
Not add undue burden
Provide easily accessible documentation that anyone can find
Scale to meet the needs of both computational beginners and experts
Provide a standardized off-boarding procedure that ensures an individual’s code, data, and work are stored and available in a state that could be understood and reused by others in their field

Many of these points are everyday practice for experienced software developers, and many solutions already exist in the corporate world. However, the problem becomes far from trivial given the heterogeneity of code, tasks, and backgrounds we encounter in an academic setting.

I would like to connect and discuss not only how we can come up with best practices that are appropriate for our centers, but also how we can make sure users become invested and drive bottom-up adoption.

I want to share our experiences and discuss what has worked well in the past, what has been a struggle, and what we think could be improved going forward. Ultimately, I would like to see us develop and drive the adoption of computational best practices within our centers.

borisevichdi commented 5 years ago

Hi Peter! I totally agree with you and would like to support this session.

Most, if not all, of the problems we encounter today in scientific software development were solved 5-20 years ago in the software industry. Software development and data analysis have never been easy tasks, and yet software giants do not rewrite their code from scratch every time the CSO or CTO changes. So we can probably learn something from them.

I'd be interested in joining, sharing my experience, and hearing about the experiences of others, with a focus on:

On/off-boarding strategies,
Organized training of students and intervention in hiring processes,
Organized teamwork and code/data sharing, including mandatory CI/CD,
"CusDev" (customer development) and internal "product management" of our software with respect to biologists and MDs.

This topic also resonates, to some extent, with #5.

ukirik commented 5 years ago

Relevant and interesting topic on StackExchange: https://academia.stackexchange.com/q/17781/5674

pvtodorov commented 5 years ago

Thanks, @borisevichdi, @ukirik, and @pdworzynski, for engaging and for offering input and resources.

Looking at this, I see significant overlap with some other session suggestions. I am very interested in discussing all of these in the context of the human and incentive challenges, as @ukirik pointed out on StackExchange (although my own view lately is a bit more optimistic than that of the commenters in that thread :) ). It appears that training, incentives, and culture are the most significant points of friction. I'd like to discuss these with regard to our respective centers: whether there are any efforts to push for improvements, and how we can be part of them.

@borisevichdi I like the idea of organized trainings and interventions! I'm interested in your vision for continuous integration in data and bioinformatics work, since it departs significantly from conventional software development.

pvtodorov commented 5 years ago

Session discussion and outcomes

The NNF Cluster Centers span a wide range of computational abilities and projects. While some centers develop software that is extensible, open-source, and meant to be reused, the work in other centers is better described as a collection of bespoke analyses and scripts.
There is no one-size-fits-all set of computational best practices that would work across all these conditions.
A weekly meeting where every computational scientist at a center spends a minute describing what they’re working on has been shown to work and has sparked collaboration and communication in the past. More ambitious meetings based on code review and hands-on support have had mixed results and adoption.
Despite the differences in work and backgrounds, there was significant shared interest in organizing more events that bring together computational scientists and the participants of the unconference.
We found that all of our centers use Slack to communicate, and there was support for making the NNF Computational Slack a more centralized space for discussion, support, and event planning.

Midnighter commented 5 years ago

I didn't get to join the session, but I wanted to mention the possibility of setting up a Jupyter notebook gallery, which could be a way to share, advertise, and explore the more one-off kinds of analysis. The gallery tool also supports a fork-and-merge workflow, so you can either adapt a notebook for your own work or meaningfully improve the exhibited one.

I learnt about it at a meetup, and apparently they have used it at Novozymes with some success.

pvtodorov commented 5 years ago

@Midnighter sounds like a good idea. Could we set it up under the nnf-cbn org that hosts the unconference repo, since we already have a lot of people here? Perhaps as a separate repo for notebooks?

Midnighter commented 5 years ago

I haven't set one up myself yet, but it runs a Ruby on Rails app as well as an Apache Solr instance, so it needs server-side hosting, which GitHub can't provide, I'm afraid. They do provide Docker images, though, so it should be straightforward to host almost anywhere. I think the harder part will be advertising it at the centres and keeping people engaged.

nnf-cbn / 2019-unconference, issue #9