sgkit-dev / sgkit-publication

Sgkit publication repository

A draft of the story #50

Closed by jeromekelleher 11 months ago

jeromekelleher commented 11 months ago

I've made a pass at writing the "story summary" here, which I think is worth others taking a look at (lines 97 to 170 in paper.tex).

Basically, we split the narrative into two bits. First, we tackle the big fundamental questions and show why our approach works well. Then, in the second part we showcase the functionality of sgkit via some case studies.

I think this would be a nice paper - what do you think, @benjeffery @hammer @timothymillar @tomwhite?

benjeffery commented 11 months ago

This is a good narrative. A couple of thoughts:

I know it is a big part of the story of sgkit, but going into the details of each enabling library seems a bit too technical for the high-level summary that we want to draw people in with. Obviously we need to get into dask, zarr, etc. later on, but "JIT-compiled Python working on distributed, rectangular, chunked arrays with metadata works great for genetics" is the main story, and "doing that via standard open libraries enables inclusive development and interoperability" is another.

One paragraph that I thought might be missing is one that explains why existing solutions are insufficient and motivates the need for sgkit.

jeromekelleher commented 11 months ago

Good points, thanks @benjeffery

I think the comparison with existing methods has to go into the section about the storage strategy - you just can't talk about this stuff without getting into the weeds.

benjeffery commented 11 months ago

Forgot to say - where you have '% FIXME "unit" is the wrong word', it could be "segment" or "piece"?