s312569 / clj-biosequence

A Clojure library designed to make the manipulation of biological sequence data easier.

Phylogenetics within scope? #27

Open metasoarous opened 9 years ago

metasoarous commented 9 years ago

I realize that this is "clj-biosequence", but a slightly more general scope that included some phylogenetics functionality would be awesome. I've started working on some Newick parsing and such myself, and would love to contribute it to this project. In fact, if you wanted good NEXUS support (#26), you would want this anyway, since NEXUS files can contain trees.

s312569 commented 9 years ago

Good question - the reviewers on the paper mentioned phylogenetic support as well. So yes, please do contribute!

I've also been thinking about whether a monolithic library is a good idea versus separate smaller libraries. I always like a small library, but having one place to go for useful biological libraries, documentation, etc. is also an attractive idea. Ultimately, maybe a biology-themed git repository for bioinformatics libraries would be a good idea?

metasoarous commented 9 years ago

Great! Glad you're open to me contributing some phylo code!

As for scope, I agree; small, modular, composable libraries are nice. On the other hand, Biopython and BioPerl both set a precedent for being fairly large, and those projects have done very well for themselves. And of course, there are also pragmatic questions, such as "if we separated out phylogenetics, where would the general-purpose NEXUS reading live?"

Taking a step back, what I'd really like to see is something the Clojure community can rally behind as the standard API for computational biology and bioinformatics in Clojure. I'm quite inspired in this vision by what core.matrix has accomplished along these lines for array/matrix programming. From its README:

Key goals of core.matrix:

  • Provide a clear, standard API / abstraction for matrix and array programming in Clojure
  • Enable pluggable support for different underlying matrix library implementations
  • Provide a foundation library for other projects (e.g. Incanter)

Aiming for a similar set of goals with clj-biosequence would mean the rest of the community could build on a unified set of abstractions, increasing global compatibility while still encouraging diversity and composability. For example, if someone had a clever idea for a custom FASTA parser with certain characteristics, they wouldn't have to design an entirely new, incompatible API; they could simply implement the protocols of clj-biosequence, making the new library instantly compatible with existing code. This is something Clojure's solution to the expression problem uniquely enables, and I think clj-biosequence has the potential to fulfill this.
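To make that concrete, here is a minimal sketch of the idea. The protocol and function names (`Biosequence`, `accession`, `residues`, `gc-content`, `parse-fasta`) are made up for illustration and are not the actual clj-biosequence API; the point is only that generic code written against a protocol works with any new type that implements it.

```clojure
(require '[clojure.string :as str])

;; Hypothetical protocol for illustration; NOT the actual clj-biosequence API.
(defprotocol Biosequence
  (accession [this] "Identifier of the sequence.")
  (residues [this] "Residues of the sequence as a string."))

;; Generic code written only against the protocol.
(defn gc-content
  "Fraction of G/C residues in anything implementing Biosequence."
  [s]
  (let [rs (str/upper-case (residues s))]
    (if (empty? rs)
      0.0
      (/ (count (filter #{\G \C} rs))
         (double (count rs))))))

;; A separate "custom FASTA parser" library only has to provide a type
;; that implements the protocol.
(defrecord FastaEntry [header residue-str]
  Biosequence
  (accession [_] (first (str/split header #"\s+")))
  (residues [_] residue-str))

(defn parse-fasta
  "Naive FASTA parser (assumes '>' only appears at the start of headers)."
  [text]
  (->> (str/split text #">")
       (remove str/blank?)
       (map (fn [chunk]
              (let [[header & lines] (str/split-lines chunk)]
                (->FastaEntry (str/trim header)
                              (apply str (map str/trim lines))))))))

;; Usage: the generic function works on the new type unchanged.
(map (juxt accession gc-content)
     (parse-fasta ">seq1 demo\nGGCC\nAATT\n>seq2\nAT"))
```

Here `gc-content` knows nothing about `FastaEntry`; any other parser that returned a type implementing the same protocol could reuse it as is.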

I certainly wouldn't want to see clj-biosequence become bloated in this vision, but I don't think it has to be. Things that fall outside the scope of the core API, such as higher-level features or alternative/customized implementations, can live separately. But you're right that it would be nice for these things to have a common home. I think GitHub Organizations would be a good solution to this, so separate libraries can have distinct repositories but live under a "shared roof".

Again, stoked to help out here, excited to see this library mature, and look forward to your thoughts!

s312569 commented 9 years ago

Yep - exactly what I was thinking! I've got a set of protocols defined in clj-biosequence.core so that, as new formats get added, all they need to do is implement the protocols and they are instantly compatible with everything else. Whether the protocol structure I've constructed is a good one is open to comment :) So I definitely think we are on the same page, and getting input from others would be a good way of refining the core interface.
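As a rough illustration of "implement the protocols and you're compatible", the sketch below extends a protocol to a type that already exists. The names (`SequenceRecord`, `seq-id`, `seq-string`, `summarize`) are assumptions made up for this example, not the protocols actually defined in clj-biosequence.core.

```clojure
;; Hypothetical protocol for illustration; NOT the actual clj-biosequence.core API.
(defprotocol SequenceRecord
  (seq-id [this] "Identifier of the record.")
  (seq-string [this] "Residues of the record as a string."))

;; Existing code, written once against the protocol.
(defn summarize
  [record]
  {:id (seq-id record)
   :length (count (seq-string record))})

;; A "new format" added later: plain maps, as some hypothetical parser
;; might return them. No changes to summarize are needed.
(extend-protocol SequenceRecord
  clojure.lang.IPersistentMap
  (seq-id [m] (:id m))
  (seq-string [m] (:seq m)))

(summarize {:id "P12345" :seq "MKTAYIAK"})
;; => {:id "P12345", :length 8}
```

Because `extend-protocol` can target types defined elsewhere, a new format can opt in without touching either the core library or the format's own source.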

metasoarous commented 9 years ago

Wonderful! I haven't had a chance to look too deeply at the code yet, but from what I've seen so far, it definitely looks like it's moving in the right direction, which is great to see :-) And yes, it will be great to see what time and usage do towards this end.

metasoarous commented 9 years ago

Hey, sorry it's taken me a while to get this finished. I actually got most of the way there a few weeks back, and then just got too busy to finish it up. I was wondering, though, if you'd like to have a more open-ended discussion about some architectural things. In some other collaborations, I've found that Gitter is a really great tool for this. You can pretty easily start a chat room associated with a particular GitHub repository, and it'll even give you a badge to install in your README. Just a thought...