scikit-bio / scikit-bio

scikit-bio: a community-driven Python library for bioinformatics, providing versatile data structures, algorithms and educational resources.
https://scikit.bio
BSD 3-Clause "New" or "Revised" License

List of algorithms that would be awesome to have #677

Open wasade opened 9 years ago

wasade commented 9 years ago

Following a conversation with @rob-knight @gregcaporaso @antgonza and @cuttlefishh, we came up with a preliminary list of algorithms that would be fantastic to include. Educational and optimized versions of all of these algorithms would be great, though for the optimized versions we want to wrap existing tools/algorithms where it makes sense. Some of these algorithms will necessitate additional data structures, of course. Please feel free to expand on this list -- it is not exhaustive, and there are likely more algorithms out there that would be great to include.

Tree algorithms:

Alignment:

RNA:

Assembly:

Search:

Whole genome phylogeny:

HGT Detection:

Functional annotation:

3D structure:

mortonjt commented 9 years ago

This group seems to have a nice collection of HMM related libraries http://www.cs.au.dk/~asand/?page_id=152

If this is considered a good idea, I wouldn't mind building wrappers for these 3 libraries.

wasade commented 9 years ago

For performance HMMs, what about wrapping HMMer?

On Sep 29, 2014 10:27 PM, "mortonjt" notifications@github.com wrote:

This group seems to have a nice collection of HMM related libraries http://www.cs.au.dk/~asand/?page_id=152

If this is considered a good idea, I wouldn't mind building cython wrappers for these 3 libraries.

— Reply to this email directly or view it on GitHub https://github.com/biocore/scikit-bio/issues/677#issuecomment-57265795.

mortonjt commented 9 years ago

From what I understand, HMMer is the best Profile HMM software package out there. The packages I found are better suited for designing other HMMs. These sorts of HMM are more appropriate for solving problems such as identifying C-G rich regions. Plus they have really slick optimizations.

rob-knight commented 9 years ago

Do they solve problems we find ourselves wanting to solve in practice…?

mortonjt commented 9 years ago

Not sure. I have used these kinds of HMMs in past research though.

I do agree with @wasade . If any HMM wrappers need to be implemented, HMMer should probably be wrapped first. I do have some HMMer wrappers written up, so I wouldn't mind tidying them up and committing them.

mortonjt commented 9 years ago

I'm going to throw this idea out there.

Graphlan can create some really awesome phylogenetic tree plots - but it isn't very intuitive to use.

I think it would be really nice if there was a way to incorporate these plotting capabilities in scikit-bio. This could ease the process of incorporating heatmaps, bar plots, taxonomic labels on phylo trees without having to jump over file formatting hurdles.

rob-knight commented 9 years ago

I think that would be awesome -- although the question is whether we should do that or try for a solution that also works interactively in a web context. Depends on effort level.

gregcaporaso commented 9 years ago

I also think that sounds good - we definitely need a solution for tree viewing, and it would be great if this could be something that could work in a web browser. @mortonjt, would you be able to post your ideas on this under #531, and we can get some discussion going there?

Another option might be to use Archaeopteryx, which there are applets for, but that might get into a bunch of other problems we don't want to get into. (Note that in that case, we wouldn't introduce Archaeopteryx as a dependency, but maybe just have an easy way to load a TreeNode into an Archaeopteryx applet.)

dansondergaard commented 9 years ago

Maybe zipHMM (https://code.google.com/p/ziphmm/) would be interesting for the Forward algorithm. A naïve algorithm could be used for small strings/models and zipHMM could then kick in for larger strings.
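
For reference, the naïve Forward algorithm mentioned above could look roughly like the sketch below. It is plain Python, not zipHMM's or scikit-bio's API: the function name `forward_likelihood` and the two-state GC-rich/AT-rich model parameters are made up for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """Naive Forward algorithm: return P(obs | model) in O(T * K^2) time.

    obs   -- sequence of observed symbols (e.g. a DNA string)
    start -- start[k] is the probability of beginning in state k
    trans -- trans[i][j] is the probability of moving from state i to j
    emit  -- emit[k][symbol] is the probability that state k emits symbol
    """
    if len(obs) == 0:
        return 1.0  # the empty sequence has probability 1
    n_states = len(start)
    # alpha[k] = P(obs[0..t], state at position t is k); initialise for t = 0.
    alpha = [start[k] * emit[k][obs[0]] for k in range(n_states)]
    for symbol in obs[1:]:
        # Sum over every predecessor state i, then emit the next symbol from j.
        alpha = [
            emit[j][symbol] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model: state 0 is GC-rich, state 1 is AT-rich.
START = [0.5, 0.5]
TRANS = [[0.9, 0.1],
         [0.1, 0.9]]
EMIT = [{"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}]

likelihood = forward_likelihood("GCGCAT", START, TRANS, EMIT)
print(likelihood)
```

This version multiplies raw probabilities, so it underflows on long sequences; a practical implementation would rescale `alpha` at each step or work in log space, which is one reason to hand larger inputs to an optimized library.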

mortonjt commented 9 years ago

I'm going to throw out another idea that I think would be extremely useful.

Permutation tests tend to be really darn slow, and most of the implementations out there don't take full advantage of the available hardware. I think it would be really cool to have permutation tests using SIMD, OpenCL or CUDA. Perhaps we can adopt an approach similar to this software: https://github.com/wanderine/BROCCOLI

rob-knight commented 9 years ago

That’s a great idea if it can be matched to currently available hardware. Definitely worth investigating!
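
To make the permutation-test idea above concrete, here is a minimal serial sketch of a two-sample permutation test on the absolute difference in means. The function name `permutation_test` and its parameters are hypothetical, and nothing here is taken from BROCCOLI or scikit-bio. The loop over permutations is the part a SIMD, OpenCL or CUDA version would run in parallel, since every permutation is independent of the others.

```python
import random


def permutation_test(x, y, n_permutations=999, seed=None):
    """Two-sample permutation test on the absolute difference in means.

    Returns (observed_statistic, p_value). The p-value counts the observed
    labelling as one of the permutations, so it is never exactly zero.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)

    def abs_mean_diff(values):
        # Treat the first n_x values as group x and the rest as group y.
        a, b = values[:n_x], values[n_x:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_permutations):
        # Each iteration is independent of the others, which is what makes
        # this loop a natural target for parallel hardware.
        rng.shuffle(pooled)
        if abs_mean_diff(pooled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)


stat, p_value = permutation_test([1.0, 2.0, 3.0, 4.0], [2.5, 3.5, 4.5, 5.5], seed=0)
print(stat, p_value)
```

The cost is one statistic evaluation per permutation, so the runtime grows linearly with `n_permutations`; that is where batching the shuffles on parallel hardware would pay off.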