Algorithms and Ops - Githubissues

imglib / imglib2

A generic next-generation Java library for image processing

http://imglib2.net/

Other

302 stars 91 forks source link

Algorithms and Ops #40

Open dietzc opened 11 years ago

dietzc commented 11 years ago

Hi all,

we would like to draw your attention to a probably well-known issue regarding the implementation of general algorithms. And by now we feel the urge to discuss it (you too, hopefully):

The issues we want to address are:

there are a couple of, more or less, redundant implementations of algorithms (AbstractRegionGrowing, ConnectedComponentAnalysis, Morphological operations, ...) of varying quality (honestly, our implementations in the ops-packages are certainly not the best among them)
many implementations use different concepts and are part of different frameworks (e.g. Operations, Algorithms or nothing) what complicates, and some time limits, their use and compatibility

As a first starting point, we propose the following:

we introduce a "imglib2-algorithms-sandbox" where one can put code which he wants to make visible/usable to others but didn't have time for a super-clean implementation yet, they can also be removed without being marked as deprecated. We think this package is important because many of these algorithms can be good starting point for more sophisticated implementations.
we should agree on one package (e.g. imglib2-ops) what should facilitate the interfaces definitions and usage (e.g. concatenation of algorithms, applying pixel-wise operation assignments on images or rois, etc.) of algorithms in general. addtionally this package may also contain very basic operation like add substract etc)
algorithm live in another package, e.g. imglib2-algorithms and have to comply with certain conventions that need to be elaborated and algorithms can only be added via pull-requests.

The conventions for stable algorithms in imglib2-algorithms can be something like this:

no other algorithm doing the same should exist (if yes, this algorithm should be extended instead of adding a similar one)
each algorithm must inherit from the same interface such that it can be embedded e.g. in the operation framework and should follow a certain coding convention (e.g. the obligation to provide static accessor methods, like in Gauss3)
the implementation should comply with a set of fixed coding guidelines (e.g. avoid the use of Img itself instead use IterableInterval or RandomAccessibleIntervals etc, whenever possible, use Views to create Iterables over RandomAccessibles, Formatting conventions etc..)
Multi-threaded algorithms must provide a method which takes an ExecutorService, which is e.g. important for application which maintain their own ThreadPool.
...

let's start this discussion!

Martin & Christian

Update: Added GoogleDoc document https://docs.google.com/document/d/1vENBDKboBWf9q29LHNtY0aizPY6mUw0pa0-q99JsscQ/edit#

To: @LeeKamentsky @MichaelZinsmaier @Squareys @angrauma @StephanPreibisch @axtimwalde @tpietzsch @tinevez @ctrueden @dscho @bdezonia @hinerm @hornm

dietzc commented 11 years ago

One more interesting idea is, that this clean algorithms package has a dependency on Scijava-Common such that we can add some annotations to the algorithm (and to the conventions?!) and run them as imagej2 macros?

StephanPreibisch commented 11 years ago

Hi Christian, I like this initiative a lot. Sandbox, great idea - and also the split between ops and actual implementations of algorithms. We just have to defines the rules though. Maybe a google-doc would be a great idea so that we can all change and discuss it.

I object a little to the "no other algorithm doing the same should exist" ... I think there are examples where it makes sense to have it, e.g. Gauss3 vs FourierConvolution. They basically do the same but there is no point in extending one with another. But maybe you mean it much more strict, saying there should not be Gauss1, Gauss2, Gauss3? Still, maybe I want to make Gaussian convolution using CUDA or some other approximation, which does the same but has much more dependencies or I do not know what. Maybe it is better to say they have to be in the same package. What do you think? I feel we at least have to be more precise what it means "doing the same thing".

But again, in general I like it!

Ciao ciao, Stephan

dietzc commented 11 years ago

Sorry for not being precise ;-) Of course there may co-exist several implementations of the same algorithms. For Example a user may chose which implementation to use (GPU vs. Direct Convolution vs. Fourier-Space). Ideally all of these implementation rely on the same interface, such that one can call e.g. Gauss.gauss(implementationtype). Anyway there shouldn't exist two different versions of the same implementation (e.g. Gauss1 vs. Gauss2 vs. Gauss3).

tinevez commented 11 years ago

Hi Christian

we would like to draw your attention to a probably well-known issue regarding the implementation of general algorithms. And by now we feel the urge to discuss it (you too, hopefully):

I am really glad this happens. Otherwise imglib2 is to become or dead or uncontrolled (zombie) API :)

The issues we want to address are:

-

there are a couple of, more or less, redundant implementations of algorithms (AbstractRegionGrowing, ConnectedComponentAnalysis, Morphological operations, ...) of varying quality (honestly, our implementations in the ops-packages are certainly not the best among them)

many implementations use different concepts and are part of different frameworks (e.g. Operations, Algorithms or nothing) what complicates, and some time limits, their use and compatibility

As a first starting point, we propose the following:

1.

we introduce a "imglib2-algorithms-sandbox" where one can put code which he wants to make visible/usable to others but didn't have time for a super-clean implementation yet, they can also be removed without being marked as deprecated. We think this package is important because many of these algorithms can be good starting point for more sophisticated implementations.

This is a very good idea. The contract (no deprecation policy) you propose if fine, and to insist I would suggest not to ship it e.g. with Fiji. In my view this is a bit better that a contrib branch, because you can have this package depending on all imglib2 packages.

1.

we should agree on one package (e.g. imglib2-ops) what should facilitate the interfaces definitions and usage (e.g. concatenation of algorithms, applying pixel-wise operation assignments on images or rois, etc.) of algorithms in general. addtionally this package may also contain very basic operation like add substract etc)

There I would break. Yes we clearly need a core super-package that contains main interfaces. But I feel it should be imglib2, not imglib2-ops. In my personal use case, some of the duplicate implementation you mention also exist in ops. The story is that I do not know what is the ops package. Is it meant for scripting? Specifically for KNIME? Also, where does it stand in the imglib2 hierarchy? Is it a package that depends on all other imglib2 packages, or an upstream package, on which all (but a few) other imglib2 packages depend? Right now, it looks like (I Might Be Wrong and I love Radiohead) like a package that bridges imglib2 to a specific syntax, like scripting. If this is the case, then algorithms should be implemented upstream in imglib2-algorithms-*, and imglib2-ops should wrap them.

1.

algorithm live in another package, e.g. imglib2-algorithms and have to comply with certain conventions that need to be elaborated and algorithms can only be added via pull-requests.

The conventions for stable algorithms in imglib2-algorithms can be something like this:

a) no other algorithm doing the same should exist (if yes, this algorithm should be extended instead of adding a similar one)

As per Stephan remark, maybe we could rephrase this rule by saying that there can be several implementations of the same interface, but not two interfaces that points to the same image-processing problem. This would probably first require to rewire some of the classes ditribution in packages (e.g., where do we put neighborhood iterators?)

b) each algorithm must inherit from the same interface such that it can be embedded e.g. in the operation framework and should follow a certain coding convention (e.g. the obligation to provide static accessor methods, like in Gauss3)

I once wrote a class that attracted Stephan eyes and I noticed he had a very clear idea about conventions (such as static accessors). Stephan, could you formulate them for all?

c) the implementation should comply with a set of fixed coding guidelines (e.g. avoid the use of Img itself instead use IterableInterval or RandomAccessibleIntervals etc, whenever possible, use Views to create Iterables over RandomAccessibles, Formatting conventions etc..)

d) Multi-threaded algorithms must provide a method which takes an ExecutorService, which is e.g. important for application which maintain their own ThreadPool.

Could this go in a specific interface? cheers jy

dietzc commented 11 years ago

On 07.11.2013 16:58, Jean-Yves Tinevez wrote:

Yes we clearly need a core super-package that contains main interfaces. But I feel it should be imglib2, not imglib2-ops. In my personal use case, some of the duplicate implementation you mention also exist in ops. The story is that I do not know what is the ops package. Is it meant for scripting? Specifically for KNIME? Also, where does it stand in the imglib2 hierarchy? Is it a package that depends on all other imglib2 packages, or an upstream package, on which all (but a few) other imglib2 packages depend? Right now, it looks like (I Might Be Wrong and I love Radiohead) like a package that bridges imglib2 to a specific syntax, like scripting. If this is the case, then algorithms should be implemented upstream in imglib2-algorithms-*, and imglib2-ops should wrap them.

You are right, the general interfaces don't need to be in ops, they could also live in algorithms or core. But we need to think about it carefully, as Ops is a very general definition (input -> output) which could be reused in algorithms, such that we don't need to wrap each and every algorithm with an op?

ImgLib2-ops is a mixture of the "Functions" implemented by barry and the Operation Framework which we had in KNIME (we implement any "algorithm" we have as an op). We agreed on some common interfaces in Dresden 2011. Then we pushed all our KNIME independend stuff we did in ImgLib2 to this repository, such that others can use it. the obvious mistake we made was that we pushed nearly anything to make it available and visible, but we didn't care too much about generality / code-quality. in the end nobody knows anymore what ops are and nobody uses ops, which is completely understandable. we should have had a sandbox or clear-definition of interfaces for algorithms before. developers should know where to look at.

However, imglib2-ops, especially the classes we (KNIP) contributed, will be cleaned-up and a lot will be moved into the algorithms-sandbox first. After we agreed on interfaces etc for algorithms, we really want adapt our stuff successively, but we also need a little bit of guidance here. The problem you had with imglib2-ops is really something we should try to avoid in the future: if we would have had one place where we (somehow) guarantee a certain level of code-qualitity + a sandbox, we could have saved some time I guess.

We are also not "real" software-developers and really learned a lot the during the last two years. This is an attempt to make it better! :-)

tpietzsch commented 11 years ago

Hi,

I like the sandbox idea. One question is, how do we decide when something is ready for moving to a more permanent place? One way would be via pull requests on github, and then a certain number of people has to vote it in (after reviewing the code of course).

I think we should also make a core sandbox and vote on not-quite-ready stuff that should go there (I vote for the net.imglib2.multithreading package).

With respect to imglib-ops, I think making them leight-weight wrappers around algorithms is a good idea I think. The biggest problem will be settling on the ops interfaces and infrastructure for hierarchically combining and applying them. As far as I can tell (but I'm not very familiar with the ops code) I think that this is still on the sandbox level.

Thanks a lot Christian and Martin for pushing in this direction! I like it very much.

best regards, Tobias

On Nov 7, 2013, at 5:31 PM, Christian Dietz notifications@github.com wrote:

On 07.11.2013 16:58, Jean-Yves Tinevez wrote:

Yes we clearly need a core super-package that contains main interfaces. But I feel it should be imglib2, not imglib2-ops. In my personal use case, some of the duplicate implementation you mention also exist in ops. The story is that I do not know what is the ops package. Is it meant for scripting? Specifically for KNIME? Also, where does it stand in the imglib2 hierarchy? Is it a package that depends on all other imglib2 packages, or an upstream package, on which all (but a few) other imglib2 packages depend? Right now, it looks like (I Might Be Wrong and I love Radiohead) like a package that bridges imglib2 to a specific syntax, like scripting. If this is the case, then algorithms should be implemented upstream in imglib2-algorithms-*, and imglib2-ops should wrap them.

You are right, the general interfaces don't need to be in ops, they could also live in algorithms or core. But we need to think about it carefully, as Ops is a very general definition (input -> output) which could be reused in algorithms, such that we don't need to wrap each and every algorithm with an op?

ImgLib2-ops is a mixture of the "Functions" implemented by barry and the Operation Framework which we had in KNIME (we implement any "algorithm" we have as an op). We agreed on some common interfaces in Dresden 2011. Then we pushed all our KNIME independend stuff we did in ImgLib2 to this repository, such that others can use it. the obvious mistake we made was that we pushed nearly anything to make it available and visible, but we didn't care too much about generality / code-quality. in the end nobody knows anymore what ops are and nobody uses ops, which is completely understandable. we should have had a sandbox or clear-definition of interfaces for algorithms before. developers should know where to look at.

However, imglib2-ops, especially the classes we (KNIP) contributed, will be cleaned-up and a lot will be moved into the algorithms-sandbox first. After we agreed on interfaces etc for algorithms, we really want adapt our stuff successively, but we also need a little bit of guidance here. The problem you had with imglib2-ops is really something we should try to avoid in the future: if we would have had one place where we (somehow) guarantee a certain level of code-qualitity + a sandbox, we could have saved some time I guess.

We are also not "real" software-developers and really learned a lot the during the last two years. This is an attempt to make it better! :-) — Reply to this email directly or view it on GitHub.

ctrueden commented 10 years ago

@dietzc and I are heavily hacking on work related to this, in the new SciJava OPS repository. Will update further by the end of this week!

ctrueden commented 10 years ago

how do we decide when something is ready for moving to a more permanent place? One way would be via pull requests on github, and then a certain number of people has to vote it in (after reviewing the code of course).

@tpietzsch: My personal preference is for each project to have a BDFL. I have been subjected to heavier process workflows on other projects which I feel harm development, and I would strongly prefer not to encumber ImgLib2 that way.

I nominate @tpietzsch for BDFL of ImgLib2 core, if @axtimwalde and @StephanPreibisch are OK with it. (Or if you guys really want to be a 3-way BDFL with voting, that's fine too. But let's not require PRs, please?) And you three are already pretty much BDFLs of the project, since if one you dislikes something it is basically vetoed.

And note that you can have a BDFL while still allowing direct contributions from others. It's how all the SciJava projects are at the moment.

I think we should also make a core sandbox

Yes, @dietzc wants such a sandbox, too. And that is fine with me. Let's make a separate Git repository imglib-sandbox?

Also, in general, what do you think about creating a new GitHub org "imglib" or "imglib2" (neither yet exists) and migrating all ImgLib2 stuff there? Or else migrating the imglib.git to scijava? I think we move imglib out of the imagej namespace before we come out of beta, because ImgLib2 really is about more than just ImageJ (and the net.imglib2 package and groupId already reflects this).

The biggest problem will be settling on the ops interfaces and infrastructure for hierarchically combining and applying them.

I am working on it. I doubt that scijava-ops will be ready for review by the end of the year though, unfortunately.

Thanks a lot Christian and Martin for pushing in this direction!

Yes, thank you so much for all your efforts, guys!

tinevez commented 10 years ago

@ctrueden The BDFL idea is great!

tpietzsch commented 10 years ago

Hi,

On Dec 13, 2013, at 12:02 AM, Curtis Rueden notifications@github.com wrote:

how do we decide when something is ready for moving to a more permanent place? One way would be via pull requests on github, and then a certain number of people has to vote it in (after reviewing the code of course).

@tpietzsch: My personal preference is for each project to have a BDFL. I have been subjected to heavier process workflows on other projects which I feel harm the development process, and I would strongly prefer not to encumber ImgLib2 that way.

I nominate @tpietzsch for BDFL of ImgLib2 core, if @axtimwalde and @StephanPreibisch are OK with it. (Or if you guys really want to be a 3-way BDFL with voting, that's fine too. But let's not require PRs, please?) And you three are already pretty much BDFLs of the project, since if one you dislikes something it is basically vetoed.

ok, then let's basically keep it as it is. I would be happy to be "main" BDFL of the core, if no-one objects. As I understand it, that does not really mean that I have to anything else than before, just maybe more of the same, right? I would just try to follow development closer and in general feel a little bit more responsible... :-) And note that you can have a BDFL while still allowing direct contributions from others. It's how all the SciJava projects are at the moment.

I think we should also make a core sandbox

Yes, @dietzc wants such a sandbox, too. And that is fine with me. Let's make a separate Git repository imglib-sandbox?

Also, in general, what do you think about creating a new GitHub org "imglib" or "imglib2" (neither yet exists) and migrating all ImgLib2 stuff there? Or else migrating the imglib.git to scijava? I think we move imglib out of the imagej namespace before we come out of beta, because ImgLib2 really is about more than just ImageJ (and the net.imglib2 package and groupId already reflects this).

Moving it out of the imagej namespace is a good idea. I personally don't mind, whether we move to scijava or create a new organization. @axtimwalde, @StephanPreibisch what do you prefer? If we decide to move to scijava, we be added to the scijava "owners" team?

best regards, Tobias

The biggest problem will be settling on the ops interfaces and infrastructure for hierarchically combining and applying them.

I am working on it. I doubt that scijava-ops will be ready for review by the end of the year though, unfortunately.

Thanks a lot Christian and Martin for pushing in this direction!

Yes, thank you so much for all your efforts, guys!

— Reply to this email directly or view it on GitHub.

StephanPreibisch commented 10 years ago

Hi,

Moving it out of the imagej namespace is a good idea. I personally don't mind, whether we move to scijava or create a new organization. @axtimwalde, @StephanPreibisch what do you prefer?

I would prefer to have our own project space ...

Otherwise I am fine with Tobias being the main BDFL (http://en.wikipedia.org/wiki/Benevolent_Dictator_for_Life), so basically continuing as it is now as Tobias pointed out. For important design decision Saalfeld, Tobias and me would still keep our old and time-proven 3-way BDFL voting scheme. Tobi, Saalfeld, are you ok with that??

Cheers, Stephan

On Dec 17, 2013, at 14:58 , tpietzsch wrote:

Hi,

On Dec 13, 2013, at 12:02 AM, Curtis Rueden notifications@github.com wrote:

how do we decide when something is ready for moving to a more permanent place? One way would be via pull requests on github, and then a certain number of people has to vote it in (after reviewing the code of course).

@tpietzsch: My personal preference is for each project to have a BDFL. I have been subjected to heavier process workflows on other projects which I feel harm the development process, and I would strongly prefer not to encumber ImgLib2 that way.

I nominate @tpietzsch for BDFL of ImgLib2 core, if @axtimwalde and @StephanPreibisch are OK with it. (Or if you guys really want to be a 3-way BDFL with voting, that's fine too. But let's not require PRs, please?) And you three are already pretty much BDFLs of the project, since if one you dislikes something it is basically vetoed.

ok, then let's basically keep it as it is. I would be happy to be "main" BDFL of the core, if no-one objects. As I understand it, that does not really mean that I have to anything else than before, just maybe more of the same, right? I would just try to follow development closer and in general feel a little bit more responsible... :-) And note that you can have a BDFL while still allowing direct contributions from others. It's how all the SciJava projects are at the moment.

I think we should also make a core sandbox

Yes, @dietzc wants such a sandbox, too. And that is fine with me. Let's make a separate Git repository imglib-sandbox?

Also, in general, what do you think about creating a new GitHub org "imglib" or "imglib2" (neither yet exists) and migrating all ImgLib2 stuff there? Or else migrating the imglib.git to scijava? I think we move imglib out of the imagej namespace before we come out of beta, because ImgLib2 really is about more than just ImageJ (and the net.imglib2 package and groupId already reflects this).

Moving it out of the imagej namespace is a good idea. I personally don't mind, whether we move to scijava or create a new organization. @axtimwalde, @StephanPreibisch what do you prefer? If we decide to move to scijava, we be added to the scijava "owners" team?

best regards, Tobias

The biggest problem will be settling on the ops interfaces and infrastructure for hierarchically combining and applying them.

I am working on it. I doubt that scijava-ops will be ready for review by the end of the year though, unfortunately.

Thanks a lot Christian and Martin for pushing in this direction!

Yes, thank you so much for all your efforts, guys!

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub.

ctrueden commented 10 years ago

I would prefer to have our own project space ...

Great, I moved it: https://github.com/imglib/imglib

The old URL (https://github.com/imagej/imglib) redirects automatically. And most importantly, the ImgLib2 organization has its own shiny icon.

I am fine with Tobias being the main BDFL ... For important design decision Saalfeld, Tobias and me would still keep our old and time-proven 3-way BDFL voting scheme.

I like this plan. However, please note that there are several "maintainer" aspects of being a BDFL which until now @dscho and I have been handling. For example, I set up the new GitHub organization just now, created the teams, etc. And we (the ImageJ2 team) have thus far cut the ImgLib2 releases. Unless @tpietzsch wants to start being responsible for such things, I am happy to continue doing it. All we ask is that the ImgLib2 BDFL(s) remain cognizant of it and appreciate how much work it is. :wink:

bogovicj commented 10 years ago

Resurrection of this issue!

By way of introduction for those I havn't met yet, my name is John Bogovic and I've been working with Stephan Saalfeld ( @axtimwalde ) at Janelia for the last few months.

First, I was happy to see this conversation here and like the direction / conclusions thus far. Has a sandbox been created?

I hope so, because it seems to be a good idea, and from my perspective it would help to:

know what algorithms are available. (like @dietzc, I'm hopeful both to use other people's methods and that what I write might be useful to others )
know when to be able to trust an algorithm. ( as @StephanPreibisch @tpietzsch and @ctrueden have discussed )

because as a relative newcomer, these two things have been a bit challenging so far.

There have been a couple of algorithms that I've 'stumbled upon' while looking through the code. While I usually can get them to produce some output, its not often clear to me (especially for the more exotic algs) that the ouput is correct or if I'm calling them in the approved way.

My suggestion at the moment would be to include some non-trivial test case in the sandbox that accompanies it once its 'approved.' I expect such cases when developing algorithms anyway, so why not make them visible?

If these all stay in the same place or are named similarly, it would be easier to see both what is available and what is approved (if you see one of these tests outside the sandbox, it should be safe to assume that its been run by someone in this community other than the alg author - one of the BDFLs or someone they've delegated).

What does everyone think? If / when a sandbox is made I'll absolutely start dumping my more experimental stuff in there for people to play with / criticize / etc.

Thanks again everyone! John

PS A final related question, I'd really love some quick convenience methods like the following:

SimpleOps.add( img1, img2, result );
result = SimpleOps.threshold( img1, thresh );

because honestly when these are used as intermediate steps in an alg, what I'd like more than a general solution is something that produces code that's quick to type and easy to read, just for some of the most common tasks. Does such a thing exist? It sounded like @dietzc and @tinevez may have been getting at this issue when discussing the ops package, but its not clear to me if a conclusion was drawn.

dietzc commented 10 years ago

Hi John,

just a quick reply: In March we (@ctrueden, @hornm, @dscho, ...) started to work on a package called ImageJ-Ops, which basically will contain all of the stuff which is currently available in imglib2-ops. In the future we plan to deprecate imglib2-ops.

Have a look here for more details on ImageJ-Ops:

http://developer.imagej.net/2014/04/04/announcing-imagej-ops

I hope this helps, Christian

bogovicj commented 10 years ago

Great, thanks Christian - I'll check it out!

dscho commented 10 years ago

@bogovicj imagej-ops also addresses a couple of problems with extensibility and the impossibility in imglib2-algorithms for implicitly optimized algorithms (if you want to run optimally performing algorithms on your data, you will have to make sure to run the most algorithm for your actual data explicitly).

As you undoubtedly know, static methods can neither be overridden nor augmented in third-party libraries. This is another problem we addressed with ops.

bogovicj commented 10 years ago

@dscho Sounds good! Then, is there an imagej-ops-sandbox ? As for seeing what's available - is checking the javadoc the way to go for now? Thanks again!

ctrueden commented 10 years ago

@bogovicj See the ImageJ OPS announcement blog post for an overview of the project. See also the ImageJ OPS readme and tutorials—Using OPS and Create a New OP—for documentation at the moment. You can also check out the javadoc, but the blog post and readme should hopefully convey the fundamentals of the design, and why it is better than a static SimpleOps class could ever be. And please feel free to follow up with questions!

ctrueden commented 8 years ago

@dietzc This issue has not been updated in a long time now.

Is there more to discuss here? Perhaps we can close in favor of scijava/scijava-common#133? Or split apart your comments into individual issues?