imagej / pyimagej

Use ImageJ from Python
https://pyimagej.readthedocs.io/

Implement a CellProfiler module for running SciJava modules via pyimagej #89

Closed ctrueden closed 3 years ago

ctrueden commented 3 years ago

Recently, the COBA teams at Broad and LOCI discussed how best to proceed with ImageJ integration into CellProfiler. We decided to pursue two avenues on the CellProfiler side:

  1. Add ability to call Fiji headless via command line.
  2. Implement a CP module for discovering and executing SciJava modules via pyimagej.

This issue exists to track progress on (2), which I (@ctrueden) will be implementing in coming weeks. The CP team at Broad will do the initial investigation toward approach (1).
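As a rough illustration of approach (1), CellProfiler could build a command line for a locally installed Fiji launcher and run it as a subprocess. This is only a sketch under assumptions: the launcher path, script name, and parameter names are hypothetical, and the helper functions are not part of CellProfiler. `--headless`, `--console`, and `--run` are ImageJ launcher options, but the exact quoting rules for script parameters should be checked against the ImageJ headless documentation.

```python
import subprocess


def build_headless_command(launcher, script, params):
    """Build an argv list that runs `script` in a headless Fiji.

    `launcher` is the path to the Fiji launcher executable (hypothetical here),
    and `params` maps script parameter names to values.
    """
    # Script parameters are passed as one comma-separated key=value string;
    # string values are wrapped in double quotes.
    def fmt(value):
        return f'"{value}"' if isinstance(value, str) else str(value)

    arg_string = ",".join(f"{k}={fmt(v)}" for k, v in params.items())
    return [launcher, "--headless", "--console", "--run", script, arg_string]


def run_headless(launcher, script, params):
    """Run the script and return its standard output (raises on failure)."""
    cmd = build_headless_command(launcher, script, params)
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout


cmd = build_headless_command(
    "/path/to/Fiji.app/ImageJ-linux64",  # hypothetical install location
    "segment.groovy",                     # hypothetical script
    {"input": "/data/img.tif", "sigma": 2.0},
)
print(cmd)
```

Keeping command construction separate from execution makes the command easy to log and inspect before anything is launched.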

For the community's interest, the following are my notes from COBA's discussions in October 2020:

21 October 2020

CP uses Java for several things:

All this is present in prokaryote within a single JAR. The Java API is burdensome to use via javabridge. The architecture does not make extensibility easy.

ImageJ's primary strength is extensibility. A good CP+ImageJ integration will leverage ImageJ's extensibility, allowing users to plug in functionality in ways that are both easy and powerful.

Using scyjava instead of javabridge would make this possible.

7 October 2020

1. Making pyimagej easily usable from pip

See: https://github.com/imagej/pyimagej/issues/88

The two main issues currently are:

  a. pyjnius needs a special JAR file present to connect with Java; and
  b. Java and Maven are not (to my knowledge) available from PyPI as dependencies (since they aren't Python modules). They need to be installed separately.

Regarding 1a: Ed and Mark have made great headway migrating the pyimagej stack from pyjnius to jpype. With jpype, 1a will be moot. We are shooting to complete that work and release pyimagej 1.0.0 by the end of October. Follow the progress at https://github.com/scijava/scyjava/issues/18.

Regarding 1b: we discussed several different possible ways forward:

  1. Instruct users to install Fiji separately, and tell CellProfiler the path to their installation via a configuration dialog. Then use that Fiji installation's bundled JVM and bundled JAR files.

    • Pros: -- Would give CP access to the full power of Fiji, fully customizable/extensible by the user. -- Would simplify the CellProfiler build system and slim down its distribution (no more need to ship Java with CP).
    • Cons: -- Users would need to perform an extra step to gain access to Bio-Formats functionality. -- Would need to update CP's other usage of Java (some kind of metadata parsing, IIUC) to work differently: --- Could reuse Fiji's JVM for that too, but then every CP user would need to install Fiji :-( --- Could rewrite the metadata parsing module in Python, but unclear how feasible that would be.
  2. As 1, but keep shipping a JVM with CP and use that for CP's Java-related usages.

    • Pros: -- Would let CP's other existing usage of Java keep working out of the box, even if the user does not configure Fiji.
    • Cons: -- Would need to keep shipping a JVM with CP. -- CP's JVM (Java 14 right now) may not work as well with Fiji. Right now, we haven't tested ImageJ much with Java 14, and it's not an LTS release of Java.
  3. Ship a Fiji installation with CP.

    • Pros: -- All CP and CP+IJ features would work out of the box with no configuration.
    • Same cons as (2), plus: -- Would bloat the CP distro by hundreds of MB.
  4. As (3), but ship only the needed JAR files with CP.

    • Pros: -- All desired CP features would work out of the box with no configuration. -- Smaller footprint than (3): on the order of dozens of MB of JAR files, depending on which features we want to ship. -- For additional user-desired features, CP could still let users point at a customized ImageJ/Fiji installation.
    • Same cons as (2), plus: -- CP's build system would need to invoke tooling to pull down the wanted JARs + dependencies into the distribution. Maven would be the obvious choice. IMO (without knowledge of the CP build system), this should be an easy thing to do.
  5. As (4), but with JAR files published to PyPI, so that CP can pull down the JARs as Python modules.

    • Pros: -- Would make it straightforward to add imagej to CP as a dependency, without modifying the CP build system.
    • Cons: -- Would require additional packaging effort on the ImageJ side, wrapping hundreds of JAR files into Python modules, and publishing them to PyPI with the same dependency tree. -- Or, we could publish ImageJ to PyPI as a single uber-JAR (would be ~150MB). But then the benefits of ImageJ2's modularity would be lost on the Python side. -- It seems redundant to deploy JARs to PyPI when Maven repositories already offer binary deployment of Java artifacts, and jgo enables direct consumption of these artifacts (see next option) in a more flexible way.
  6. Use pyimagej's support for remote endpoints (via jgo/Maven).

    • Pros: -- Transparently use ImageJ and other Java features from CP both on the desktop and on the cluster, without different case logic. -- Avoids upfront shipping of JAR files with CP, in favor of downloading needed Java libraries on demand. -- Decouples the Java-side component versions from the Python-side component versions, so users can e.g. easily switch between different versions of ImageJ and Fiji components within the same CP installation.
    • Cons: -- Would require either: A) shipping a copy of Maven with CP; or B) asking users to install Maven manually. -- Not as easy for users to customize their Fiji installations to gain access to additional functionality from CP. But you could combine this approach with the option for users to specify a locally installed Fiji as well, as discussed above, to get the best of both worlds. -- Substantial bootstrapping period the first time Java-side functionality is invoked, while jgo/Maven download and cache artifacts. Would be good to show the user a progress bar during initial bootstrapping.
  7. As (6), but enhance jgo to have a "Maven-free" (i.e. pure Python) mode of operation. For our purposes, this seems doable to me, although it would be some development effort.

    • Pros same as (6), plus: -- No need to ship Maven with CP, or ask users to install Maven, or bootstrap Maven ourselves.
    • Cons: -- Extra dev effort to reimplement needed dependency reasoning on the Python side.
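Options (1) and (6) can be combined, as noted above: use a local Fiji installation if the user has configured one, and otherwise fall back to a remote endpoint. The sketch below assumes that `imagej.init` accepts either a local installation path or a Maven coordinate; the helper names and the default coordinate are illustrative, not part of CellProfiler or pyimagej.

```python
import os

# Illustrative default; any published Maven coordinate could be used here.
DEFAULT_ENDPOINT = "sc.fiji:fiji"


def choose_endpoint(fiji_dir=None, default=DEFAULT_ENDPOINT):
    """Return the argument to hand to imagej.init().

    Prefer a user-configured local Fiji installation (option 1); otherwise
    fall back to a remote endpoint resolved on demand (option 6).
    """
    if fiji_dir and os.path.isdir(fiji_dir):
        return fiji_dir
    return default


def start_imagej(fiji_dir=None):
    """Start ImageJ; a first remote start may be slow while JARs download."""
    import imagej  # imported lazily so this module loads without a JVM

    return imagej.init(choose_endpoint(fiji_dir))


print(choose_endpoint())                     # no local Fiji configured
print(choose_endpoint("/no/such/Fiji.app"))  # configured path is missing
```

Falling back silently when the configured path is missing is a design choice; a real integration might prefer to warn the user instead.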

Key requirements Allen mentioned for CP are:

I believe option (1) would sufficiently meet these requirements for the majority of users, except for one snag: CP has another use of Java besides Bio-Formats, which is apparently used by all or most pipeline executions. Therefore, regardless of any ImageJ integration work, the CP project must decide: keep requiring Java for primary CP operation, or not?

2. What functionality do we want to enable with a CP+IJ bridge?

The three use cases I mentioned were:

  1. Replacing the current Bio-Formats bridge with a pyimagej-based one. The reasons for doing this are: A) reduced maintenance burden on the CP side; and B) access to ImageJ2's more powerful and extensible I/O mechanism.

  2. Giving Python scripts access to the full ImageJ API from CellProfiler. But looking now at the CP docs, I don't see a "Python script" module for CP? Does one not exist?

  3. Replacing the old RunImageJ module with a new one that lets you wrap any SciJava module, exposing the same inputs and outputs, as long as they can be mapped between the respective paradigms (i.e. images, numbers, strings, tables, ROIs yes—arbitrary Java objects no).

We agreed that (3) is desirable from a COBA/grant standpoint, and I think it's technically pretty straightforward to do, once we enable pyimagej access from CP in one of the ways discussed above. I'd be delighted to do a pair-programming session with Alice later this year, after pyimagej 1.0.0 is released, to hammer out a first version of such a module.
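To make the "mappable between paradigms" constraint in use case (3) concrete, here is a deliberately simplified sketch. It decides whether a module can be wrapped by looking up each input/output type name in a fixed table. The category labels on the right are placeholders rather than CellProfiler API, and exact-name lookup is an assumption made for brevity: a real implementation would more likely ask the Java side whether a type is convertible, so that subclasses are handled too.

```python
# Hypothetical mapping from Java type names to the kind of CellProfiler
# setting that could represent them. The labels are placeholders, not CP API.
SUPPORTED_KINDS = {
    "java.lang.Number": "number",
    "java.lang.Boolean": "boolean",
    "java.lang.Character": "text",
    "java.lang.CharSequence": "text",
    "net.imglib2.RandomAccessibleInterval": "image",
    "net.imglib2.roi.labeling.ImgLabeling": "objects",
    "org.scijava.table.Table": "table",
}


def kind_for(java_type_name):
    """Return the placeholder kind for a Java type name, or None if unmapped."""
    return SUPPORTED_KINDS.get(java_type_name)


def wrappable(item_type_names):
    """A module is wrappable only if every input/output type can be mapped."""
    return all(kind_for(name) is not None for name in item_type_names)


print(wrappable(["net.imglib2.RandomAccessibleInterval", "java.lang.Number"]))
print(wrappable(["java.lang.Number", "java.util.concurrent.Future"]))
```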

imagejan commented 3 years ago

Re 2.:

Giving Python scripts access to the full ImageJ API from CellProfiler. But looking now at the CP docs, I don't see a "Python script" module for CP? Does one not exist?

It seems a RunScript module was considered too insecure: https://github.com/CellProfiler/CellProfiler/pull/1770

ctrueden commented 3 years ago

It seems a RunScript module was considered too insecure

Yes. Part of this work is updating CP to use scyjava instead of javabridge, which would already be an important step forward toward consistency in our community. From there, an ImageJ CellProfiler module will be straightforward.

The CP team does not want to do the work needed to move away from javabridge, claiming the cost/benefit is not favorable. They say maintaining the current solution costs only a few hours of developer time per year, whereas updating the JVM mechanism would take at least a couple of weeks, and CellProfiler's lifespan won't be long enough to justify it. So I'm going to take a crack at updating CP to use scyjava, since I think it could maybe be done in 1-2 solid hacking sessions.

@imagejan As a user of both ImageJ and CellProfiler, what would help you the most? Is the time investment I'm proposing worth it? Would a looser integration that instead calls Fiji headless via system calls be good enough for any pipelines your group has historically used? Or do you think it's worth the effort to migrate CP over to scyjava? And if so, why?

imagejan commented 3 years ago

I'm afraid I cannot answer the question well, as I wouldn't call myself a regular user of CellProfiler. We do support it in our facility, but I rarely developed pipelines with it, mostly because I wanted/needed more flexibility (and likely didn't know enough about CellProfiler to use its full potential).

Having the possibility to use SciJava plugins/scripts from CellProfiler (via pyimagej or any other mechanism) might be a game changer, though.

ctrueden commented 3 years ago

Here's a Python program that iterates through available SciJava modules, filtering them to ones that are likely to be compatible with CellProfiler. It also demonstrates how to execute modules from Python:

import imagej
import numpy as np
from scyjava import jimport

ij = imagej.init()

Number = jimport('java.lang.Number')
Boolean = jimport('java.lang.Boolean')
Character = jimport('java.lang.Character')
CharSequence = jimport('java.lang.CharSequence')

RandomAccessibleInterval = jimport('net.imglib2.RandomAccessibleInterval')
ImgLabeling = jimport('net.imglib2.roi.labeling.ImgLabeling')
Table = jimport('org.scijava.table.Table')

PreprocessorPlugin = jimport('org.scijava.module.process.PreprocessorPlugin')
InputHarvester = jimport('org.scijava.widget.InputHarvester')

def is_type(item, java_type):
    """
    Can the item's type be converted into the requested one?
    """
    return ij.convert().supports(item.getType(), java_type.class_)

def is_basic(item):
    """
    Is the item convertible to a basic type?
    Basic types include:
    - numeric types
    - booleans
    - characters and strings
    """
    return is_type(item, Number) or is_type(item, Boolean) or \
           is_type(item, Character) or is_type(item, CharSequence)

def is_image(item):
    """
    Is the item convertible to an image?
    """
    return is_type(item, RandomAccessibleInterval)

def is_labeling(item):
    """
    Is the item convertible to an image labeling?
    """
    return is_type(item, ImgLabeling)

def is_table(item):
    """
    Is the item convertible to a table?
    """
    return is_type(item, Table)

def is_item_compatible(item):
    """
    Is the item compatible with CellProfiler?
    A compatible item's type is convertible to something CP knows how to deal with.
    """
    # TODO: Add support for container types: list, dict, and set.
    # It's tricky, because contained elements must also be compatible.
    return is_basic(item) or is_image(item) or is_labeling(item) or is_table(item)

def items(info):
    """
    Gets all module items -- both inputs and outputs -- without redundancy.
    """
    items = [item for item in info.inputs()]
    inputs = set(items)
    items.extend(item for item in info.outputs() if item not in inputs)
    return items

blocklisted_module_prefixes = [
    'command:net.imagej.ops.',
    'command:net.imagej.plugins.tools.',
    'legacy:',
]

def is_module_compatible(info):
    """
    Is this SciJava module compatible with CellProfiler?
    A compatible module:
    - Can be used in headless mode.
    - Has unresolved inputs and outputs only of compatible types.
    """
    # NB: Disable headless filter for now; it's too conservative.
    #if not info.canRunHeadless():
    #    return False

    # Filter out blocklisted modules.
    module_id = str(info.getIdentifier())
    if any(module_id.startswith(prefix) for prefix in blocklisted_module_prefixes):
        return False

    # Filter out module inputs and outputs resolved during preprocessing.
    # This includes, but is not limited to, service parameters, which are
    # resolved by the SciJava framework.
    m = info.createModule()
    for plugin_info in ij.plugin().getPluginsOfType(PreprocessorPlugin.class_):
        try:
            if InputHarvester.class_.isAssignableFrom(plugin_info.loadClass()):
                break
            preprocessor = ij.plugin().createInstance(plugin_info)
            preprocessor.process(m)
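        except Exception:
            # TODO: Log the error and continue.
            # NB: Catch Exception rather than using a bare except, so that
            # KeyboardInterrupt and SystemExit still propagate.
            pass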

    return all(m.isInputResolved(item.getName()) or is_item_compatible(item) for item in items(info))

def format_value(name, value):
    return '' if value is None or value == '' else f' {name}={value}'

def stringify(item):
    """
    Constructs a simple string representation of a module item.
    For debugging.
    """
    try:
        io_type = item.getIOType()
        item_type = item.getType()
        name = item.getName()
        label = format_value('label', item.getLabel())
        min_value = format_value('min', item.getMinimumValue())
        max_value = format_value('max', item.getMaximumValue())
        step_size = format_value('stepSize', item.getStepSize())
        soft_min = format_value('softMin', item.getSoftMinimum())
        soft_max = format_value('softMax', item.getSoftMaximum())
        default_value = format_value('defaultValue', item.getDefaultValue())
        choices = format_value('choices', item.getChoices())
        required = format_value('required', item.isRequired())
        return f'[{io_type}] {item_type.getSimpleName()} {name} -{label}{min_value}{max_value}{step_size}{soft_min}{soft_max}{default_value}{choices}{required}'
    except Exception:
        return '<error>'

def execute(info, args=None):
    print(f'Executing {info.getIdentifier()}')
    # Execute the module, blocking till complete.
    # NB: args defaults to None to avoid a shared mutable default argument.
    outputs = ij.module().run(info, True, ij.py.to_java(args or {})).get().getOutputs()
    for key, value in outputs.items():
        print(f'==> {key} = {value}')

# -- Main --

modules = [info for info in ij.module().getModules() if is_module_compatible(info)]

for info in modules:
    print()
    print(f'[{info.getIdentifier()}]')
    for item in items(info):
        print(f'--> {stringify(item)}')

# Run some modules.

print()
print('=======================')
sys_info = ij.module().getModuleById('command:org.scijava.plugins.commands.debug.SystemInformation')
execute(sys_info)

from skimage import data
StringReader = jimport('java.io.StringReader')
ScriptInfo = jimport('org.scijava.script.ScriptInfo')

print()
print('=======================')
asciify = ScriptInfo(ij.context(), 'asciify.groovy', StringReader("""
#@ OpService ops
#@ Dataset image
#@output String result
result = ops.image().ascii(image)
"""))
coins = data.coins()[31:76,16:75]
execute(asciify, {
    'image': ij.py.to_dataset(coins)
})

print()
print('=======================')
add_noise = ScriptInfo(ij.context(), 'add_noise.groovy', StringReader("""
#@ Dataset image
#@ float magnitude
#@both Dataset result
src = image.localizingCursor()
dst = result.randomAccess()
while (src.hasNext()) {
    v = src.next().getRealDouble() + 2 * magnitude * (Math.random() - 0.5)
    dst.setPosition(src)
    dst.get().setReal(v)
}
"""))

result = np.zeros(coins.shape, dtype=np.float32)
execute(add_noise, {
    'image': ij.py.to_dataset(coins),
    'magnitude': 0.5,
    'result': ij.py.to_dataset(result)
})
execute(asciify, {
    'image': ij.py.to_dataset(result)
})

ij.dispose()

ctrueden commented 3 years ago

@hinerm @alicelucas Shall we update (and maybe/probably close?) this issue with the latest info about RunImageJ etc.?

hinerm commented 3 years ago

Closed in https://github.com/CellProfiler/CellProfiler-plugins/commit/0621ebcac24271720680bb390c8da568bb94bd56