ome / design

OME Design proposals
http://ome.github.io/design/

Bio-Formats: serialization policy #55

sbesson closed 7 years ago

sbesson commented 8 years ago

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Background

Bio-Formats introduced serialization functionality during the development of Bio-Formats 5.0.0. It is implemented as a ReaderWrapper called Memoizer and uses the Kryo Java library. See https://github.com/openmicroscopy/bioformats/pull/465

The serialization functionality is exposed to the Bio-Formats community via a public API and a series of examples. From the OME perspective, OMERO has been a major consumer of this API since version 5.0.0, with readers being serialized in the data repository at import time.

Current limitations

The current serialization implementation is fragile: many kinds of change to a reader (e.g. adding a non-transient field) break serialization for all cached files. The OMERO.server upgrade documentation mentions this limitation and makes no assumption that serialization should be preserved between successive versions of the server. For large repositories and filesets, where regenerating memo files is costly in both time and resources, systematic invalidation is far from ideal.
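As a rough illustration of what systematic invalidation means for a consumer, the sketch below stamps a cached payload with the version of the library that wrote it and refuses to reuse it under any other version. The class `VersionedMemo`, its byte layout and the version strings are hypothetical placeholders chosen for this example; this is not the Memoizer file format or API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical version-stamped cache entry; NOT the Memoizer file format. */
public class VersionedMemo {

    /** Prefixes the payload with the version of the library that produced it. */
    public static byte[] write(String libraryVersion, byte[] payload) {
        byte[] version = libraryVersion.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(4 + version.length + 4 + payload.length);
        buffer.putInt(version.length).put(version);
        buffer.putInt(payload.length).put(payload);
        return buffer.array();
    }

    /** Returns the payload, or null when the memo was written by another version. */
    public static byte[] read(String libraryVersion, byte[] memo) {
        ByteBuffer buffer = ByteBuffer.wrap(memo);
        byte[] version = new byte[buffer.getInt()];
        buffer.get(version);
        if (!libraryVersion.equals(new String(version, StandardCharsets.UTF_8))) {
            return null; // systematic invalidation: the caller must regenerate
        }
        byte[] payload = new byte[buffer.getInt()];
        buffer.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        // "1.0.0" and "1.0.1" are placeholder versions, not real releases.
        byte[] memo = write("1.0.0", new byte[] {1, 2, 3});
        System.out.println("same version reusable: " + (read("1.0.0", memo) != null));
        System.out.println("after upgrade reusable: " + (read("1.0.1", memo) != null));
    }
}
```

With a check like this, every upgrade discards every cached file, including those whose serialized content would still have been readable, which is the cost described above for large repositories.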

Versioning policy

Bio-Formats does not follow strict semantic versioning. However, the current policy is that the minor version must be bumped when the API is broken, as was the case for Bio-Formats 5.2.0.

Serialization, like ABI, is not covered by semantic versioning. Until we have a proper serialization strategy, there are two possible alternatives:

Solution 1

This is the strictest solution and should likely be backed by implementing some testing infrastructure for detecting serialization regression.
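One possible shape for such a regression check is sketched below, under the assumption (taken from the limitation described earlier) that a change to the set of non-transient instance fields is what breaks cached files. The class `SerializedShape` and the example reader classes are hypothetical and exist only for this illustration; a real test would compare the description of each actual reader class against a value recorded at the previous release.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that summarizes the serialized fields of a class. */
public class SerializedShape {

    /** Stand-in for a reader's state at some release. */
    public static class ReaderV1 {
        String currentId;
        int series;
        transient Object handle;
    }

    /** Same state after a change that only adds a transient field. */
    public static class ReaderV1Transient {
        String currentId;
        int series;
        transient Object handle;
        transient Object cache;
    }

    /** Same state after a change that adds a non-transient field. */
    public static class ReaderV2 {
        String currentId;
        int series;
        int tileSize;
        transient Object handle;
    }

    /** Lists "type name" for each non-static, non-transient field, sorted. */
    public static String describe(Class<?> type) {
        List<String> fields = new ArrayList<>();
        // Walk up the hierarchy so inherited state is included as well.
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int m = f.getModifiers();
                if (Modifier.isStatic(m) || Modifier.isTransient(m) || f.isSynthetic()) {
                    continue;
                }
                fields.add(f.getType().getName() + " " + f.getName());
            }
        }
        Collections.sort(fields);
        return String.join(";", fields);
    }

    public static void main(String[] args) {
        String recorded = describe(ReaderV1.class); // would be stored with a release
        System.out.println("transient-only change detected: "
            + !recorded.equals(describe(ReaderV1Transient.class)));
        System.out.println("non-transient change detected: "
            + !recorded.equals(describe(ReaderV2.class)));
    }
}
```

A check of this kind only catches field additions, removals, renames and type changes. It would not notice a change in how a field's value is interpreted, so it could complement, but not replace, tests that load memo files written by an earlier build.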

Solution 2

In this case, the decision should be taken depending on the context and application. In particular, the consumption of a given series of Bio-Formats by deployed resources such as OMERO or IDR might affect the decision.

Related reading

See the discussion on https://github.com/openmicroscopy/bioformats/pull/2528 and https://github.com/openmicroscopy/ome-documentation/pull/1183

mtbc commented 8 years ago

I'd hesitate to lump fragile memoizer stuff in with general ABI: for instance, even if the memo is invalidated, I'd still expect to be able to update the Bio-Formats classes in my larger application JAR (from x.y.p to x.y.q) without recompiling against them and without breaking anything.

I'd expect that we'd want to change private fields of objects rather more frequently than public API, so it makes sense to me that those would be constrained differently by different versioning policies, whatever those might be.

sbesson commented 8 years ago

Summary of the discussion on this front with @joshmoore, @mtbc, @dgault, @hflynn, @simleo and @bramalingam.

Current status/Immediate policies:

Regarding future steps:

mtbc commented 8 years ago

Might also want a mechanism to note non-urgent commit backlog by reader (or file?) so that they can wait until someone else more strongly needs to break serialization.

sbesson commented 7 years ago

Closing as per https://github.com/openmicroscopy/bioformats/pull/2586