bmcfee opened 4 years ago
Had a chat with @rabitt about some of this at ISMIR, and she pointed out that we currently have a bit of a blind spot when it comes to annotations of symbolic data. Concretely, objects like a score or a midi file may not have a fixed "duration" (in seconds), but may have similar extent specifications in terms of beats or ticks.
This seems soluble in the proposed framework by introducing extent types for symbolic data. We may need to wiggle a bit on the top-level schema (JAMS object) to make this work, but I think it would be worth doing in the long run.
Nice, I like this idea. ISMIR always sparking great conversations! <3
Related: following our music source separation tutorial where we use Scaper (which relies on JAMS) to generate training data, people were asking if it would be possible to beat-align stems (e.g. from different songs). One way to achieve this would be to support time in e.g. beats rather than seconds.
@bmcfee by extent types are you referring to how time is represented more generally? I.e. currently we support time/duration in seconds. Would the idea be to support time/duration in units other than seconds?
> One way to achieve this would be to support time in e.g. beats rather than seconds.
I'm not sure how that would help / work? You'd still need some mapping of beats to time in that case, right?
> Would the idea be to support time/duration in units other than seconds?
Yep, but as a separate extent type. Either beats or ticks, possibly both depending on how much need there is for it.
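To make that concrete, here is a minimal sketch of what a tick-based extent could look like; all of the field names (`time_ticks`, `duration_ticks`, `ticks_per_beat`) are hypothetical, just to illustrate the idea of a symbolic extent type alongside the existing seconds-based one:

```python
# Hypothetical sketch only -- none of these field names exist in JAMS today.
# A symbolic extent type would express position/duration in ticks (or beats)
# plus the resolution needed to interpret them, instead of seconds.
symbolic_observation = {
    "time_ticks": 480,       # onset in MIDI ticks (hypothetical field)
    "duration_ticks": 240,   # extent in MIDI ticks (hypothetical field)
    "ticks_per_beat": 480,   # resolution, so consumers can convert to beats
    "value": 60,             # e.g. a MIDI note number
    "confidence": 1.0,
}
```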
I've been taking a crack at this over the break. I'm most of the way there, though I've realized that to make this work we may have to alter the JAMS schema a little, resulting in some currently valid JAMS data becoming invalid in the latest version.
Currently we have a list of annotations, each of which contains a list of observations, and that list of observations can be either a sparse type or a dense type. I'm proposing that we change this to always be a list of observations (i.e., no sparse/dense distinction at the Annotation level), and the observation type therein can be either a single observation (in the sparse case) or an observation containing lists of values (in the dense case). This will move all current JAMS dense observations down one level to the observation type, rather than being a different Annotation type overall.
This way the Annotation type itself has all the non-data-dependent properties (e.g., `Curator`, `sandbox`, etc.), and it is only its `data` attribute that is defined by the observation type (both the `data` and `namespace` attributes will be defined by the namespace). This `data` attribute is always an array of observations, and in the case of current DenseObservation types that exist out there in the wild, it will be a single-element array with the observation type itself containing `value`, `confidence`, `time`, and `duration` arrays.
This greatly simplifies the code and schema, but will change the schema for dense observations from something like:
```json
{
  "annotations": [
    {
      "data": {
        "value": [ 1.0, 0.5 ],
        "time": [ 1.0, 2.0 ],
        "confidence": [ 0.9, 0.9 ],
        "duration": [ 1.0, 1.0 ]
      }
    }
  ]
}
```
to something like:
```json
{
  "annotations": [
    {
      "data": [
        {
          "values": [ 1.0, 0.5 ],
          "times": [ 1.0, 2.0 ],
          "confidences": [ 0.9, 0.9 ],
          "durations": [ 1.0, 1.0 ]
        }
      ]
    }
  ]
}
```
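For comparison, the sparse case under the proposed unified layout would presumably just be a list of single observations, something like the following sketch of the proposal (not a finalized schema):

```python
# Sketch: the same two events expressed sparsely under the unified proposal --
# "data" is still a list, but each element is a single scalar-valued observation.
sparse_style = {
    "annotations": [
        {
            "data": [
                {"time": 1.0, "duration": 1.0, "value": 1.0, "confidence": 0.9},
                {"time": 2.0, "duration": 1.0, "value": 0.5, "confidence": 0.9},
            ]
        }
    ]
}
```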
This has the added benefit that a single Annotation can contain multiple dense observations. E.g., in the case of pitch contours, there could be multiple contours beginning and ending according to a vocal activity detector; or, in an annotation application where the annotator draws contours over a waveform, each drawn contour could be sampled and represented as its own DenseObservation.
At phase 3 of this issue, we can then further include a dense sampled observation type, e.g.:
```json
{
  "annotations": [
    {
      "data": [
        {
          "values": [ 1.0, 0.5, 0.3 ],
          "start_time": 1.0,
          "sample_rate": 1000.0
        }
      ]
    }
  ]
}
```
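For clarity, the sampled form above is just shorthand for uniformly spaced times; a minimal sketch of the conversion back to explicit times, assuming the `start_time`/`sample_rate` fields from the example:

```python
import numpy as np

# Sketch: recover explicit observation times from the proposed sampled form,
# assuming values are uniformly spaced at sample_rate Hz starting at start_time.
values = [1.0, 0.5, 0.3]
start_time = 1.0
sample_rate = 1000.0

times = start_time + np.arange(len(values)) / sample_rate
# times -> array([1.   , 1.001, 1.002])
```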
> I've been taking a crack at this over the break. I'm most of the way there, though I've realized that to make this work we may have to alter the JAMS schema a little, resulting in some currently valid JAMS data becoming invalid in the latest version.
I'm okay with that in the long run -- schemas should change and improve! But I think any changes we make should come after we translate the existing schema into something that can be properly validated in jsonschema and version-stamped. This will make forward-migration much easier, and cut down on friction.
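For illustration, a minimal sketch of what "properly validated in jsonschema and version-stamped" could look like; the `jams_version` field name and the schema contents are assumptions for the example, not the actual JAMS schema:

```python
import jsonschema

# Hypothetical sketch: a version-stamped top-level schema, and a file that
# self-identifies which schema version it claims to be valid under.
jams_schema_v03 = {
    "type": "object",
    "properties": {
        "jams_version": {"const": "0.3.0"},  # hypothetical version field
        "annotations": {"type": "array"},
    },
    "required": ["jams_version", "annotations"],
}

candidate = {"jams_version": "0.3.0", "annotations": []}
jsonschema.validate(candidate, jams_schema_v03)  # raises ValidationError on mismatch
```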
> I'm proposing that we change this to always be a list of observations (i.e., no sparse/dense distinction at the Annotation level), and the observation type therein can be either a single observation (in the sparse case) or an observation containing lists of values (in the dense case).
This is an interesting suggestion, and I'm trying to noodle out all the downstream consequences. For background, the sparse/dense idea is really just a storage optimization hack: from the object model (python implementation) perspective, all we have are sparse observations, and this uniformity is extremely helpful in a lot of cases (eg abstract data augmentation).
If I understand the proposal correctly, it basically amounts to making everything `dense`, and annotations that we currently treat as `sparse` are just special cases where the length of the array is 1. Do I have that right?
> This has the added benefit that a single Annotation can contain multiple dense observations. E.g., in the case of pitch contours, there could be multiple contours beginning and ending according to a vocal activity detector; or, in an annotation application where the annotator draws contours over a waveform, each drawn contour could be sampled and represented as its own DenseObservation.
That's pretty nice! Right now, we hack around it by forcing contour ids into the observations, which requires some post-filtering to extract out.
> At phase 3 of this issue, we can then further include a dense sampled observation type, e.g.:
Hm. Are you thinking of this as something like a mixin type, e.g. (`SampledAnnotation` + `Pitch_Hz`)? Or something that would be coded into the namespace directly?
In general, I'm less keen on adding variable fields this deep into the schema because it will break uniformity of representation across namespaces. This might be necessary at times, I'm not sure. But if possible, I'd like to keep things uniform because it significantly simplifies downstream abstract code, eg jams.display and jams.eval.
This issue is intended to consolidate many of the long-standing issues and offline discussions we've had around revising the jams specification for a variety of applications and use-cases.
Goals of revising the schema
Revision phase 1: full json-schema definition
The first step is to move all namespace definitions into full jsonschema definitions. In the proposed change, a namespace definition now becomes a secondary schema for the `Annotation` object. `Annotation` objects must validate against both the template schema (our current annotation schema def) and exactly one of the pre-defined namespace schemas. Each namespace schema defines an exact match on the `Annotation.namespace` field, in addition to whatever constraints are placed on the `value` and `confidence` fields.

The `is_sparse` flag will be removed, as this is not part of jsonschema. (We'll come back to this later.)

This phase will complete #178.
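A rough sketch of how "template schema plus exactly one namespace schema" could be expressed in jsonschema terms; the layout and the `pitch_contour` constraints here are illustrative assumptions, not the actual definitions:

```python
# Illustrative sketch: an Annotation must satisfy the shared template AND
# exactly one namespace sub-schema, each of which pins "namespace" to a constant.
annotation_schema = {
    "allOf": [
        {"$ref": "#/definitions/annotation_template"},  # shared structure
        {
            "oneOf": [
                {
                    # one entry per namespace; these constraints are made up
                    "properties": {
                        "namespace": {"const": "pitch_contour"},
                        "data": {"items": {"properties": {"value": {"type": "number"}}}},
                    }
                },
                # ... one sub-schema per remaining namespace
            ]
        },
    ]
}
```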
Revision phase 2: hosted and versioned schema
Completing phase 1 will result in a fully json-schema compatible implementation of our specification, against which all current JAMS files should validate.
The next step (phase 2) is to place this schema under version control and host it remotely (e.g. `jams.github.io/schema/v0.3/schema.json` or something). We can then revise the schema to include a version number in its definition, so that jams files can self-identify which version they are valid under.
With the remote schema implementation, it should be possible/easy to promote all jams definitions to top-level objects, so that you can independently validate an `Annotation` or `FileMetadata` object without having it belong to a full JAMS file.

This phase will complete #86 and facilitate #40 by allowing partial storage.
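As a sketch of what promoting definitions to top-level objects could enable (the URL reuses the example above; the definition contents are placeholders): once each object is exposed as a named definition in the hosted schema, an `Annotation` can be validated on its own:

```python
import jsonschema

# Hypothetical layout of the hosted schema: each JAMS object is a top-level
# definition, so it can be validated independently of a full JAMS file.
hosted_schema = {
    "$id": "https://jams.github.io/schema/v0.3/schema.json",  # example URL from above
    "definitions": {
        "Annotation": {"type": "object", "required": ["namespace", "data"]},
        "FileMetadata": {"type": "object"},
    },
}

annotation_only = {"namespace": "beat", "data": []}
jsonschema.validate(annotation_only, hosted_schema["definitions"]["Annotation"])
```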
Revision phase 3: extending the Annotation class
As mentioned in #24 , the current annotation structure might be a bit too rigid for more general media objects. @justinsalamon and I discussed this offline, and arrived at the following proposal:
- The current `Annotation` def becomes `IntervalAnnotation`, in which observations are `(time, duration, value, confidence)` tuples
- `StaticAnnotation`: just `(value, confidence)`
- `BoundingBoxAnnotation`: `(x, y, width, height, value, confidence)`
- `TimeBoundingBoxAnnotation`: `(time, x, y, duration, width, height, value, confidence)`

`Annotation` validation now becomes `and(oneOf([Interval, Static, BoundingBox, ...]), oneOf([namespaces]))`.
This provides maximal flexibility in combining different annotation contents (tags etc.) with annotation extents (time intervals, bounding boxes, etc.). Including a `StaticAnnotation` type also provides a way to resolve #206.

Phase 3 completes the proposed changes to the schema.
Alongside schema changes, we also want to generalize some things about the python implementation. Notably, it would be good to extend the search function to also support annotation contents. This way, we could find and excerpt annotations by value (e.g. time intervals labeled as `guitar`, or bounding boxes with `face`). This isn't a huge change from what the search function already does, but it will take a bit more implementation work.
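As a rough sketch of the kind of content-based search being described, here is how it could be emulated today by filtering observation values by hand (the namespace, file path, and target value are placeholders, not a proposed final interface):

```python
import jams

# Sketch: emulate "search by annotation contents" with the current API by
# filtering observations whose value matches a target label.
jam = jams.load("example.jams")            # placeholder path

matches = []
for ann in jam.search(namespace="tag_open"):
    for obs in ann.data:
        if obs.value == "guitar":          # content-based criterion
            matches.append((ann.namespace, obs.time, obs.duration))
```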