marl / jams

A JSON Annotated Music Specification for Reproducible MIR Research
ISC License
183 stars 26 forks source link

Python API doesn't validate intervals when created / written. #6

Closed ejhumphrey closed 9 years ago

ejhumphrey commented 9 years ago

Caught a bug (I think?) in the Isophonics import, where a start time > end time:

          {
          "start": {
            "value": 188.854
          }, 
          "end": {
            "value": 188.827
          }, 
          "label": {
            "value": "silence"
          }

https://github.com/marl/jams/blob/master/datasets/Isophonics/The%20Beatles/10CD1_-_The_Beatles/CD1_-_04_-_Ob-La-Di%2C_Ob-La-Da.jams#L4126

Not really sure how to fix / hack this. The duration of the file I've got is 188.839, which doesn't match the annotation, nor is it smaller than this final interval.

ejhumphrey commented 9 years ago

Oh, also meant to mention, this is certainly an Isophonics bug:

http://isophonics.net/files/annotations/seglab/The%20Beatles/10CD1_-_The_Beatles/CD1_-_04_-_Ob-La-Di,_Ob-La-Da.lab

justinsalamon commented 9 years ago

If this is an isophonics bug, do we care? I mean, you could add a check in the API to ensure end>=start, but it's not mission critical

urinieto commented 9 years ago

Yup, this is a known issue. This bug was introduced due to the automatic inclusion of the acutal "duration" of the file as its final boundary. That is one of the reasons why the guys at Tampere University reviewed all the boundaries and published a corrected version ( http://www.cs.tut.fi/sgn/arg/paulus/beatles_sections_TUT.zip).

What @bmcfee and I did was to simply ignore those boundaries that start before the previously annotated one.

Regardless, and as Justin says, we shouldn't really care much about it, and simply reflect what's actually on the Isophonics dataset, even if it's obviously wrong.

On Tue, Nov 25, 2014 at 10:46 PM, Eric J. Humphrey <notifications@github.com

wrote:

Oh, also meant to mention, this is certainly an Isophonics bug:

http://isophonics.net/files/annotations/seglab/The%20Beatles/10CD1_-_The_Beatles/CD1_-_04_-_Ob-La-Di,_Ob-La-Da.lab

— Reply to this email directly or view it on GitHub https://github.com/marl/jams/issues/6#issuecomment-64511209.

On Wed, Nov 26, 2014 at 12:37 AM, justinsalamon notifications@github.com wrote:

If this is an isophonics bug, do we care? I mean, you could add a check in the API to ensure end>=start, but it's not mission critical

— Reply to this email directly or view it on GitHub https://github.com/marl/jams/issues/6#issuecomment-64517437.

ejhumphrey commented 9 years ago

Mission critical, no; however, the ISMIR release does implement a start < end check, so loading this particular file from the repository raised an error.

While I generally agree we shouldn't fix others' issues, (1) the current data isn't JAMS compliant and (2) now this knowledge exists in a public state. Regardless of the Isophonics / TUT versions, I'm curious what we should do to with this file that we provide, but can't parse. On 26 Nov 2014 00:40, "Oriol Nieto" notifications@github.com wrote:

Yup, this is a known issue. This bug was introduced due to the automatic inclusion of the acutal "duration" of the file as its final boundary. That is one of the reasons why the guys at Tampere University reviewed all the boundaries and published a corrected version ( http://www.cs.tut.fi/sgn/arg/paulus/beatles_sections_TUT.zip).

What @bmcfee and I did was to simply ignore those boundaries that start before the previously annotated one.

Regardless, and as Justin says, we shouldn't really care much about it, and simply reflect what's actually on the Isophonics dataset, even if it's obviously wrong.

On Tue, Nov 25, 2014 at 10:46 PM, Eric J. Humphrey < notifications@github.com

wrote:

Oh, also meant to mention, this is certainly an Isophonics bug:

http://isophonics.net/files/annotations/seglab/The%20Beatles/10CD1_-_The_Beatles/CD1_-_04_-_Ob-La-Di,_Ob-La-Da.lab

— Reply to this email directly or view it on GitHub https://github.com/marl/jams/issues/6#issuecomment-64511209.

On Wed, Nov 26, 2014 at 12:37 AM, justinsalamon notifications@github.com

wrote:

If this is an isophonics bug, do we care? I mean, you could add a check in the API to ensure end>=start, but it's not mission critical

— Reply to this email directly or view it on GitHub https://github.com/marl/jams/issues/6#issuecomment-64517437.

— Reply to this email directly or view it on GitHub https://github.com/marl/jams/issues/6#issuecomment-64517637.

ejhumphrey commented 9 years ago

Also affects "Carry That Weight" - https://github.com/marl/jams/blob/master/datasets/Isophonics/The%20Beatles/11_-_Abbey_Road/15_-_Carry_That_Weight.jams#L1711

ejhumphrey commented 9 years ago

...

Also, a previously known issue with the Billboard data is apparently unresolved, and survived in 0947:

Ultimately, this is actually a bigger issue than once thought, because it's possible to create invalid jams, but then you can't load them to correct it; this validation should be performed on interval creation.

bmcfee commented 9 years ago

Just to round this one out, adding a schema requirement that time and duration are non-negative (see #5 comment thread) can help catch these things.

If we add an option strict to the dump and load methods, we can trivially add validation for incoming and outgoing data.

bmcfee commented 9 years ago

Pandas branch has optional strict dumping, and schema validates intervals (durations), which resolves the initial issue.

Shall we close this one?

rabitt commented 9 years ago

Closing.

CyxouD commented 2 years ago

TUT Beatles annotations link is broken, so for those searching here it is in @urinieto's repository https://github.com/urinieto/msaf-data/tree/master/BeatlesTUT/references