Closed ejhumphrey closed 9 years ago
Oh, also meant to mention, this is certainly an Isophonics bug:
If this is an isophonics bug, do we care? I mean, you could add a check in the API to ensure end>=start, but it's not mission critical
Yup, this is a known issue. This bug was introduced due to the automatic inclusion of the acutal "duration" of the file as its final boundary. That is one of the reasons why the guys at Tampere University reviewed all the boundaries and published a corrected version ( http://www.cs.tut.fi/sgn/arg/paulus/beatles_sections_TUT.zip).
What @bmcfee and I did was to simply ignore those boundaries that start before the previously annotated one.
Regardless, and as Justin says, we shouldn't really care much about it, and simply reflect what's actually on the Isophonics dataset, even if it's obviously wrong.
On Tue, Nov 25, 2014 at 10:46 PM, Eric J. Humphrey <notifications@github.com
wrote:
Oh, also meant to mention, this is certainly an Isophonics bug:
— Reply to this email directly or view it on GitHub https://github.com/marl/jams/issues/6#issuecomment-64511209.
On Wed, Nov 26, 2014 at 12:37 AM, justinsalamon notifications@github.com wrote:
If this is an isophonics bug, do we care? I mean, you could add a check in the API to ensure end>=start, but it's not mission critical
— Reply to this email directly or view it on GitHub https://github.com/marl/jams/issues/6#issuecomment-64517437.
Mission critical, no; however, the ISMIR release does implement a start < end check, so loading this particular file from the repository raised an error.
While I generally agree we shouldn't fix others' issues, (1) the current data isn't JAMS compliant and (2) now this knowledge exists in a public state. Regardless of the Isophonics / TUT versions, I'm curious what we should do to with this file that we provide, but can't parse. On 26 Nov 2014 00:40, "Oriol Nieto" notifications@github.com wrote:
Yup, this is a known issue. This bug was introduced due to the automatic inclusion of the acutal "duration" of the file as its final boundary. That is one of the reasons why the guys at Tampere University reviewed all the boundaries and published a corrected version ( http://www.cs.tut.fi/sgn/arg/paulus/beatles_sections_TUT.zip).
What @bmcfee and I did was to simply ignore those boundaries that start before the previously annotated one.
Regardless, and as Justin says, we shouldn't really care much about it, and simply reflect what's actually on the Isophonics dataset, even if it's obviously wrong.
On Tue, Nov 25, 2014 at 10:46 PM, Eric J. Humphrey < notifications@github.com
wrote:
Oh, also meant to mention, this is certainly an Isophonics bug:
— Reply to this email directly or view it on GitHub https://github.com/marl/jams/issues/6#issuecomment-64511209.
On Wed, Nov 26, 2014 at 12:37 AM, justinsalamon notifications@github.com
wrote:
If this is an isophonics bug, do we care? I mean, you could add a check in the API to ensure end>=start, but it's not mission critical
— Reply to this email directly or view it on GitHub https://github.com/marl/jams/issues/6#issuecomment-64517437.
— Reply to this email directly or view it on GitHub https://github.com/marl/jams/issues/6#issuecomment-64517637.
Also affects "Carry That Weight" - https://github.com/marl/jams/blob/master/datasets/Isophonics/The%20Beatles/11_-_Abbey_Road/15_-_Carry_That_Weight.jams#L1711
...
Also, a previously known issue with the Billboard data is apparently unresolved, and survived in 0947:
Ultimately, this is actually a bigger issue than once thought, because it's possible to create invalid jams, but then you can't load them to correct it; this validation should be performed on interval creation.
Just to round this one out, adding a schema requirement that time and duration are non-negative (see #5 comment thread) can help catch these things.
If we add an option strict
to the dump
and load
methods, we can trivially add validation for incoming and outgoing data.
Pandas branch has optional strict dumping, and schema validates intervals (durations), which resolves the initial issue.
Shall we close this one?
Closing.
TUT Beatles annotations link is broken, so for those searching here it is in @urinieto's repository https://github.com/urinieto/msaf-data/tree/master/BeatlesTUT/references
Caught a bug (I think?) in the Isophonics import, where a start time > end time:
https://github.com/marl/jams/blob/master/datasets/Isophonics/The%20Beatles/10CD1_-_The_Beatles/CD1_-_04_-_Ob-La-Di%2C_Ob-La-Da.jams#L4126
Not really sure how to fix / hack this. The duration of the file I've got is 188.839, which doesn't match the annotation, nor is it smaller than this final interval.