marl / jams-data

Datasets and parsing scripts for JAMS
ISC License

Rewrite SALAMI parser to use raw data #9

Open bmcfee opened 8 years ago

bmcfee commented 8 years ago

Forking from https://github.com/craffel/mir_eval/issues/162: parsing the already-parsed SALAMI annotations can introduce errors. We should instead work on the raw version of the annotations.

At one point I had done this, but for the life of me I can't find my implementation. As I recall, it was pretty nasty and should be rewritten anyway.

Basically, what one has to do is the following:

  1. Separate instrument labels (which have parentheses) from segment labels
  2. Induce segment intervals from the event boundary markers
  3. Partition segments by vocabulary for conversion.
  4. If we're daring, also transfer the instrument annotations by matching parentheses.
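Steps 1 and 2 could look roughly like the sketch below. It assumes (this is not confirmed in the thread) that each raw line is an onset time, a tab, and then comma-separated labels, and that instrument markers are the only tokens containing parentheses. `parse_raw_events`, `induce_intervals` and `is_instrument` are hypothetical names, not existing jams-data functions.

```python
from typing import List, Tuple

# Hypothetical event record: (onset in seconds, segment labels, instrument tokens)
Event = Tuple[float, List[str], List[str]]


def is_instrument(token: str) -> bool:
    """Step 1 (assumed rule): instrument markers are the tokens with parentheses."""
    return "(" in token or ")" in token


def parse_raw_events(text: str) -> List[Event]:
    """Parse lines of the assumed form: onset, a tab, then comma-separated labels."""
    events = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        onset, _, label_field = line.partition("\t")
        tokens = [tok.strip() for tok in label_field.split(",") if tok.strip()]
        # Step 1: keep instrument markers apart from the segment labels
        segments = [tok for tok in tokens if not is_instrument(tok)]
        instruments = [tok for tok in tokens if is_instrument(tok)]
        events.append((float(onset), segments, instruments))
    return events


def induce_intervals(events: List[Event]) -> List[Tuple[float, float, List[str]]]:
    """Step 2: each event opens a segment that closes at the next event's onset.

    The final event only closes the last segment, so it yields no interval.
    """
    return [
        (start, end, labels)
        for (start, labels, _), (end, _, _) in zip(events, events[1:])
    ]
```

On a made-up input such as `"0.0\tSilence\n0.46\tA, a, Intro, (guitar\n13.37\tb"`, the second event becomes `(0.46, ["A", "a", "Intro"], ["(guitar"])`. These flat intervals still mix all label levels, which is what step 3 has to untangle.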

Steps 1 and 2 should be easy. Step 3, I think, can be achieved with a clever use of the JAMS namespace structure for each annotation and a cunning use of pandas.
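One way the pandas part of step 3 might go, as a sketch under assumed vocabulary rules: single uppercase letters (optionally primed, e.g. `A'`) are large-scale labels, single lowercase letters are small-scale labels, and everything else is a function label. The `upper` / `lower` / `function` keys are meant to line up with the JAMS `segment_salami_upper`, `segment_salami_lower` and `segment_salami_function` namespaces, but that mapping and the helper names are assumptions. The key trick is `groupby("level")["time"].shift(-1)`: within a level, a label lasts until the next label of that same level.

```python
import re

import pandas as pd

# Assumed vocabulary rules (not confirmed in this thread)
UPPER = re.compile(r"^[A-Z]'*$")  # e.g. "A", "B'"
LOWER = re.compile(r"^[a-z]'*$")  # e.g. "a", "c''"


def level_of(label: str) -> str:
    """Assign a label to a hypothetical vocabulary level."""
    if UPPER.match(label):
        return "upper"
    if LOWER.match(label):
        return "lower"
    return "function"


def partition_by_level(events):
    """Split (onset, segment_labels) events into one interval table per level.

    A large-scale "A" therefore spans any small-scale boundaries until the next
    uppercase label; the last label of each level runs to the final event.
    """
    rows = [(t, lab, level_of(lab)) for t, labels in events for lab in labels]
    df = pd.DataFrame(rows, columns=["time", "label", "level"])
    df = df.sort_values("time", kind="stable")
    end_of_track = max(t for t, _ in events)
    # The next onset within the same level marks the end of this segment
    df["end"] = df.groupby("level")["time"].shift(-1).fillna(end_of_track)
    df["duration"] = df["end"] - df["time"]
    # Drop zero-length rows, e.g. an end-of-track marker on the final event
    df = df[df["duration"] > 0]
    return {
        level: grp[["time", "duration", "label"]].reset_index(drop=True)
        for level, grp in df.groupby("level")
    }
```

With made-up events `[(0.0, ["Silence"]), (0.46, ["A", "a", "Intro"]), (13.37, ["b"]), (25.1, ["B", "c", "Verse"]), (40.0, ["End"])]`, the upper table holds `A` and `B`, the lower table `a`, `b`, `c`, and the function table `Silence`, `Intro`, `Verse`. Each row could then be passed to `jams.Annotation.append(time=..., duration=..., value=...)` on an annotation created with the matching namespace.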

Step 4 is tricky since you sometimes see open- and close-parens on the same event, and we'll need a namespace for the instruments.
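For step 4, a possible sketch, assuming an opener looks like `(guitar` and its closer like `guitar)` (that is, closers repeat the instrument name). To cope with open- and close-parens on the same event, it handles each event's closers before its openers, so `guitar)` and `(voice` on one event end one span and start the next. `match_instruments` is a hypothetical helper; tokens carrying both parens, such as `(guitar)`, are ignored here, and an instrument reopened before it closes simply overwrites the earlier onset.

```python
from typing import Dict, List, Tuple


def match_instruments(
    events: List[Tuple[float, List[str]]],
) -> List[Tuple[float, float, str]]:
    """Pair '(name' openers with 'name)' closers into (start, end, name) spans."""
    open_spans: Dict[str, float] = {}
    spans: List[Tuple[float, float, str]] = []
    for onset, tokens in events:
        closers = [t for t in tokens if t.endswith(")") and not t.startswith("(")]
        openers = [t for t in tokens if t.startswith("(") and not t.endswith(")")]
        # Close first, so a same-event close/open pair is not mismatched
        for tok in closers:
            name = tok[:-1].strip()
            if name not in open_spans:
                raise ValueError(f"unmatched close {tok!r} at {onset}")
            spans.append((open_spans.pop(name), onset, name))
        for tok in openers:
            open_spans[tok[1:].strip()] = onset
    # Anything still open is assumed to run until the last event
    end = events[-1][0] if events else 0.0
    for name, start in open_spans.items():
        spans.append((start, end, name))
    return sorted(spans)
```

For instance, `[(0.46, ["(guitar"]), (25.1, ["guitar)", "(voice"]), (40.0, [])]` would yield a guitar span from 0.46 to 25.1 and a voice span from 25.1 to 40.0. Raising on an unmatched closer, rather than dropping it, makes malformed raw files visible during conversion.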

urinieto commented 8 years ago

Will work on this over the weekend.

bmcfee assigned #9 to @urinieto on Wed, Feb 3, 2016.

urinieto commented 8 years ago

I finished points 1, 2 and 3. I don't have the cycles to tackle 4 right now (maybe we can hack on it at Dagstuhl, @bmcfee?).

Let me know whether you want me to open a PR with these changes or would rather wait for 4.

bmcfee commented 8 years ago

Sure, start the PR as a WIP and we can finish up point 4 later.

urinieto commented 8 years ago

Alright!