vocalpy / crowsetta

A tool to work with any format for annotating animal sounds
https://crowsetta.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

ENH: Formalize / expose functions for how format classes convert `annot_path` to `annotated_path` #205

Open NickleDave opened 2 years ago

NickleDave commented 2 years ago

Is your feature request related to a problem? Please describe.

Currently, each format class may or may not include custom logic for programmatically determining the name of the file that it annotates.

For example, crowsetta.formats.seq.notmat.NotMat does the following inside its from_file method (https://github.com/vocalpy/crowsetta/blob/28fd13613c3d08d0592ca522b86b87e669efd3b8/src/crowsetta/formats/seq/notmat.py#L82):

        audio_path = annot_path.parent / annot_path.name.replace('.not.mat', '')

In contrast, the crowsetta.formats.seq.simple.SimpleSeq.from_file method has a notated_path parameter that defaults to None. The method itself does nothing with this parameter, although the underlying attrs class applies a converter that turns it into a pathlib.Path if it's not None.
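To make the current behavior concrete, the NotMat logic quoted above could be pulled out into a standalone function. This is only a sketch: the function name `notmat_annot_to_annotated` and the example file name are hypothetical and not part of crowsetta; only the `replace('.not.mat', '')` expression comes from the linked source.

```python
import pathlib


def notmat_annot_to_annotated(annot_path):
    """Hypothetical helper (not part of crowsetta).

    Derives the path of the annotated audio file from the path of a
    .not.mat annotation file, mirroring the expression quoted above.
    """
    annot_path = pathlib.Path(annot_path)
    # Strip the '.not.mat' suffix from the file name; keep the same directory.
    return annot_path.parent / annot_path.name.replace('.not.mat', '')


# Made-up file name, for illustration only.
audio_path = notmat_annot_to_annotated('data/bird1/song_0001.cbin.not.mat')
print(audio_path.name)  # song_0001.cbin
```

The point of the sketch is that the conversion is a pure function of `annot_path`, so nothing forces it to live inside `from_file`.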

Describe the solution you'd like

It would be nice if format classes could declare / expose functionality for converting annot_path -> annotated_path.

One reason to do this would be to just make it an explicit part of the "API", so to speak, instead of having it hidden inside some of the from_file functions.

Another reason to do this would be to make it easier (possible) for other libraries to leverage this functionality.
E.g., vak has a map_annotated_to_annot function that could just use each format class's function to do the mapping, instead of the current spaghetti-code logic.

Describe alternatives you've considered

I can think of a couple of ways to achieve this. They may not be mutually exclusive.

  1. Have a property / method that does this; e.g., annotated_path could be a property that encapsulates the functionality for converting from annot_path to annotated_path.

  2. Allow the annotated_path argument to the from_file method to be a Callable, in addition to being a path itself. This would let a user override the default behavior by passing in the callable. Downstream libraries could also leverage this functionality; e.g., vak could let a user specify in the config the name of a function, to_annotated_path, and then pass it in place of the default, just as a user might, when mapping annotations to the paths of the files they annotate.
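The two options above could be combined roughly as follows. This is a minimal sketch under stated assumptions, not crowsetta's actual API: `ExampleFormat` and its `to_annotated_path` method are hypothetical, and the default conversion simply reuses the `.not.mat` rule quoted earlier. The sketch uses a static method rather than a property so that a downstream library can call the conversion without first loading an annotation file.

```python
import pathlib
from typing import Callable, Optional, Union

PathLike = Union[str, pathlib.Path]
# Option 2: accept either a path or a callable mapping annot_path -> annotated_path.
AnnotatedPathArg = Optional[Union[PathLike, Callable[[pathlib.Path], pathlib.Path]]]


class ExampleFormat:
    """Hypothetical format class, for illustration only (not in crowsetta)."""

    def __init__(self, annot_path: pathlib.Path, annotated_path: pathlib.Path):
        self.annot_path = annot_path
        self.annotated_path = annotated_path

    @staticmethod
    def to_annotated_path(annot_path: pathlib.Path) -> pathlib.Path:
        # Option 1: the default conversion is exposed as part of the class,
        # instead of being hidden inside from_file.
        return annot_path.parent / annot_path.name.replace('.not.mat', '')

    @classmethod
    def from_file(cls, annot_path: PathLike, annotated_path: AnnotatedPathArg = None):
        annot_path = pathlib.Path(annot_path)
        if annotated_path is None:
            # No override given: fall back to the format's own default.
            annotated_path = cls.to_annotated_path(annot_path)
        elif callable(annotated_path):
            # Option 2: the caller supplied a conversion function.
            annotated_path = annotated_path(annot_path)
        else:
            # The caller supplied the path explicitly.
            annotated_path = pathlib.Path(annotated_path)
        return cls(annot_path, annotated_path)


default = ExampleFormat.from_file('song.cbin.not.mat')
custom = ExampleFormat.from_file(
    'song.cbin.not.mat',
    annotated_path=lambda p: p.with_name('song.wav'),
)
print(default.annotated_path)  # song.cbin
print(custom.annotated_path)   # song.wav
```

A downstream library could call `ExampleFormat.to_annotated_path` directly for the default mapping, or forward a user-configured callable through the `annotated_path` argument.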

Where this might get complicated is when a single annotation file contains annotations for multiple annotated files. In that case, it does not make sense to determine the annotated_path from the annot_path; the annotations themselves must contain the path to each file they annotate, so that it is clear which annotation corresponds to which file. I guess the way to handle this is to just not have any annotated_path function parameters or class properties for these format classes. Downstream libraries (e.g., the vak example above) will need to check for the annotated_path attribute and decide what to do if they don't find it. In that case, it might make sense to instead have an annotated_paths attribute that returns all the paths?
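The check a downstream library would need might look something like the sketch below. Everything here is hypothetical: the two stand-in classes, the `annotated_paths` attribute (which is only floated as an idea above), and the helper `get_annotated_paths` are not existing crowsetta or vak code.

```python
import pathlib


class SingleFileAnnot:
    """Hypothetical stand-in: one annotation file annotates exactly one file."""

    def __init__(self, annot_path, annotated_path):
        self.annot_path = pathlib.Path(annot_path)
        self.annotated_path = pathlib.Path(annotated_path)


class MultiFileAnnot:
    """Hypothetical stand-in: one annotation file covers many files, so the
    paths come from the annotations themselves, not from annot_path."""

    def __init__(self, annot_path, annotated_paths):
        self.annot_path = pathlib.Path(annot_path)
        self.annotated_paths = [pathlib.Path(p) for p in annotated_paths]


def get_annotated_paths(annot):
    """Hypothetical downstream helper: always return a list of paths."""
    if hasattr(annot, 'annotated_path'):
        return [annot.annotated_path]
    if hasattr(annot, 'annotated_paths'):
        return list(annot.annotated_paths)
    raise AttributeError(
        f"{type(annot).__name__} has neither 'annotated_path' nor 'annotated_paths'"
    )


single = SingleFileAnnot('song1.wav.csv', 'song1.wav')
multi = MultiFileAnnot('all_songs.csv', ['song1.wav', 'song2.wav'])
print(len(get_annotated_paths(single)))  # 1
print(len(get_annotated_paths(multi)))   # 2
```

Normalizing both cases to a list keeps the branching in one place, so the rest of the downstream code does not need to know which kind of format it is handling.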

Additional context

Related issues and discussion here on vak: https://github.com/vocalpy/vak/issues/563#issue-1341736666