vocalpy / crowsetta

A tool to work with any format for annotating vocalizations
https://crowsetta.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
49 stars 3 forks source link

Remove `Segment.from_row` method, no longer used #231

Closed NickleDave closed 1 year ago

NickleDave commented 1 year ago

and the leftover logic for it is confusing.

This method was used by the now-removed 'csv' format, see for example here: https://github.com/vocalpy/crowsetta/blob/a47798ac54440cd3a974a8dccb83154ae4ea7d1f/src/crowsetta/csv.py#L224

But now we no longer make individual Segments in the 'generic' format (that replaced 'csv'). Instead we make entire Sequences using the from_keyword instead, parsing csv files with pandas to avoid the maintenance work of doing it ourselves: https://github.com/vocalpy/crowsetta/blob/f95b08ae7556b336d0e99044df0947a0060163aa/src/crowsetta/formats/seq/generic.py#L265 (We grab a sequence at a time from the dataframe, and then get arrays from each column 'onset_s', 'offset_s', etc., via the values attribute.)

I realized Segment.from_row is no longer used as I was trying to write examples for the Segment docstring as suggested here in the pyOpenSci review: https://github.com/pyOpenSci/software-submission/issues/68#issuecomment-1451660136

We should just remove it and do the more intuitive thing with attrs:

Dumping a more detailed explanation of issues cause by from_row, from my dev notes:

  • I can't just make an instance by passing in args
  • It throws an error if I only pass in onset_s / offset_s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Segment.__init__() missing 2 required positional arguments: 'onset_sample' and 'offset_sample'
  • which is first of all confusing because it looks like these are optional, and I believe that was my intent? From the Segment.__init__ method
    onset_s = attr.ib(converter=attr.converters.optional(float_or_None))
    offset_s = attr.ib(converter=attr.converters.optional(float_or_None))
    onset_sample = attr.ib(converter=attr.converters.optional(int_or_None))
    offset_sample = attr.ib(converter=attr.converters.optional(int_or_None))
  • But then these converters are doing something extra weird, looking for a string "None":
def float_or_None(val):
    """converter that casts val to float, or returns None if val is string 'None'"""
    if val == "None":
        return None
    else:
        return float(val)
  • What's actually going on is that these are doing the work of a Segment.from_row method that can handle a row of strings from a csv. See this unit test for an example:
def test_Segment_init_onset_offset_in_seconds_from_row():
    header = ["label", "onset_s", "offset_s", "onset_sample", "offset_sample"]
    row = ["a", "0.123", "0.170", "None", "None"]
    a_segment = Segment.from_row(row=row, header=header)
    for attr in ["label", "onset_s", "offset_s"]:
        assert hasattr(a_segment, attr)
    for attr in ["onset_sample", "offset_sample"]:
        assert getattr(a_segment, attr) is None