kahst / BirdNET-Analyzer

BirdNET analyzer for scientific audio data processing.

Segments.py can't match audio and result files #433

Open kahst opened 2 weeks ago

kahst commented 2 weeks ago

In segments.py, audio files and result files can't be matched if the audio dir and result dir are different (which we explicitly allow) and we don't use combined result files.

In line 92 we say the dictionary entry is:

data[os.path.join(root, f.rsplit(".", 1)[0])] = {"audio": os.path.join(root, f), "result": ""}

which contains the full audio file dir tree.

In line 98 the table_key is:

table_key = os.path.join(root, f.split(".BirdNET.", 1)[0])

which then contains the result file dir tree and thus cannot be matched with the keys in the data dict.
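To make the mismatch concrete, here is a minimal sketch that rebuilds both keys with the two expressions quoted above. The directory and file names are hypothetical and only serve as an illustration; the helper names `audio_key` and `result_key` do not exist in segments.py.

```python
import os


def audio_key(root, f):
    # Same expression as the audio-side key: directory + file name without extension.
    return os.path.join(root, f.rsplit(".", 1)[0])


def result_key(root, f):
    # Same expression as table_key: directory + everything before ".BirdNET.".
    return os.path.join(root, f.split(".BirdNET.", 1)[0])


# Hypothetical layout: audio and results live in different directory trees.
a = audio_key(os.path.join("audio", "site1"), "rec01.wav")
r = result_key(os.path.join("results", "site1"), "rec01.BirdNET.selection.table.txt")

print(a)
print(r)
print(a == r)  # the keys differ because each one embeds its own directory tree
```

When both files sit in the same directory, the two expressions produce the same key, which is why the problem only shows up with separate audio and result dirs.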

We could remove the full dir trees from the key and only use the filename without the extension as keys in data. However, we might have issues when people have duplicate filenames (which they shouldn't but often do).
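As a rough sketch of the duplicate-filename risk: if only the filename without extension were used as the key, two recordings with the same name in different folders would collapse into one entry. The helper below, `find_duplicate_stems`, is hypothetical (not part of segments.py) and simply reports which stems would collide.

```python
import os
from collections import defaultdict


def find_duplicate_stems(audio_paths):
    """Group paths by file name without extension and return the groups with more than one path."""
    stems = defaultdict(list)
    for path in audio_paths:
        stem = os.path.basename(path).rsplit(".", 1)[0]
        stems[stem].append(path)
    return {stem: paths for stem, paths in stems.items() if len(paths) > 1}


# Hypothetical file list for illustration.
collisions = find_duplicate_stems(["a/rec01.wav", "b/rec01.wav", "a/rec02.wav"])
print(collisions)
```

A check like this could be used to warn the user instead of silently overwriting an entry.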

What do you think?

kahst commented 2 weeks ago

Also, I think we should use the common name instead of the species code for segments folder names:

species = d[header_mapping["Species Code"]]

should be

species = d[header_mapping["Common Name"]]

when rtype == "table"

(We're very inconsistent here: we use the species code, common name, or scientific name depending on the result type.)

max-mauermann commented 1 week ago

I think we could do something like table_key = os.path.join(os.path.relpath(root, rpath), f.split(".BirdNET.", 1)[0]), and os.path.relpath(root, apath) for the audio keys respectively.

This should not cause issues with duplicate filenames in different folders, but it would require the audio and result folders to be structured identically (although this is what the analysis outputs anyway).
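A minimal sketch of keys that are relative to the respective base directory, assuming `apath` and `rpath` are the audio and result root directories. It uses `os.path.relpath` to remove the base directory, because `str.strip` removes a set of characters from both ends of a string rather than a prefix. The helper `relative_key` and all paths are hypothetical.

```python
import os


def relative_key(root, base_dir, stem):
    """Build a key from the directory relative to base_dir plus the file stem."""
    rel_dir = os.path.relpath(root, base_dir)
    # normpath drops the leading "." for files that sit directly in base_dir.
    return os.path.normpath(os.path.join(rel_dir, stem))


# Hypothetical base directories with identical sub-structure.
apath = os.path.join("data", "audio")
rpath = os.path.join("out", "results")

audio = relative_key(os.path.join(apath, "site1"), apath, "rec01.wav".rsplit(".", 1)[0])
result = relative_key(
    os.path.join(rpath, "site1"),
    rpath,
    "rec01.BirdNET.selection.table.txt".split(".BirdNET.", 1)[0],
)
print(audio == result)

# str.strip treats its argument as a set of characters, not as a prefix.
stripped = "audio/data".strip("audio/")
print(repr(stripped))
```

With relative keys, files with the same name in different subfolders stay distinct as long as both trees are structured identically.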

For the segment folder names we could also add this as an option to choose from.
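One possible shape for such an option, as a sketch only: it assumes `header_mapping` maps column names to indices in a result row, as the snippets above suggest. The function `segment_folder_name` and its `name_field` parameter are hypothetical and not part of the current code.

```python
def segment_folder_name(row, header_mapping, name_field="Common Name"):
    """Pick the value used as the segment folder name from one result row."""
    if name_field not in header_mapping:
        available = ", ".join(sorted(header_mapping))
        raise ValueError(f"Unknown name field {name_field!r}; available: {available}")
    return row[header_mapping[name_field]]


# Hypothetical header mapping and row for illustration.
header_mapping = {"Common Name": 0, "Species Code": 1}
row = ["Eurasian Blackbird", "eurbla"]

print(segment_folder_name(row, header_mapping))
print(segment_folder_name(row, header_mapping, name_field="Species Code"))
```

Defaulting to the common name would match the change proposed above while still letting users opt into the species code.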