kahst / BirdNET-Lite

TFLite version of BirdNET. Bird sound recognition for more than 6,000 species worldwide.

Request: German species list #4

Open · krummrey opened this issue 3 years ago

krummrey commented 3 years ago

I got BirdNET-Lite running and want to try to analyse a day's worth of recordings from my garden. The output is in Latin and English. Do you have a German-language labels.txt, as in the phone app?

Also, can you elaborate a little on the confidence values? Where would a "reliable" threshold be: at 0.5 or 0.9?

Thanks for the great work. I have so much fun "catching" new birds.

euxoa commented 3 years ago

I stumbled onto this only two days ago. With some data-processing skills, you can import German names into the model/labels.txt file, provided you have a list that maps scientific names to German names. I did that for Finnish names, in R:

```r
library(dplyr)
library(readr)

# labels.txt holds one "Scientific name_English name" entry per line
labels <- read_delim("labels.txt", delim = "_", col_names = c("name", "ename"))

# Finnish names list: tab-separated, Latin-1 encoded
transl <- read_delim("../Maailman-lintujen-suomenkieliset-nimet-20180731.txt",
                     delim = "\t", locale = locale(encoding = "latin1")) %>%
  transmute(name = `Tieteellinen nimi`, fname = `Nimi suomeksi`) %>%
  distinct(name, .keep_all = TRUE)  # duplicate names would add rows in the join

labels %>%
  left_join(transl, by = "name") %>%                   # keeps the row order of labels
  mutate(cname = ifelse(is.na(fname), ename, fname),   # fall back to English
         line = paste(name, cname, sep = "_")) %>%
  pull(line) %>%
  writeLines(con = "labels2.txt")
```

Just keep the name1_name2 format on each line and leave the positions of the lines in the file intact, since the labels are matched to the model's outputs by line position.
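To illustrate why the line order matters, here is a minimal Python sketch. It assumes, as this thread implies, that the analyzer pairs the i-th model score with the i-th line of labels.txt; `parse_labels` and `label_scores` are hypothetical helpers, not BirdNET-Lite functions.

```python
def parse_labels(lines):
    """Split each 'Scientific name_Common name' line into a (scientific, common) pair.

    Assumes the first underscore separates the two names, as in labels.txt.
    """
    pairs = []
    for line in lines:
        scientific, _, common = line.rstrip("\n").partition("_")
        pairs.append((scientific, common))
    return pairs


def label_scores(scores, labels):
    """Pair each model score with the label on the same line index.

    The pairing is purely positional: if a translated file adds, drops or
    reorders lines, every score after that point gets the wrong name.
    """
    if len(scores) != len(labels):
        raise ValueError(f"{len(scores)} scores but {len(labels)} labels")
    return list(zip(labels, scores))
```

Because nothing but the index links a score to a name, a translated file must have exactly as many lines as the original, in the same order.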

As for reliability, I don't think any cutoff gives you complete reliability. I have used the default cutoff of 0.1 and then pulled the lines with confidence above 0.7 or 0.9 out of the result file. If some of these are interesting, I look at the spectrogram in Audacity and listen to the clip to confirm or reject the identification.
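That second-pass filter can be sketched in Python as follows. The semicolon delimiter, the header line and the column order (start, end, scientific name, common name, confidence) are assumptions for illustration; check them against the result file your version of analyze.py actually writes. `read_results` and `high_confidence` are hypothetical helpers.

```python
import csv
import io


def read_results(text, delimiter=";"):
    """Parse result-file text into (start_s, end_s, scientific, common, confidence) tuples.

    The layout is assumed; the first line is treated as a header and skipped.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader, None)  # skip the header line
    rows = []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        start, end, scientific, common, confidence = row
        rows.append((float(start), float(end), scientific, common, float(confidence)))
    return rows


def high_confidence(rows, threshold=0.7):
    """Keep only detections whose confidence is at or above `threshold`."""
    return [r for r in rows if r[4] >= threshold]
```

Running the analyzer once at the low cutoff and filtering afterwards means both views (everything above 0.1, and only the confident hits) come from a single run.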

But the result file with the 0.1 cutoff is useful as well, because birds tend to have continuity: the same bird typically appears in successive or nearly successive frames. If you get something unexpected on the high-confidence list, look at the context around that frame in the file with cutoff=0.1. Often you find the correct ID in the nearby frames (and it is a common bird).
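The context check can be sketched as a small lookup over the low-cutoff results. Rows use the same assumed (start, end, scientific, common, confidence) tuple layout, and the default 9 s window assumes 3-second frames, i.e. roughly three frames on either side. `context` and `most_common_nearby` are hypothetical names.

```python
from collections import Counter


def context(rows, center_start, window_s=9.0):
    """Return detections starting within `window_s` seconds of `center_start`.

    Rows are (start_s, end_s, scientific, common, confidence) tuples (assumed
    layout), sorted by time and then by descending confidence.
    """
    nearby = [r for r in rows if abs(r[0] - center_start) <= window_s]
    return sorted(nearby, key=lambda r: (r[0], -r[4]))


def most_common_nearby(rows, center_start, window_s=9.0):
    """Count how often each scientific name occurs in the surrounding frames."""
    counts = Counter(r[2] for r in context(rows, center_start, window_s))
    return counts.most_common()
```

If a surprising high-confidence species appears once while a common species fills the neighbouring frames, that is a hint to check the recording before accepting the rarer ID.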

I have only worked with this for a couple of nights, so my workflow is definitely not final yet.

That said, I have already found interesting stuff in last year's recordings. BirdNET seems to have the potential to speed up browsing through WAV files and to reveal things that would otherwise go unnoticed. Just don't take the IDs at face value; they need to be confirmed somehow.

I think the identifications would be better if there were some continuity of scores over successive frames. It's not built into the current model, but maybe post-processing could add some of it. Judging from the source of analyze.py, the best approach would be to take the whole score vector over several (say 3–10) frames and smooth it somehow, maybe by convolution in logit space, or with a Markov model. I just don't currently have a ground truth against which to optimise the smoothness parameter. ;) A crude alternative is to manually look for identical labels in many successive frames. I have to think about this, although I probably won't have time to code anything.
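One way to try the convolution idea is a weighted moving average of the per-frame scores in logit space, sketched below with NumPy. This is not something analyze.py does. The (n_frames, n_species) array layout and the 1-2-1 kernel are assumptions, and the kernel width is exactly the smoothness parameter that would still need tuning against ground truth. If the raw pre-sigmoid outputs are available, the logit step can be skipped.

```python
import numpy as np


def smooth_scores(scores, kernel=(1.0, 2.0, 1.0), eps=1e-6):
    """Smooth per-species scores across successive frames in logit space.

    scores: array of shape (n_frames, n_species) with values in (0, 1)
            (assumed layout: one row of sigmoid scores per frame).
    kernel: weights over neighbouring frames, normalised to sum to 1.
            A symmetric kernel makes convolution and correlation identical.
    """
    p = np.clip(np.asarray(scores, dtype=float), eps, 1.0 - eps)
    logits = np.log(p / (1.0 - p))
    k = np.asarray(kernel, dtype=float)
    k = k / k.sum()
    # Repeat the first/last frame at the edges so the output keeps n_frames rows.
    half = len(k) // 2
    padded = np.pad(logits, ((half, len(k) - 1 - half), (0, 0)), mode="edge")
    smoothed = np.empty_like(logits)
    for j in range(logits.shape[1]):
        smoothed[:, j] = np.convolve(padded[:, j], k, mode="valid")
    return 1.0 / (1.0 + np.exp(-smoothed))  # back to (0, 1)
```

With the default kernel, an isolated 0.9 between two 0.1 frames drops to 0.5, while a species that stays high over several frames keeps its score. A Markov model would be the heavier alternative.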

patlevin commented 3 years ago

If you are still interested, I have compiled a list of localised labels for 29 languages with varying levels of completeness:

| Language | Missing labels | Missing labels (%) |
|---|---:|---:|
| Afrikaans | 5774 | 90.76% |
| Catalan | 544 | 8.55% |
| Chinese | 264 | 4.15% |
| Chinese (Traditional) | 295 | 4.64% |
| Croatian | 370 | 5.82% |
| Czech | 683 | 10.74% |
| Danish | 460 | 7.23% |
| Dutch | 264 | 4.15% |
| Estonian | 3171 | 49.84% |
| Finnish | 518 | 8.14% |
| French | 264 | 4.15% |
| German | 264 | 4.15% |
| Hungarian | 2688 | 42.25% |
| Icelandic | 5588 | 87.83% |
| Indonesian | 5550 | 87.24% |
| Italian | 524 | 8.24% |
| Japanese | 640 | 10.06% |
| Latvian | 4821 | 75.78% |
| Lithuanian | 597 | 9.38% |
| Northern Sami | 5605 | 88.10% |
| Norwegian | 325 | 5.11% |
| Polish | 265 | 4.17% |
| Portuguese | 2742 | 43.10% |
| Russian | 808 | 12.70% |
| Slovak | 264 | 4.15% |
| Slovenian | 5532 | 86.95% |
| Spanish | 348 | 5.47% |
| Swedish | 264 | 4.15% |
| Thai | 5580 | 87.71% |
| Ukrainian | 646 | 10.15% |

My localisation database lacks 264 entries that are in labels.txt, hence no language has 0% missing entries. I have attached the localised labels (formatted as label_<ISO 639-1>.txt).

If there's any interest in this, I can create a pull request as well.

UPDATED labels_l18n.zip

EDIT: the previous files didn't match the original labels.txt, which led to problems. The link above contains the fixed files.

nilspupils commented 3 years ago

That is great! Thank you for sharing!

ghost commented 3 years ago

Is it possible to get a full labels.txt list for German with all entries?

I tried the incomplete list and ran into an error (see https://github.com/kahst/BirdNET-Lite/issues/11), while also getting different results for the Latin names.

patlevin commented 3 years ago

@Christoph-Lauer

> Is it possible to get a full labels.txt list for German with all entries?
>
> I tried the incomplete list and ran into an error (see #11), while also getting different results for the Latin names.

I will regenerate the labels so that they match the original labels.txt. The names are definitely correct; the order, however, might not be. I'll fix that right away.

ghost commented 3 years ago

I can confirm that the names are correct, but the number of lines differs between the two files. I would be happy to get a labels.txt file with the same number of lines as the TF model expects (6362 lines).
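A quick consistency check for a translated file can be sketched in Python: it compares the line count and the scientific name on every line against the original labels.txt, which catches exactly this kind of mismatch. The UTF-8 encoding and the helper names `check_localized` and `check_files` are assumptions for illustration.

```python
def check_localized(original_lines, localized_lines):
    """Return a list of problems that would break positional label lookup."""
    problems = []
    if len(original_lines) != len(localized_lines):
        problems.append(
            f"line count differs: {len(original_lines)} vs {len(localized_lines)}")
    # Compare the scientific name (text before the first underscore) line by line.
    for i, (orig, loc) in enumerate(zip(original_lines, localized_lines), start=1):
        sci_orig = orig.rstrip("\n").partition("_")[0]
        sci_loc = loc.rstrip("\n").partition("_")[0]
        if sci_orig != sci_loc:
            problems.append(f"line {i}: {sci_orig!r} != {sci_loc!r}")
    return problems


def check_files(original_path, localized_path, encoding="utf-8"):
    """File-based wrapper; the encoding is an assumption."""
    with open(original_path, encoding=encoding) as f:
        original = f.read().splitlines()
    with open(localized_path, encoding=encoding) as f:
        localized = f.read().splitlines()
    return check_localized(original, localized)
```

An empty list means the translated file can be swapped in for labels.txt without shifting any names.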

patlevin commented 3 years ago

@Christoph-Lauer I updated the archive in my comment above with the fixed files.

Also: corrected list of German names

Let me know if that fixes things, I'll redo the name mapping otherwise.

ghost commented 3 years ago

YOU MADE MY DAY ;-)

nilspupils commented 3 years ago

Good news! Thanks @Christoph-Lauer for checking and @patlevin for setting up the new list!

DD4WH commented 2 years ago

Thanks a lot for the German species list! I made a few minor corrections and added German names for a few species that were still in English (e.g. some of the warblers).

labels_de.txt

nilspupils commented 2 years ago

This is a German-language list of the European species only. Thanks to @DD4WH for his lists, which I compiled into this one. Please check for errors, as this was a really messy Excel job. labels_de_europe.txt