Request: German species list

krummrey commented 3 years ago

I got BirdNET-Lite running and want to try analysing a day's worth of recordings from my garden. The output is in Latin and English. Do you have a German-language labels.txt, like the one in the phone app?

Also, can you elaborate a little on the confidence values? Where would a "reliable" cutoff be: at 0.5 or at 0.9?

Thanks for the great work. I have so much fun "catching" new birds.

euxoa commented 3 years ago

I stumbled into this only two days ago. With some data-processing skills you can put German names into the model/labels.txt file, provided you have a list that matches scientific names to German names. I did that for Finnish names, in R:

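library(dplyr)

# labels.txt: one "Scientific name_English name" entry per line, in model order
labels <- readr::read_delim("labels.txt", delim="_", col_names=c("name", "ename"))
# Finnish name list: tab-separated, latin1-encoded
transl <- readr::read_delim("../Maailman-lintujen-suomenkieliset-nimet-20180731.txt",
                delim="\t", locale=readr::locale(encoding="latin1"))
labels %>%
       # join on the scientific name only; distinct() stops duplicate entries
       # in the name list from adding lines and shifting line positions
       left_join(transl %>%
                   transmute(name=`Tieteellinen nimi`, fname=`Nimi suomeksi`) %>%
                   distinct(name, .keep_all=TRUE),
                 by="name") %>%
       # fall back to the English name where no Finnish name is available
       mutate(cname=ifelse(is.na(fname), ename, fname)) %>%
       mutate(line=paste(name, cname, sep="_")) %>%
       pull(line) %>%
       writeLines(con="labels2.txt")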

Just keep the name1_name2 format on every line, and keep the order of the lines in the file intact.
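For readers who prefer Python, the same idea can be sketched as below. It is a minimal sketch that assumes you have already loaded your name list into a dictionary keyed by scientific name; the `translate_labels` helper, the `translations` argument and the three example species are illustrative, not part of BirdNET-Lite. The property that matters, as noted above, is that every line of labels.txt produces exactly one output line in the same position.

```python
def translate_labels(label_lines, translations):
    """Swap the common name on each "Scientific_Common" line of labels.txt.

    label_lines:  lines of labels.txt, in the model's output order
    translations: dict mapping scientific name -> localized name
    """
    out = []
    for line in label_lines:
        # split on the first underscore only: scientific name, then common name
        scientific, _, english = line.rstrip("\n").partition("_")
        # keep the English name when no translation exists, so no line is dropped
        out.append(f"{scientific}_{translations.get(scientific, english)}")
    return out


# Example with a hypothetical three-line excerpt and German names
labels = ["Parus major_Great Tit", "Turdus merula_Eurasian Blackbird",
          "Corvus corax_Common Raven"]
de = {"Parus major": "Kohlmeise", "Turdus merula": "Amsel"}
print("\n".join(translate_labels(labels, de)))
# Parus major_Kohlmeise
# Turdus merula_Amsel
# Corvus corax_Common Raven
```

To apply it to the real file you would read labels.txt with something like `open("labels.txt", encoding="utf-8").read().splitlines()` (UTF-8 is an assumption) and write the result back with one entry per line.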

As for reliability, I don't think any cutoff gives you complete reliability. I have used the default cutoff of 0.1 and then extracted the lines with confidence above 0.7 or 0.9 from the result file. If some of these are interesting, I look at the spectrogram in Audacity and listen to the recording to confirm or reject the identification.
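The threshold step could be sketched like this. It assumes the result rows have already been parsed into records holding start time, end time, scientific name, common name and confidence; the `Detection` record, its field names and the `for_review` helper are hypothetical, not BirdNET-Lite's own types. Sorting by confidence simply puts the strongest candidates first for listening in Audacity.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    start: float       # segment start, seconds (assumed field)
    end: float         # segment end, seconds (assumed field)
    scientific: str
    common: str
    confidence: float  # model score between 0 and 1


def for_review(detections, threshold=0.7):
    """Keep detections at or above `threshold`, highest confidence first."""
    picked = [d for d in detections if d.confidence >= threshold]
    return sorted(picked, key=lambda d: d.confidence, reverse=True)
```

Running it once with 0.7 and once with 0.9 gives the two review lists described above.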

But the result file with a cutoff of 0.1 is useful as well, because birds tend to have continuity: the same bird typically appears in successive or almost successive frames. If you get something unexpected on the high-confidence list, look at the context around that frame in the file with cutoff 0.1. Often you find the correct ID in the nearby frames (and it is a common bird).
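Looking up the context around a surprising detection might be sketched as follows. This assumes the rows of the low-cutoff (0.1) file are parsed into the same kind of hypothetical `Detection` record, and that each frame covers 3 seconds, so the default `window` of 6 seconds spans about two frames on each side. Both are assumptions to adjust to your own output; `neighbours` and `species_in_context` are illustrative names.

```python
from collections import Counter
from typing import NamedTuple


class Detection(NamedTuple):
    start: float       # segment start, seconds (assumed field)
    end: float         # segment end, seconds (assumed field)
    scientific: str
    common: str
    confidence: float


def neighbours(detections, target, window=6.0):
    """Detections overlapping [target.start - window, target.end + window]."""
    lo, hi = target.start - window, target.end + window
    return [d for d in detections if d.end > lo and d.start < hi]


def species_in_context(detections, target, window=6.0):
    """Count how often each species shows up around the target frame."""
    return Counter(d.common for d in neighbours(detections, target, window))
```

If a surprising high-confidence species is surrounded by many frames of a common bird, that count makes the likely correct ID easy to spot.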

As I have only been at this for a couple of nights, my workflow is definitely not final yet.

That said, I have already found interesting things in last year's recordings. BirdNET seems to have the potential to speed up browsing through WAV files and to reveal things that would otherwise go unnoticed. Just don't take the IDs at face value; they need to be confirmed somehow.

I think the identifications would be better if there were some continuity of scores over successive frames. It's not built into the current model, but maybe some of it can be added by post-processing. Judging from the source of analyze.py, it would be best to take the whole score vector over several (say 3–10) frames and smooth it somehow, maybe by convolution in logit space, or with a Markov model. I just don't currently have a ground truth against which to optimise the required smoothness parameter. ;) A crude way is to manually look for identical labels in many consecutive frames. I have to think about this, although I probably won't have time to code anything.
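The logit-space smoothing idea might look like the sketch below. It assumes you can obtain the full per-frame score matrix (frames × species, values between 0 and 1), which likely means modifying analyze.py; `smooth_scores` and its `width` parameter are hypothetical. It converts scores to logits, takes a moving average of each species over `width` successive frames, and maps the result back with the logistic function. `width` is exactly the smoothness parameter that would need tuning against ground truth.

```python
import numpy as np


def smooth_scores(scores, width=5, eps=1e-6):
    """Moving-average smoothing of per-species scores in logit space.

    scores: array of shape (n_frames, n_species) with values in (0, 1),
            one row per successive analysis frame
    width:  odd number of frames in the averaging window
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    p = np.clip(np.asarray(scores, dtype=float), eps, 1 - eps)
    logits = np.log(p / (1 - p))
    half = width // 2
    # repeat the first/last frame at the edges instead of padding with zeros
    padded = np.pad(logits, ((half, half), (0, 0)), mode="edge")
    kernel = np.ones(width) / width
    # convolve each species' time series separately; "valid" restores n_frames
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="valid"), 0, padded)
    return 1 / (1 + np.exp(-smoothed))
```

Edge padding repeats real frames because a zero logit corresponds to a score of 0.5, which would otherwise pull the first and last frames toward 0.5. A single-frame spike is damped, while a species that scores consistently over several frames keeps its high score.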

patlevin commented 3 years ago

If you are still interested, I have compiled a list of localised labels for 29 languages with varying levels of completeness:

Language	Missing labels	Missing labels (%)
Afrikaans	5774	90.76%
Catalan	544	8.55%
Chinese	264	4.15%
Chinese (Traditional)	295	4.64%
Croatian	370	5.82%
Czech	683	10.74%
Danish	460	7.23%
Dutch	264	4.15%
Estonian	3171	49.84%
Finnish	518	8.14%
French	264	4.15%
German	264	4.15%
Hungarian	2688	42.25%
Icelandic	5588	87.83%
Indonesian	5550	87.24%
Italian	524	8.24%
Japanese	640	10.06%
Latvian	4821	75.78%
Lithuanian	597	9.38%
Northern Sami	5605	88.10%
Norwegian	325	5.11%
Polish	265	4.17%
Portuguese	2742	43.10%
Russian	808	12.70%
Slovak	264	4.15%
Slovenian	5532	86.95%
Spanish	348	5.47%
Swedish	264	4.15%
Thai	5580	87.71%
Ukrainian	646	10.15%

My localisation database lacks 264 entries that appear in labels.txt, hence no language has 0% missing entries. I have attached the localised labels (formatted as label_<ISO 639-1>.txt).
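The percentages in the table are the missing count divided by the 6362 lines of labels.txt (for example, 264 / 6362 ≈ 4.15% and 5774 / 6362 ≈ 90.76%). A check along those lines is sketched below. It assumes, hypothetically, that an untranslated entry keeps its English name; species whose names happen to be identical in both languages are then counted as missing too, so the result is an upper bound. `missing_stats` is an illustrative name.

```python
def missing_stats(original_lines, localized_lines):
    """Return (missing count, missing %) for a localized labels file."""
    if len(original_lines) != len(localized_lines):
        raise ValueError("files differ in length; line positions no longer match")
    # assumed convention: an untranslated entry still carries the English name
    missing = sum(
        orig.partition("_")[2] == loc.partition("_")[2]
        for orig, loc in zip(original_lines, localized_lines)
    )
    return missing, 100.0 * missing / len(original_lines)
```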

If there's any interest in this, I can create a pull-request as well.

UPDATED labels_l18n.zip

EDIT: the previous files didn't match the original labels.txt, which led to problems. The above link contains the fixed files.

nilspupils commented 3 years ago

That is great! Thank you for sharing!

ghost commented 3 years ago

Is it possible to get a full labels.txt list for German with all entries?

I tried the incomplete list and ran into an error (see https://github.com/kahst/BirdNET-Lite/issues/11), and I also got different results for the Latin names.

patlevin commented 3 years ago

@Christoph-Lauer

Is it possible to get a full labels.txt list for German with all entries?

I tried the incomplete list and ran into an error (see #11), and I also got different results for the Latin names.

I will regenerate the labels so they match the original labels.txt. The names are definitely correct; the order, however, might not be. I'll fix that right away.

ghost commented 3 years ago

I can confirm that the names are correct, but the number of lines differs between the two files. I would be happy to get a labels.txt file with the same number of lines as the TF model (6362 lines).
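A quick way to confirm that a localized file lines up with the original before running the model is sketched below. `check_alignment` is a hypothetical helper, not part of BirdNET-Lite. It compares the line counts and the scientific name on every line, since a German file with the right count but a shifted order would still mislabel detections.

```python
def check_alignment(original_lines, localized_lines):
    """List mismatches between labels.txt and a localized copy (empty = OK)."""
    problems = []
    if len(original_lines) != len(localized_lines):
        problems.append(
            f"line count differs: {len(original_lines)} vs {len(localized_lines)}")
    for i, (orig, loc) in enumerate(zip(original_lines, localized_lines), start=1):
        orig_sci, loc_sci = orig.partition("_")[0], loc.partition("_")[0]
        if orig_sci != loc_sci:
            problems.append(f"line {i}: {orig_sci!r} vs {loc_sci!r}")
    return problems
```

With both files read via `open(path, encoding="utf-8").read().splitlines()` (UTF-8 assumed), an empty result means the 6362 lines match one to one.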

patlevin commented 3 years ago

@Christoph-Lauer I updated the archive in my comment above with the fixed files.

Also: corrected list of German names

Let me know if that fixes things, I'll redo the name mapping otherwise.

ghost commented 3 years ago

YOU MAKE MY DAY ;-)

nilspupils commented 3 years ago

Good news! Thanks @Christoph-Lauer for checking and @patlevin for setting up the new list!

DD4WH commented 2 years ago

Thanks a lot for the German species list! I made a few minor corrections and added some names that were still in English (e.g. some of the warblers).

labels_de.txt

nilspupils commented 2 years ago

This is a German-language list of European species only. Thanks to @DD4WH for his lists, which I compiled into this one. Please check for errors, as this was a really messy Excel job... labels_de_europe.txt
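Instead of an Excel job, a regional subset could be produced by a script like the one below. It also lists requested names that found no match, a likely source of typos. The `keep` set of European scientific names is a hypothetical input you would supply, and `subset_labels` is an illustrative name. Following the earlier point about line positions, such a shortened list presumably cannot replace labels.txt itself, since the model needs all 6362 lines in their original order.

```python
def subset_labels(label_lines, keep):
    """Return (lines whose scientific name is in `keep`, unmatched names).

    label_lines: lines of a localized labels file, e.g. labels_de.txt
    keep:        set of scientific names to retain (hypothetical input)
    """
    found = {line.partition("_")[0] for line in label_lines}
    # preserve the original order of the labels file
    subset = [line for line in label_lines if line.partition("_")[0] in keep]
    # names that matched nothing: typos or taxonomy differences to check by hand
    unmatched = sorted(set(keep) - found)
    return subset, unmatched
```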

kahst / BirdNET-Lite, issue #4