Open krummrey opened 3 years ago
I stumbled into this only two days ago. With some data processing skills, you can import German names into the `model/labels.txt` file, if you have a list with matching scientific vs German names available. I did that for Finnish names, in R:
```r
library(dplyr)

# labels.txt holds one "scientific_English" pair per line
labels <- readr::read_delim("labels.txt", delim="_", col_names=c("name", "ename"))
# Tab-separated list of Finnish names, Latin-1 encoded
transl <- readr::read_delim("../Maailman-lintujen-suomenkieliset-nimet-20180731.txt",
                            delim="\t", locale=readr::locale(encoding="latin1"))

labels %>%
  # Match on the scientific name; the left join keeps the order of labels.txt
  left_join(transl %>% mutate(name=`Tieteellinen nimi`, fname=`Nimi suomeksi`), by="name") %>%
  select(name, fname, ename) %>%
  # Fall back to the English name where no Finnish name exists
  mutate(cname=ifelse(is.na(fname), ename, fname)) %>%
  mutate(line=paste(name, cname, sep="_")) %>%
  pull(line) %>%
  writeLines(con="labels2.txt")
```
Just keep the format as `name1_name2` for each line, and keep the positions of the lines in the file intact.
As for reliability, I don't think you get complete reliability with any cutoff. I have used the default 0.1 and then pulled the lines with reliability above 0.7 or 0.9 out of the result file. If some of these are interesting, I look at the spectrogram in Audacity and listen to it, to confirm or reject.
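The filtering step could look roughly like the Python sketch below. The column names, the `;` delimiter and the sample rows are assumptions made for illustration, not taken from the actual result file, so adjust them to whatever your output looks like. `high_confidence_rows` is a hypothetical helper.

```python
import csv
import io

def high_confidence_rows(lines, threshold=0.7, delimiter=";", conf_column="Confidence"):
    """Return the detections whose confidence is above `threshold`."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    return [row for row in reader if float(row[conf_column]) > threshold]

# Made-up result-file contents (assumed layout), as produced with cutoff 0.1.
sample = io.StringIO(
    "Start (s);End (s);Scientific name;Common name;Confidence\n"
    "0.0;3.0;Turdus merula;Eurasian Blackbird;0.93\n"
    "3.0;6.0;Parus major;Great Tit;0.42\n"
    "6.0;9.0;Erithacus rubecula;European Robin;0.71\n"
)

candidates = high_confidence_rows(sample, threshold=0.7)
for row in candidates:
    print(row["Start (s)"], row["Scientific name"], row["Confidence"])
```

With a real file you would pass an open file handle instead of the `io.StringIO` sample. The rows that survive are only candidates; they still need to be checked by ear and by spectrogram.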
But the result file with cutoff of 0.1 is useful as well! For birds tend to have continuity, and typically the same bird appears in successive or almost successive frames. If you get something unexpected on the high-reliability list, look at the context around that frame in the file with cutoff=0.1. Often you find the correct id from the frames nearby (and it is a common bird).
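A minimal Python sketch of that context lookup, assuming the low-cutoff detections have already been loaded into a list of dicts. The keys `start`, `species` and `confidence`, the helper `neighbouring_species` and the sample values are all made up for illustration.

```python
from collections import Counter

def neighbouring_species(detections, start, window=6.0):
    """Count the species detected within `window` seconds of the frame at `start`,
    leaving out the frame itself."""
    counts = Counter()
    for d in detections:
        if d["start"] != start and abs(d["start"] - start) <= window:
            counts[d["species"]] += 1
    return counts

# Made-up detections from a run with cutoff=0.1, one per 3-second frame.
low_cutoff = [
    {"start": 0.0, "species": "Turdus merula", "confidence": 0.35},
    {"start": 3.0, "species": "Turdus merula", "confidence": 0.62},
    {"start": 6.0, "species": "Turdus torquatus", "confidence": 0.91},
    {"start": 9.0, "species": "Turdus merula", "confidence": 0.48},
    {"start": 12.0, "species": "Parus major", "confidence": 0.15},
]

# The unexpected high-confidence id sits at 6.0 s; see what surrounds it.
nearby = neighbouring_species(low_cutoff, start=6.0)
print(nearby.most_common())
```

If the neighbouring frames are dominated by a common species, that is a hint (not proof) that the surprising id is a misclassification of the same bird.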
Having been with this only a couple of nights, my workflow is definitely not final yet.
That said, I have already found interesting stuff from last year's records. Using BirdNET seems to have potential to speed up browsing of WAV files, and to reveal things that otherwise go unnoticed. Just don't take the ids at face value. They need to be confirmed somehow.
I think the identifications would be better if there were some continuity of scores over successive frames. It's not built into the current model, but maybe one can bring some of it there by post-processing. Judging from the source of `analyze.py`, it would be best to take the whole score vector over several (say 3–10) frames and smooth it somehow, maybe by convolution in logit space, or with a Markov model. I just don't currently have a ground truth against which to optimise the required smoothness parameter. ;) A crude way is to manually look for identical labels in many successive frames. I have to think about this, although I probably won't have time to code anything.
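To make the smoothing idea concrete, here is a rough Python sketch of a moving average (a box-kernel convolution) over successive frames in logit space. It does not touch `analyze.py`; it assumes you have somehow obtained one score vector per frame as a list of probabilities, and the function names, the `width` parameter and the sample numbers are all hypothetical.

```python
import math

def logit(p, eps=1e-6):
    """Log-odds of p, clipped so that 0 and 1 do not produce infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def smooth_scores(frames, width=3):
    """Average each species' score over `width` neighbouring frames in logit space.

    `frames` is a list of score vectors, one per frame, all of the same length.
    The window is truncated at the start and end of the recording.
    """
    half = width // 2
    n = len(frames)
    smoothed = []
    for i in range(n):
        window = frames[max(0, i - half):min(n, i + half + 1)]
        row = []
        for s in range(len(frames[i])):
            mean_logit = sum(logit(f[s]) for f in window) / len(window)
            row.append(sigmoid(mean_logit))
        smoothed.append(row)
    return smoothed

# Made-up scores for two species over five frames: species 0 is present
# throughout, species 1 spikes in a single frame.
frames = [[0.8, 0.05], [0.7, 0.05], [0.2, 0.9], [0.8, 0.05], [0.75, 0.05]]
smoothed = smooth_scores(frames, width=3)
for raw, sm in zip(frames, smoothed):
    print(raw, [round(v, 2) for v in sm])
```

In this toy example the isolated spike of species 1 in the middle frame is pulled below species 0 after smoothing. The right `width` is exactly the smoothness parameter that would need a ground truth to tune.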
If you are still interested, I have compiled a list of localised labels for 29 languages with varying levels of completeness:
Language | Missing labels (count) | Missing labels (%)
---|---|---
Afrikaans | 5774 | 90.76% |
Catalan | 544 | 8.55% |
Chinese | 264 | 4.15% |
Chinese (Traditional) | 295 | 4.64% |
Croatian | 370 | 5.82% |
Czech | 683 | 10.74% |
Danish | 460 | 7.23% |
Dutch | 264 | 4.15% |
Estonian | 3171 | 49.84% |
Finnish | 518 | 8.14% |
French | 264 | 4.15% |
German | 264 | 4.15% |
Hungarian | 2688 | 42.25% |
Icelandic | 5588 | 87.83% |
Indonesian | 5550 | 87.24% |
Italian | 524 | 8.24% |
Japanese | 640 | 10.06% |
Latvian | 4821 | 75.78% |
Lithuanian | 597 | 9.38% |
Northern Sami | 5605 | 88.10% |
Norwegian | 325 | 5.11% |
Polish | 265 | 4.17% |
Portuguese | 2742 | 43.10% |
Russian | 808 | 12.70% |
Slovak | 264 | 4.15% |
Slovenian | 5532 | 86.95% |
Spanish | 348 | 5.47% |
Swedish | 264 | 4.15% |
Thai | 5580 | 87.71% |
Ukrainian | 646 | 10.15% |
My localisation database is missing 264 entries that are found in `labels.txt`, hence no language has 0% missing entries.
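For reference, the percentages in the table are consistent with dividing the missing count by the 6362 lines of the model's label file (the line count mentioned further down in this thread). A small Python sketch of that bookkeeping, where `missing_stats` and the toy data are made up for illustration:

```python
def missing_stats(scientific_names, translations):
    """Return (count, percentage) of names that have no entry in `translations`."""
    missing = [name for name in scientific_names if name not in translations]
    return len(missing), 100.0 * len(missing) / len(scientific_names)

# Toy data: four labels, one of which has no localised name.
names = ["Turdus merula", "Parus major", "Erithacus rubecula", "Sitta europaea"]
german = {
    "Turdus merula": "Amsel",
    "Parus major": "Kohlmeise",
    "Erithacus rubecula": "Rotkehlchen",
}
count, percent = missing_stats(names, german)
print(count, f"{percent:.2f}%")

# The floor that applies to every language: 264 of 6362 labels.
baseline = round(100.0 * 264 / 6362, 2)
print(baseline)
```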
I have attached the localised labels (formatted as label_<ISO 639-1>.txt).
If there's any interest in this, I can create a pull-request as well.
EDIT: the previous files didn't match the original `labels.txt`, which led to problems. The above link contains the fixed files.
That is great! Thank you for sharing!
Is it possible to get a full `label.txt` list for German with all entries?
I tried the incomplete list and ran into an error (see https://github.com/kahst/BirdNET-Lite/issues/11), while getting different results for the Latin names.
@Christoph-Lauer
> Is it possible to get a full `label.txt` list for German with all entries? I tried the incomplete list and ran into an error (see #11), while getting different results for the Latin names.
I will regenerate the labels so they match the original `labels.txt`. The names are definitely correct; the order, however, might not be. I'll fix that right away.
Can confirm that the names are correct, but the number of lines differs between the two files. Would be happy to get a `labels.txt` file with the same number of lines as the TF model (6362 lines).
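A quick way to catch this kind of mismatch before running the model is to compare the localised file against the original line by line. The Python sketch below assumes the `scientific_common` line format described earlier in the thread; `check_label_file` and the sample lines are made up for illustration.

```python
def check_label_file(original_lines, localised_lines):
    """Return a list of problems: a differing line count, or lines whose
    scientific name (the part before the first "_") does not match."""
    problems = []
    if len(original_lines) != len(localised_lines):
        problems.append(
            f"line count differs: {len(original_lines)} vs {len(localised_lines)}"
        )
    pairs = zip(original_lines, localised_lines)
    for number, (orig, loc) in enumerate(pairs, start=1):
        orig_name = orig.split("_", 1)[0]
        loc_name = loc.split("_", 1)[0]
        if orig_name != loc_name:
            problems.append(f"line {number}: expected {orig_name!r}, found {loc_name!r}")
    return problems

# Toy example: the localised file has swapped two lines and lost one.
original = [
    "Turdus merula_Eurasian Blackbird",
    "Parus major_Great Tit",
    "Erithacus rubecula_European Robin",
]
localised = [
    "Parus major_Kohlmeise",
    "Turdus merula_Amsel",
]
problems = check_label_file(original, localised)
for problem in problems:
    print(problem)
```

For real files, read both with `open(path, encoding="utf-8").read().splitlines()` and expect an empty list when everything lines up.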
@Christoph-Lauer I updated the archive in my comment above with the fixed files.
So: corrected list with German names
Let me know if that fixes things, I'll redo the name mapping otherwise.
YOU MAKE MY DAY ;-)
Good news! Thanks @Christoph-Lauer for checking and @patlevin for setting up the new list!
Thanks a lot for the German species name list! I made a few minor corrections and added a few names which were still in English (e.g. some of the warblers).
This is a German-language list of only the European species. Thanks to @DD4WH for his lists, which I compiled into this one. Please check for errors, as this was a really messy Excel job... labels_de_europe.txt
I got BirdNET-Lite running and want to try to analyse a day's worth of recordings from my garden. The output generated is in Latin and English. Do you have a German-language `labels.txt`, as in the phone app?
Also, can you elaborate a little on the confidence values? Where would a "reliable" value be: at 0.5 or 0.9?
Thanks for the great work. I have so much fun "catching" new birds.