Open GoogleCodeExporter opened 8 years ago
Thanks Jonathan. I knew we had too many homonyms in our taxonomy due to
the extensive numbers in IRMNG, but I was not aware that some come from
different sources. Pezizella, for example, comes from both the Catalogue of Life
and Index Fungorum, and the two records only really differ in their order and
family names. That should not have happened.
Btw, check out our new species pages in the upcoming new portal:
http://uat.gbif.org/species/5952828
http://uat.gbif.org/species/7245629
Do you have any good ideas on how to discover real homonyms apart from keeping a
manual list as we do with IRMNG? We do not make use of authorship so far, and in
some of the examples you listed this would have told us clearly that it is the
same name. But in general it's pretty tough to deal with irregular authorship
spellings, so we decided not to rely on that.
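To illustrate why authorship matching is tempting but fragile, here is a minimal sketch of a crude authorship comparison. It is not GBIF's code: the function names `authorship_key` and `same_authorship`, the five-letter surname prefix, and the abbreviated spelling "P. Karst. 1872" are all assumptions made up for this example.

```python
import re
import unicodedata
from typing import Optional, Tuple


def authorship_key(authorship: str) -> Tuple[str, Optional[str]]:
    """Reduce an authorship string to (surname prefix, year).

    Hypothetical helper: keeps only the first word of two or more letters
    (so initials such as "P." are skipped) and the first 4-digit number.
    """
    # Fold accented characters to their ASCII base letters.
    text = unicodedata.normalize("NFKD", authorship)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    year_match = re.search(r"\b\d{4}\b", text)
    year = year_match.group(0) if year_match else None
    words = re.findall(r"[A-Za-z]{2,}", text)
    # A short prefix tolerates abbreviations like "Karst." for "Karsten".
    surname = words[0].lower()[:5] if words else ""
    return surname, year


def same_authorship(a: str, b: str) -> bool:
    """True if both strings reduce to the same surname prefix and the years agree."""
    surname_a, year_a = authorship_key(a)
    surname_b, year_b = authorship_key(b)
    if not surname_a or surname_a != surname_b:
        return False
    # A missing year on either side is not treated as a conflict.
    return year_a is None or year_b is None or year_a == year_b


if __name__ == "__main__":
    print(same_authorship("P. Karsten, 1872", "P. Karst. 1872"))
    print(same_authorship("P. Karsten, 1872", "Fuckel, 1870"))
```

A prefix match like this will confuse distinct authors with similar surnames and ignores "ex" and multi-author citations, which is exactly the kind of irregularity that makes authorship hard to rely on.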
Also, in cases where we cannot decide, GBIF prefers to keep duplicate taxa
rather than merge 2 real ones into a single taxon. A false merge causes more
trouble for us than a false duplicate.
Original comment by wixner@gmail.com
on 19 Jul 2013 at 12:49
Original comment by wixner@gmail.com
on 19 Jul 2013 at 12:50
Pezizella is listed as a homonym in IRMNG, and Index Fungorum lists it as one too:
Pezizella P. Karsten, 1872 for Thelebolus
GENUS SYNONYM from IRMNG Homonym List
Fungi Ascomycota Leotiomycetes Thelebolales Thelebolaceae
Pezizella Fuckel, 1870 for Calycina Nees ex Gray, 1821
GENUS SYNONYM from IRMNG Homonym List
Fungi Ascomycota Leotiomycetes Helotiales Hyaloscyphaceae Calycina
See http://bit.ly/1axaLhs
I am just surprised to see both of them accepted in our backbone, as
(most) sources list them as synonyms.
Original comment by wixner@gmail.com
on 19 Jul 2013 at 2:08
Re "how to discover real homonyms" - the question
is when two taxon records (from different sources or from the same one) refer
to the same taxon and when they do not. I spent quite a bit of time on this
question, and my code for deciding it is based on which names occur in the
ancestor chains of each record and in the sets of descendants of each. Overlap
among descendants is a pretty good (but not 100% reliable) indicator, and
having the parent of one occur in the ancestor chain of the other is also
pretty good. The code is on GitHub but is pretty much uncommented, so it
probably wouldn't be of much use.
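A minimal sketch of the two signals described above, written only for illustration and not taken from the actual repository. The `Taxon` class, the `likely_same_taxon` function, the `min_shared` threshold, and the placeholder species names are all assumptions; the Pezizella ancestor chains are the classifications quoted earlier in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Taxon:
    name: str
    # Ancestor names ordered from the immediate parent up to the root.
    ancestors: List[str]
    # Names of all taxa below this one.
    descendants: Set[str] = field(default_factory=set)


def likely_same_taxon(a: Taxon, b: Taxon, min_shared: int = 1) -> bool:
    """Heuristic guess at whether two same-named records are one taxon."""
    if a.name != b.name:
        return False
    # Signal 1: overlap among descendants (good, but not 100% reliable).
    if len(a.descendants & b.descendants) >= min_shared:
        return True
    # Signal 2: the parent of one occurs in the ancestor chain of the other.
    if a.ancestors and a.ancestors[0] in b.ancestors:
        return True
    if b.ancestors and b.ancestors[0] in a.ancestors:
        return True
    # No evidence either way: keep the records apart (false merges are worse).
    return False


if __name__ == "__main__":
    karsten = Taxon("Pezizella", ["Thelebolaceae", "Thelebolales",
                                  "Leotiomycetes", "Ascomycota", "Fungi"])
    fuckel = Taxon("Pezizella", ["Hyaloscyphaceae", "Helotiales",
                                 "Leotiomycetes", "Ascomycota", "Fungi"])
    print(likely_same_taxon(karsten, fuckel))

    # Hypothetical duplicated genus: same species set, slightly different chains.
    species = {"Phyllachora species A", "Phyllachora species B"}
    one = Taxon("Phyllachora", ["Family X", "Order Y"], set(species))
    two = Taxon("Phyllachora", ["Order Y"], set(species))
    print(likely_same_taxon(one, two))
```

Note that sharing only distant ancestors (here Leotiomycetes and above) does not count as evidence, so the two Pezizella records stay separate, while the duplicated genus with identical species is flagged as a single taxon.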
I understand about false matches being much more dangerous than false
mismatches; we have the same policy. However, what's notable about the homonyms
I'm reporting here is that (a) they did not exist in the previous version of
your taxonomy, and (b) they are extreme in the sense that a genus is duplicated
together with many of the species in it (e.g. 20 species in Phyllachora are
all duplicated together). This could be due either to changes in the source
taxonomies or to a newly introduced bug in your merge algorithm. In either case
I would think the repair is to make your taxon identity detector more intelligent.
If you'd like to work together I'd be very happy to!
Original comment by jonathan...@gmail.com
on 19 Jul 2013 at 3:26
Tony has just published a new version 3.1 of both the complete IRMNG and the
homonym list. Importing now, gonna see if things have changed.
Original comment by wixner@gmail.com
on 29 Aug 2013 at 3:29
Original issue reported on code.google.com by
jonathan...@gmail.com
on 17 Jul 2013 at 2:58