Open GoogleCodeExporter opened 9 years ago
Original comment by timrobertson100
on 4 May 2011 at 1:34
After all the changes, this issue now stands as follows:
In Hive:
ROR_small: 1,000,000
OR: 996,908
This may still be valid, but the 3,092 dropped records need inspection.
Original comment by timrobertson100
on 5 May 2011 at 7:50
These are all hybrids.
E.g. scientific name
Bothus rhombus x maximus
Hybrids have not yet been handled in name parsing.
However, these records should *not* be dropped but identified to a higher taxon.
E.g. Occurrence record 53801 has:
Animalia
Chordata
Osteichthyes
Pleuronectiformes
Bothidae
Bothus
Bothus rhombus x maximus
It should be identified to Bothus even if the final name is not found.
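A minimal sketch of the higher-taxon fallback described above, in Python for illustration only. The function name `match_to_nub`, the `nub_index` dictionary, and the nub ids are hypothetical and are not taken from the actual nub lookup code. It only shows the idea of walking the classification from the most specific name upwards until one is found.

```python
from typing import Optional

# Ranks ordered from most to least specific (field names are assumptions).
RANKS = ["scientific_name", "genus", "family", "order", "class", "phylum", "kingdom"]


def match_to_nub(record: dict, nub_index: dict) -> Optional[tuple]:
    """Return (rank, nub_id) for the most specific name found in nub_index, else None."""
    for rank in RANKS:
        name = record.get(rank)
        if name and name in nub_index:
            return rank, nub_index[name]
    return None


# Hypothetical nub contents: the hybrid name is absent, the genus is present.
nub_index = {"Bothus": 101, "Bothidae": 102, "Animalia": 1}

# Classification of occurrence record 53801 as listed above.
record_53801 = {
    "kingdom": "Animalia",
    "phylum": "Chordata",
    "class": "Osteichthyes",
    "order": "Pleuronectiformes",
    "family": "Bothidae",
    "genus": "Bothus",
    "scientific_name": "Bothus rhombus x maximus",
}

print(match_to_nub(record_53801, nub_index))  # ('genus', 101)
```

With a fallback like this, the hybrid record would still receive an identification (to the genus) even though its full name cannot be parsed yet.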
The records are not getting an identification, and since
occurrence_record.q has:
"JOIN ${occurrence_nub} nub ON r.id = nub.occurrence_id"
anything that is not identified to the NUB in some way will be dropped.
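To illustrate why the inner join drops these records, and how the dropped ones could be listed for inspection, here is a small sketch using Python's built-in sqlite3 as a stand-in for Hive. The table names `raw_occurrence` and `occurrence_nub`, the columns, and the sample rows are assumptions made up for the example. Only the join condition mirrors the one quoted above.

```python
import sqlite3

# In-memory database standing in for the Hive tables (names and rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_occurrence (id INTEGER PRIMARY KEY, scientific_name TEXT);
    CREATE TABLE occurrence_nub (occurrence_id INTEGER, nub_id INTEGER);
    INSERT INTO raw_occurrence VALUES
        (1, 'Puma concolor'),
        (53801, 'Bothus rhombus x maximus');
    -- Only record 1 received an identification.
    INSERT INTO occurrence_nub VALUES (1, 500);
""")

# Inner join, as in occurrence_record.q: unidentified records disappear.
kept = conn.execute(
    "SELECT r.id FROM raw_occurrence r "
    "JOIN occurrence_nub nub ON r.id = nub.occurrence_id ORDER BY r.id"
).fetchall()

# Anti-join: list the records the inner join would drop.
dropped = conn.execute(
    "SELECT r.id, r.scientific_name FROM raw_occurrence r "
    "LEFT JOIN occurrence_nub nub ON r.id = nub.occurrence_id "
    "WHERE nub.occurrence_id IS NULL ORDER BY r.id"
).fetchall()

print(kept)     # [(1,)]
print(dropped)  # [(53801, 'Bothus rhombus x maximus')]
```

The same LEFT OUTER JOIN ... IS NULL pattern is available in Hive and could be used to pull the dropped records for inspection.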
Propose that, before handling hybrids (which requires thought and advice from
Andrea), we implement the higher name matching, which is necessary anyway for
names not in the nub. I suspect this change needs to be done in the nub lookup UDF.
Original comment by timrobertson100
on 5 May 2011 at 8:02
Original issue reported on code.google.com by
timrobertson100
on 18 Apr 2011 at 6:38