jhpoelen / hmw

(experimental) Machine readable version of Handbook of the Mammals of the World
https://jhpoelen.nl/hmw/
Creative Commons Zero v1.0 Universal

Spelling errors during Plazi extraction #10

Closed ajacsherman closed 1 year ago

ajacsherman commented 1 year ago

@jhpoelen @flsimoes The following are spelling errors from the Plazi extraction process, each followed by its correct spelling and docID. Thanks for your help.

| Plazi | Correct | docID |
| -- | -- | -- |
| Anthops omatus | Anthops ornatus | 4C3D87E8FFB46A0BFF7997F21FD8B734 |
| Carollia brevicaudum | Carollia brevicauda | 03A687BCFF85FF841694F68BFAD5FB73 |
| Chilonatalus tumaidifrons | Chilonatalus tumidifrons | 290787FFFFA71870FF0B9FF2E2B33232 |
| Emballonura seni | Emballonura serii | 03D587F2FFC34C08FF0337D6FEFCF45F |
| Lophostoma silvicola | Lophostoma silvicolum | 03A687BCFFA4FFA413BDFC41FE2DFFD6 |
| Mirimiri acrodonta | duplicate in extraction | |
| Mops trevor | Mops trevori | 194287C9FF99BA36B1A5FEC2B010FD24 |
| Pipustrellus aero | Pipistrellus aero | 4C3D87E8FFE96A57FA579E4F1C78B92A |
| Pipustrellus hanaki | Pipistrellus hanaki | 4C3D87E8FFF56A4BFA5B9FC01B2EBE6B |
| Pipustrellus minahassae | Pipistrellus minahassae | 4C3D87E8FFEC6A5CFA9B9F081B4DB830 |
| Pipustrellus papuanus | Pipistrellus papuanus | 4C3D87E8FFE36A5CFF449CE81442BD54 |
| Pipustrellus paterculus | Pipistrellus paterculus | 4C3D87E8FFEC6A53FF9092061C26B04E |
| Pipustrellus rusticus | Pipistrellus rusticus | 4C3D87E8FFEA6A56FA9392D01DB8B84F |
| Pipustrellus wattsi | Pipistrellus wattsi | 4C3D87E8FFE26A5DFF9294481FB2BCF9 |
| Pipustrellus westralis | Pipistrellus westralis | 4C3D87E8FFE26A5DFF919E4D18B8BFBC |
| Plecotus christii | Plecotus christiei | 4C3D87E8FF976A28FF4C935C18A1BAE2 |
| Pleralopex flanneryi | Pteralopex flanneryi | 03AD87FAFF83F66D896A3CFDF8FAF8AC |
| Rhinolophus cognotus | Rhinolophus cognatus | 885887A2FFEC8A0AFF06F534F239DC09 |
| Rhinolophus deckend | Rhinolophus deckenii | 885887A2FFC18A26F89AF3F4F964D1A2 |
| Tadarida latouchet | Tadarida latouchei | 194287C9FF8CBA20B482F5FEB6BEF840 |
| Pleralopex atrata | Pteralopex atrata | 03AD87FAFF81F66C8C753E69FB55F62F |
| Rhinolophus odami | Rhinolophus adami | 885887A2FFCD8A2BFF63FD82FCF4D255 |
| Manaopterus brachytragos | Miniopterus brachytragos | E84887F9FFC6D6480FC9FE281B1D31BC |
| Muniopterus mossambicus | Miniopterus mossambicus | E84887F9FFD8D6570F3EF4BD157B32AE |
| Rhinolophus comutus | Rhinolophus cornutus | 885887A2FFD08A09F8B3F3E5FDDDD522 |
| Thainycleris torquatus | Thainycteris torquata | 4C3D87E8FFBF6A01FA569C0B1D3FBF02 |

flsimoes commented 1 year ago

Thanks @ajacsherman. We'll check these out.

jhpoelen commented 1 year ago

@ajacsherman thanks for sharing your detailed feedback. Curious to see what @flsimoes comes up with.

flsimoes commented 1 year ago

Just from the list itself I can see them being OCR misreads.
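
One rough way to back up the OCR-misread impression is to measure how many single-character edits separate each extracted name from its corrected form. The sketch below is only an illustration: the `edit_distance` helper is a hypothetical name for a plain Levenshtein implementation, and the pairs are a small sample copied from the table in this issue. Small distances are consistent with character-level misreads, but they do not prove the cause.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep) a character
            ))
        prev = curr
    return prev[-1]


# A few (Plazi, Correct) pairs copied from the table in this issue.
pairs = [
    ("Pipustrellus aero", "Pipistrellus aero"),
    ("Rhinolophus comutus", "Rhinolophus cornutus"),
    ("Pleralopex atrata", "Pteralopex atrata"),
    ("Mops trevor", "Mops trevori"),
]

distances = {plazi: edit_distance(plazi, correct) for plazi, correct in pairs}
for plazi, correct in pairs:
    print(f"{plazi!r} -> {correct!r}: {distances[plazi]} edit(s)")
```

In this sample, "comutus" versus "cornutus" needs two edits because the letter pair "rn" was read as a single "m", which is a typical OCR confusion.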

ajacsherman commented 1 year ago

I found another. Feumops floridanus/Eumops floridanus/194287C9FFB0BA1DB181F02CB1CDFE3E

flsimoes commented 1 year ago

| Plazi | Correct | docID | status |
| -- | -- | -- | -- |
| Anthops omatus | Anthops ornatus | 4C3D87E8FFB46A0BFF7997F21FD8B734 | see comment below |
| Carollia brevicaudum | Carollia brevicauda | 03A687BCFF85FF841694F68BFAD5FB73 | see comment below |
| Chilonatalus tumaidifrons | Chilonatalus tumidifrons | 290787FFFFA71870FF0B9FF2E2B33232 | fixed |
| Emballonura seni | Emballonura serii | 03D587F2FFC34C08FF0337D6FEFCF45F | fixed |
| Lophostoma silvicola | Lophostoma silvicolum | 03A687BCFFA4FFA413BDFC41FE2DFFD6 | see comment below |
| Mirimiri acrodonta | duplicate in extraction | | see comment below |
| Mops trevor | Mops trevori | 194287C9FF99BA36B1A5FEC2B010FD24 | fixed |
| Pipustrellus aero | Pipistrellus aero | 4C3D87E8FFE96A57FA579E4F1C78B92A | fixed |
| Pipustrellus hanaki | Pipistrellus hanaki | 4C3D87E8FFF56A4BFA5B9FC01B2EBE6B | fixed |
| Pipustrellus minahassae | Pipistrellus minahassae | 4C3D87E8FFEC6A5CFA9B9F081B4DB830 | fixed |
| Pipustrellus papuanus | Pipistrellus papuanus | 4C3D87E8FFE36A5CFF449CE81442BD54 | fixed |
| Pipustrellus paterculus | Pipistrellus paterculus | 4C3D87E8FFEC6A53FF9092061C26B04E | fixed |
| Pipustrellus rusticus | Pipistrellus rusticus | 4C3D87E8FFEA6A56FA9392D01DB8B84F | fixed |
| Pipustrellus wattsi | Pipistrellus wattsi | 4C3D87E8FFE26A5DFF9294481FB2BCF9 | fixed |
| Pipustrellus westralis | Pipistrellus westralis | 4C3D87E8FFE26A5DFF919E4D18B8BFBC | fixed |
| Plecotus christii | Plecotus christiei | 4C3D87E8FF976A28FF4C935C18A1BAE2 | see comment below |
| Pleralopex flanneryi | Pteralopex flanneryi | 03AD87FAFF83F66D896A3CFDF8FAF8AC | fixed |
| Rhinolophus cognotus | Rhinolophus cognatus | 885887A2FFEC8A0AFF06F534F239DC09 | fixed |
| Rhinolophus deckend | Rhinolophus deckenii | 885887A2FFC18A26F89AF3F4F964D1A2 | fixed |
| Tadarida latouchet | Tadarida latouchei | 194287C9FF8CBA20B482F5FEB6BEF840 | reads _latouchei_, didn't find the error |
| Pleralopex atrata | Pteralopex atrata | 03AD87FAFF81F66C8C753E69FB55F62F | fixed |
| Rhinolophus odami | Rhinolophus adami | 885887A2FFCD8A2BFF63FD82FCF4D255 | fixed |
| Manaopterus brachytragos | Miniopterus brachytragos | E84887F9FFC6D6480FC9FE281B1D31BC | fixed |
| Muniopterus mossambicus | Miniopterus mossambicus | E84887F9FFD8D6570F3EF4BD157B32AE | fixed |
| Rhinolophus comutus | Rhinolophus cornutus | 885887A2FFD08A09F8B3F3E5FDDDD522 | fixed |
| Thainycleris torquatus | Thainycteris torquata | 4C3D87E8FFBF6A01FA569C0B1D3FBF02 | fixed*, see comment below |
| Feumops floridanus | Eumops floridanus | 194287C9FFB0BA1DB181F02CB1CDFE3E | fixed |

flsimoes commented 1 year ago

@ajacsherman for Lophostoma silvicolum, the HMW actually calls it silvicola

and checking the GBIF database I can see it as such https://www.gbif.org/species/5706810

I understand that this has been updated, but we keep the name written in the original document.

EDIT: same applies to Carollia brevicaudum

EDIT 2: and to Thainycteris torquatus

flsimoes commented 1 year ago

@ajacsherman You listed Anthops ornatus, but the HMW calls it Scotomanes ornatus. If this is a later synonymy, then it falls into the same category as https://github.com/jhpoelen/hmw/issues/10#issuecomment-1238503550

flsimoes commented 1 year ago

Plecotus christii: perhaps it is a typo in the HMW itself? If so, then it is what it is and should be cited as sic.

jhpoelen commented 1 year ago

@flsimoes @ajacsherman very neat that we are getting into the details here. I am hoping to discuss ways to systematically capture the name relations, so that we can easily traverse them and infer how the names are related.

In this case, though, it appears that batnames knows about both names.

@ajacsherman can you help figure out what is going on?

$ echo -e "\tAnthops ornatus" | nomer append batnames
[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [batnames]
    Anthops ornatus HAS_ACCEPTED_NAME   https://batnames.org/species/Anthops%20ornatus  Anthops ornatus     Flower-faced Bat @en            https://batnames.org/species/Anthops%20ornatus  

but

$ echo -e "\tScotomanes ornatus" | nomer append batnames
[main] INFO org.globalbioticinteractions.nomer.match.TermMatcherRegistry - using matcher [batnames]
    Scotomanes ornatus  HAS_ACCEPTED_NAME   https://batnames.org/species/Scotomanes%20ornatus   Scotomanes ornatus      Harlequin Bat @en               https://batnames.org/species/Scotomanes%20ornatus   
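
As a starting point for that discussion, name relations could be captured as simple (subject, relation, object) triples and traversed like a graph. The sketch below is only an illustration under assumptions: the relation labels `OCR_MISREAD_OF` and `UPDATED_TO` and the helper names `build_graph` and `relation_paths` are hypothetical (only `HAS_ACCEPTED_NAME` appears in the nomer output above), and the triples restate what was reported in this issue rather than any authoritative taxonomy.

```python
from collections import defaultdict, deque

# Illustrative triples based on names discussed in this issue.
# OCR_MISREAD_OF and UPDATED_TO are made-up labels for this sketch.
relations = [
    ("Thainycleris torquatus", "OCR_MISREAD_OF", "Thainycteris torquatus"),
    ("Thainycteris torquatus", "UPDATED_TO", "Thainycteris torquata"),
    ("Lophostoma silvicola", "UPDATED_TO", "Lophostoma silvicolum"),
    ("Anthops ornatus", "HAS_ACCEPTED_NAME", "Anthops ornatus"),
]


def build_graph(triples):
    """Index the triples by subject so outgoing relations are easy to follow."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def relation_paths(graph, start):
    """Breadth-first traversal: map each reachable name to the relations leading to it."""
    paths = {start: []}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        for relation, other in graph.get(name, []):
            if other not in paths:
                paths[other] = paths[name] + [relation]
                queue.append(other)
    return paths


graph = build_graph(relations)
paths = relation_paths(graph, "Thainycleris torquatus")
for name, path in paths.items():
    print(name, "<-", " / ".join(path) if path else "(start)")
```

Keeping the verbatim HMW spelling as its own node fits the practice described earlier in this thread of preserving the name as written in the original document, while still allowing a lookup to reach the updated name.
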
flsimoes commented 1 year ago

@ajacsherman Not sure what to do with Mirimiri acrodonta. Do you have links to the duplicates?

ajacsherman commented 1 year ago

Found a few more...

Manaopterus brachytragos/Miniopterus brachytragos/p.707/E84887F9FFC6D6480FC9FE281B1D31BC
Muniopterus mossambicus/Miniopterus mossambicus/p.706/E84887F9FFD8D6570F3EF4BD157B32AE
Mucronycteris schmidtorum/p.491/03A687BCFFB6FFB616BFFD4EF81DF856
Thainycleris torquatus/Thainycteris torquata/p.826/4C3D87E8FFBF6A01FA569C0B1D3FBF02

ajacsherman commented 1 year ago

> @ajacsherman Not sure what to do with Mirimiri acrodonta Do you have links to the duplicates?

I found where the transcription error occurred. It was on my end. Thanks.

flsimoes commented 1 year ago

> Found a few more...
> Manaopterus brachytragos/Miniopterus brachytragos/p.707/E84887F9FFC6D6480FC9FE281B1D31BC
> Muniopterus mossambicus/Miniopterus mossambicus/p.706/E84887F9FFD8D6570F3EF4BD157B32AE
> Mucronycteris schmidtorum/p.491/03A687BCFFB6FFB616BFFD4EF81DF856
> Thainycleris torquatus/Thainycteris torquata/p.826/4C3D87E8FFBF6A01FA569C0B1D3FBF02

Fixed Mucronycteris schmidtorum. Manaopterus brachytragos, Muniopterus mossambicus and Thainycleris torquatus were already reported and were either fixed or commented on above.

jhpoelen commented 1 year ago

@flsimoes @ajacsherman thanks for all the hard work on this. Please let me know when would be a good time to generate a new version of hmw.csv.

flsimoes commented 1 year ago

> @flsimoes @ajacsherman thanks for all the hard work on this. Please let me know when is a good time to generate a new version of hmw.csv .

I think all of the names that were listed here and that we could fix have been fixed.

jhpoelen commented 1 year ago

As requested by @flsimoes, I've prepared a new version of hmw.csv and derived data products.

Please review the latest versions at:

https://github.com/jhpoelen/hmw/releases/tag/0.4

jhpoelen commented 1 year ago

@ajacsherman @flsimoes I am assuming that you've reviewed version https://github.com/jhpoelen/hmw/releases/tag/0.4 and that all expected fixes are present.

Closing issue. Please feel free to comment / re-open if issues remain.

flsimoes commented 1 year ago

> @ajacsherman @flsimoes I am assuming that you've reviewed version https://github.com/jhpoelen/hmw/releases/tag/0.4 and that all expected fixes are present.

They should all be fixed, yes (apart from those where we have missing pages).

jhpoelen commented 1 year ago

@flsimoes thanks for confirming.

@ajacsherman can you confirm that the issues that you documented above have been addressed in v0.4 of https://github.com/jhpoelen/hmw?
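
A quick way to help with that confirmation might be to scan the released file for the misread spellings reported in this issue. The sketch below is only an illustration: `remaining_misreads` is a hypothetical helper, the mapping holds just a few pairs from this thread, and the sample text is made up for demonstration and is not the real content or column layout of hmw.csv. The check treats each line as plain text, so it makes no assumption about column names.

```python
import io

# A few misread -> corrected pairs reported in this issue (not the full list).
KNOWN_MISREADS = {
    "Pipustrellus aero": "Pipistrellus aero",
    "Rhinolophus odami": "Rhinolophus adami",
    "Feumops floridanus": "Eumops floridanus",
}


def remaining_misreads(lines, misreads):
    """Return the misread names that still occur anywhere in the given lines."""
    found = set()
    for line in lines:
        for bad in misreads:
            if bad in line:
                found.add(bad)
    return sorted(found)


# Made-up sample standing in for a data file; NOT real hmw.csv content.
sample = io.StringIO(
    "docId,name\n"
    "194287C9FFB0BA1DB181F02CB1CDFE3E,Eumops floridanus\n"
    "885887A2FFCD8A2BFF63FD82FCF4D255,Rhinolophus odami\n"
)

still_present = remaining_misreads(sample, KNOWN_MISREADS)
for bad in still_present:
    print(f"still present: {bad} (expected {KNOWN_MISREADS[bad]})")
```

To run the same check against a downloaded copy of the release, the `sample` object could be replaced by an open file handle, for example `open("hmw.csv", encoding="utf-8")`, assuming the file is saved under that name in the working directory.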