hongcui / FNATextProcessing

producing clean FNA input

Missing subspecies and some text in V23 #23

Open bibilujan opened 6 years ago

bibilujan commented 6 years ago

Issues with files 295, 296, 297, 564, and 565. The files are flagged as duplicates because the subspecies is missing. In addition, some information is missing:

- 296 and 297 are missing the subspecies identification and the phenology, habitat, elevation, and distribution information.
- 565 is missing the subspecies and the phenology, habitat, elevation, and distribution information.

I fixed these files manually in my copy of the files to be able to process the volume.
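
A check along these lines could help locate files that collide because a subspecies was dropped. This is only a minimal sketch: the element and attribute names (`taxon_identification`, `status`, `taxon_name`, `rank`) are assumptions about the XML layout, and the sample names and file names are made up for illustration. Adjust them to whatever the V23 files actually contain.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def accepted_name(xml_text):
    """Return the accepted name as a tuple of (rank, name) pairs.

    Assumed (hypothetical) layout:
      <taxon_identification status="ACCEPTED">
        <taxon_name rank="species">...</taxon_name>
      </taxon_identification>
    Returns an empty tuple when no accepted identification is found.
    """
    root = ET.fromstring(xml_text)
    for ident in root.iter("taxon_identification"):
        if ident.get("status", "").upper() == "ACCEPTED":
            return tuple(
                (n.get("rank", ""), (n.text or "").strip())
                for n in ident.iter("taxon_name")
            )
    return ()


def find_duplicates(files):
    """Map each accepted name shared by several files to those file names.

    `files` maps a file name to its XML text. Files without an accepted
    name are skipped rather than reported as duplicates of each other.
    """
    by_name = defaultdict(list)
    for fname, text in files.items():
        name = accepted_name(text)
        if name:
            by_name[name].append(fname)
    return {n: sorted(f) for n, f in by_name.items() if len(f) > 1}


def _doc(*names):
    """Build a tiny treatment document for the example (made-up content)."""
    inner = "".join(
        f'<taxon_name rank="{rank}">{name}</taxon_name>' for rank, name in names
    )
    return (
        '<treatment><taxon_identification status="ACCEPTED">'
        f"{inner}</taxon_identification></treatment>"
    )


# Placeholder data: b.xml stands for a file whose subspecies was lost,
# so it collides with the species-level file a.xml.
SAMPLE = {
    "a.xml": _doc(("genus", "Aus"), ("species", "bus")),
    "b.xml": _doc(("genus", "Aus"), ("species", "bus")),
    "c.xml": _doc(("genus", "Aus"), ("species", "bus"), ("subspecies", "cus")),
}

if __name__ == "__main__":
    for name, fnames in find_duplicates(SAMPLE).items():
        print(name, fnames)
```

Files reported together share an identical accepted name, so each group is a candidate for a missing subspecies and still needs to be checked by hand against the printed volume.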

bibilujan commented 6 years ago

I discovered some errors with the keys of this volume: files 2, 92, 93, 225, 341, 458, 568, 594, 781, and 827 are missing blocks of text in the keys.

I have manually fixed those files to be able to process the volume.

bibilujan commented 6 years ago

Here are some screenshots to describe the issues with the files missing subspecies.

  1. The edited XML with the correct information vs. the old XML (V23/296.xml). Screenshot: side-by-side-v23-296

  2. The eflora page with the missing information highlighted. Screenshot: missing-text-296

Note: I also came across this issue in V26 (files 547.xml, 600.xml, 898.xml, and 933.xml).

Beatriz