hongcui / FNATextProcessing

producing clean FNA input

text got cut off in statements in keys #17

Open hongcui opened 7 years ago

hongcui commented 7 years ago

The problems can be seen in V23, 594.xml. Not sure if all keys have a similar problem. Can't tell from the surface why some of the text got cut off. One way to detect the problem is to find all key statements that do not end with a '.'. Check against the JSTOR version.
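A minimal sketch of that check in Python, in case it helps. It assumes the key statement text sits in elements named `statement`; that tag name, the function name, and the sample XML are placeholders for illustration, not taken from the actual FNA files, so adjust them to the real schema. A statement can legitimately end without a period, so treat the hits as candidates to compare against the JSTOR version.

```python
import xml.etree.ElementTree as ET


def unterminated_statements(xml_text, tag="statement"):
    """Return the text of every <tag> element that does not end with '.'.

    The tag name is an assumption about the key XML layout.
    """
    root = ET.fromstring(xml_text)
    flagged = []
    for elem in root.iter(tag):
        # Join nested text so inline markup does not hide the ending.
        text = "".join(elem.itertext()).strip()
        if text and not text.endswith("."):
            flagged.append(text)
    return flagged


# Made-up sample, not real FNA content.
sample = """<key>
  <key_statement><statement>Leaves opposite; stems erect.</statement></key_statement>
  <key_statement><statement>Leaves alternate; stems pro-</statement></key_statement>
</key>"""

flagged = unterminated_statements(sample)
for text in flagged:
    print("possible cutoff:", text)
```

To scan a whole volume, the same function could be called on the contents of each XML file in the volume folder.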

bibilujan commented 6 years ago

Hi, I am processing this volume (V23) with etc and I can see that this problem is still there. Files 2, 225, 458, 568, 594, 781, 827, 92, 93, and 341 got flagged because of missing parentheses, but the underlying problem is that some text is missing.

Beatriz

bibilujan commented 6 years ago

I ran into the same problem with V26. Some files get flagged by etc because of missing parentheses; after looking into those files I can see that text is cut off from the statements in keys. Some examples in V26 are: 393, 589, 95, 1082, 1202, 1126, 1148, 1180, 248... It seems this could be an issue that occurs in all of the key XMLs.

For example, in some cases the statement is cut off at a "-". Using grep, I found that this happens in volumes 23, 26, 4, 6, and 7, and possibly in others too.
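The two symptoms seen so far (a statement ending at a "-" and a missing parenthesis) can also be checked together. The following is a rough Python sketch, separate from the grep command in the attachment; the function names and the sample strings are made up for illustration.

```python
def ends_with_hyphen(text):
    """True if the statement stops at a '-', ignoring trailing whitespace."""
    return text.rstrip().endswith("-")


def has_unbalanced_parens(text):
    """True if '(' and ')' do not pair up in order."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no '(' before it
                return True
    return depth != 0


def cutoff_symptoms(text):
    """List which of the two cutoff symptoms a statement shows."""
    symptoms = []
    if ends_with_hyphen(text):
        symptoms.append("ends with '-'")
    if has_unbalanced_parens(text):
        symptoms.append("unbalanced parentheses")
    return symptoms


# Made-up statements, not real FNA content.
for statement in ["Stems erect (rarely pro-", "Leaves 2-5 cm (rarely 6)."]:
    print(statement, "->", cutoff_symptoms(statement))
```

A statement that shows either symptom is only a candidate; the missing text still has to be confirmed against the source.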

Beatriz

(see attached text file for an example and the grep command)

Discovering_text_cutoff_keys.txt