openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org

XML-hul 1.5 (tested .1 and .2) throws StackOverflowError when validating XML with (as yet) undetermined characteristics #844

Open · UmbrellaDish opened this issue 1 year ago

UmbrellaDish commented 1 year ago

When validating some XML files, JHOVE croaks with the following message and a non-zero return code (rc != 0):

Exception in thread "main" java.lang.StackOverflowError
    at org.apache.xerces.impl.xpath.regex.RangeToken.match(Unknown Source)
    at org.apache.xerces.impl.xpath.regex.RegularExpression.matchString(Unknown Source)
        [last line repeats 1023 times.]
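
The repeated RangeToken.match / RegularExpression.matchString frames suggest the overflow happens inside Xerces' schema regular-expression matcher, which appears to recurse while matching a pattern facet. A workaround often tried for deep recursion (not something confirmed for this issue) is to run validation on a thread with a larger stack. The minimal Java sketch below does this with the standard JAXP validation API. The class BigStackValidation and the method validateWithStack are hypothetical names. SchemaFactory uses whichever schema implementation is on the classpath (the JDK's bundled Xerces by default), which is not necessarily the org.apache.xerces build in the trace above. The Thread constructor's stackSize argument is only a hint that some JVMs may ignore.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class BigStackValidation {

    /**
     * Validates xml against the schema in xsd on a worker thread that requests
     * a stack of stackBytes bytes. Returns null when the document is valid,
     * otherwise the Throwable that was raised (e.g. a SAXException for invalid
     * content, or a StackOverflowError if the stack is still too small).
     */
    static Throwable validateWithStack(Source xsd, Source xml, long stackBytes) {
        final Throwable[] failure = new Throwable[1];
        Runnable task = () -> {
            try {
                SchemaFactory factory =
                        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
                Schema schema = factory.newSchema(xsd);
                Validator validator = schema.newValidator();
                validator.validate(xml); // default error handling throws on errors
            } catch (Throwable t) { // deliberately also catches StackOverflowError
                failure[0] = t;
            }
        };
        // stackSize is a hint; some JVMs/platforms may ignore it entirely.
        Thread worker = new Thread(null, task, "xsd-validator", stackBytes);
        worker.start();
        try {
            worker.join(); // join() also makes failure[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        }
        return failure[0];
    }
}
```

For example, calling `validateWithStack(new StreamSource(new File("pagecontent.xsd")), new StreamSource(new File("vt1712_Schoolbook_0009.xml")), 256L * 1024 * 1024)` with a local copy of the schema (a hypothetical path) should return null for a valid file if the larger stack is honoured. For the JHOVE command-line tool itself, the comparable setting would be the JVM's -Xss option.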

You can reproduce the error by validating any of the files listed below, taken from this publicly available dataset. XML files do not seem to be supported as issue attachments, sorry, and the dataset ZIP is 1 GB in size. So here is the content of a small XML sample that is proven to make JHOVE croak:

<?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2016-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2016-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2016-07-15/pagecontent.xsd">
    <Metadata>
    <Creator></Creator>
    <Created>2017-05-03T12:29:08</Created>
    <LastChange>2017-05-31T08:20:28</LastChange></Metadata>
    <Page imageFilename="vt1712_Schoolbook_0009.tif" imageWidth="2304" imageHeight="3463">
    <AlternativeImage filename="vt1712_schoolbook_0009_b.tif" comments="B/W"/>
    <TextRegion id="r0" type="heading">
    <Coords points="1062,518 1062,519 1064,519 1064,520 1067,520 1067,521 1069,521 1069,522 1070,522 1070,523 1072,523 1072,529 1079,529 1079,530 1080,530 1080,535 1093,535 1093,536 1106,536 1106,537 1111,537 1111,538 1112,538 1112,539 1113,539 1113,585 1120,585 1120,586 1122,586 1122,587 1123,587 1123,590 1124,590 1124,591 1125,591 1125,594 1124,594 1124,596 1123,596 1123,597 1121,597 1121,598 1106,598 1106,599 1105,599 1105,601 1024,601 1024,606 1023,606 1023,607 1022,607 1022,608 1016,608 1016,603 953,603 953,602 951,602 951,596 897,596 897,595 896,595 896,594 895,594 895,592 894,592 894,591 893,591 893,589 892,589 892,586 891,586 891,580 864,580 864,579 862,579 862,578 861,578 861,577 860,577 860,574 859,574 859,573 858,573 858,563 854,563 854,562 851,562 851,561 850,561 850,560 848,560 848,558 847,558 847,549 848,549 848,547 849,547 849,546 850,546 850,545 851,545 851,544 852,544 852,543 853,543 853,542 854,542 854,541 855,541 855,540 856,540 856,539 858,539 858,538 859,538 859,537 861,537 861,536 866,536 866,535 890,535 890,534 891,534 891,531 948,531 948,530 1038,530 1038,528 1039,528 1039,527 1040,527 1040,526 1042,526 1042,525 1043,525 1043,524 1045,524 1045,523 1046,523 1046,522 1048,522 1048,521 1050,521 1050,520 1053,520 1053,519 1055,519 1055,518"/>
    <TextEquiv>
    <Unicode>পশ্বাবলি.</Unicode></TextEquiv></TextRegion>
    <SeparatorRegion id="r1">
    <Coords points="968,665 968,666 970,666 970,667 974,667 974,668 976,668 976,669 979,669 979,670 981,670 981,671 982,671 982,672 985,672 985,673 987,673 987,674 992,674 992,675 997,675 997,676 1008,676 1008,677 1046,677 1046,678 1048,678 1048,679 1049,679 1049,682 989,682 989,683 984,683 984,684 981,684 981,685 978,685 978,686 977,686 977,687 974,687 974,688 971,688 971,689 970,689 970,690 968,690 968,691 963,691 963,690 962,690 962,689 961,689 961,688 957,688 957,687 954,687 954,686 953,686 953,685 951,685 951,684 948,684 948,683 944,683 944,682 940,682 940,681 936,681 936,680 931,680 931,679 926,679 926,678 887,678 887,673 888,673 888,672 947,672 947,671 952,671 952,670 955,670 955,669 956,669 956,668 957,668 957,667 959,667 959,666 961,666 961,665"/></SeparatorRegion>
    <TextRegion id="r3" type="paragraph">
    <Coords points="899,734 899,735 903,735 903,736 906,736 906,737 909,737 909,738 910,738 910,739 911,739 911,740 912,740 912,741 913,741 913,743 914,743 914,746 968,746 968,747 969,747 969,748 1078,748 1078,745 1124,745 1124,743 1127,743 1127,744 1128,744 1128,773 1143,773 1143,774 1144,774 1144,775 1145,775 1145,846 1169,846 1169,847 1170,847 1170,848 1171,848 1171,859 1172,859 1172,862 1173,862 1173,863 1174,863 1174,892 1184,892 1184,893 1185,893 1185,894 1186,894 1186,895 1187,895 1187,903 1185,903 1185,904 1184,904 1184,905 1124,905 1124,914 1123,914 1123,915 1075,915 1075,914 1074,914 1074,913 941,913 941,912 940,912 940,907 939,907 939,903 826,903 826,916 824,916 824,917 817,917 817,916 816,916 816,912 815,912 815,910 814,910 814,909 813,909 813,908 812,908 812,906 811,906 811,904 810,904 810,902 808,902 808,901 807,901 807,900 806,900 806,898 805,898 805,897 804,897 804,896 803,896 803,895 802,895 802,893 800,893 800,892 797,892 797,891 796,891 796,890 794,890 794,889 793,889 793,888 791,888 791,887 790,887 790,886 788,886 788,885 787,885 787,882 786,882 786,881 785,881 785,878 784,878 784,854 780,854 780,853 779,853 779,846 780,846 780,844 781,844 781,843 817,843 817,748 818,748 818,746 819,746 819,745 831,745 831,744 888,744 888,738 889,738 889,736 891,736 891,735 893,735 893,734"/>
    <TextEquiv>
    <Unicode>তৃতীয় সংখ্যা.
হস্তির বিবরণ.</Unicode></TextEquiv></TextRegion>
    <ImageRegion id="r5">
    <Coords points="783,971 783,972 818,972 818,973 853,973 853,974 857,974 857,977 931,977 931,978 932,978 932,980 937,980 937,981 954,981 954,982 965,982 965,983 971,983 971,984 975,984 975,985 977,985 977,986 978,986 978,987 984,987 984,988 986,988 986,989 989,989 989,990 991,990 991,991 992,991 992,993 999,993 999,994 1001,994 1001,995 1002,995 1002,996 1005,996 1005,997 1008,997 1008,998 1194,998 1194,997 1195,997 1195,996 1196,996 1196,995 1197,995 1197,994 1199,994 1199,993 1201,993 1201,992 1203,992 1203,991 1204,991 1204,990 1205,990 1205,989 1206,989 1206,988 1209,988 1209,987 1211,987 1211,986 1212,986 1212,985 1216,985 1216,984 1218,984 1218,983 1219,983 1219,982 1222,982 1222,981 1233,981 1233,982 1237,982 1237,983 1262,983 1262,984 1266,984 1266,985 1267,985 1267,986 1269,986 1269,987 1271,987 1271,988 1273,988 1273,989 1274,989 1274,990 1277,990 1277,991 1278,991 1278,992 1283,992 1283,993 1291,993 1291,994 1294,994 1294,995 1298,995 1298,996 1299,996 1299,997 1304,997 1304,998 1305,998 1305,999 1307,999 1307,1000 1308,1000 1308,1001 1309,1001 1309,1002 1311,1002 1311,1003 1312,1003 1312,1004 1314,1004 1314,1005 1315,1005 1315,1007 1316,1007 1316,1008 1317,1008 1317,1009 1318,1009 1318,1010 1319,1010 1319,1011 1320,1011 1320,1013 1321,1013 1321,1015 1323,1015 1323,1017 1324,1017 1324,1018 1325,1018 1325,1020 1326,1020 1326,1021 1327,1021 1327,1023 1328,1023 1328,1026 1329,1026 1329,1027 1330,1027 1330,1028 1331,1028 1331,1030 1332,1030 1332,1033 1333,1033 1333,1035 1334,1035 1334,1036 1335,1036 1335,1038 1336,1038 1336,1039 1337,1039 1337,1040 1338,1040 1338,1042 1339,1042 1339,1044 1340,1044 1340,1045 1341,1045 1341,1047 1342,1047 1342,1048 1343,1048 1343,1049 1344,1049 1344,1050 1345,1050 1345,1052 1346,1052 1346,1053 1347,1053 1347,1054 1348,1054 1348,1056 1349,1056 1349,1058 1350,1058 1350,1059 1354,1059 1354,1066 1355,1066 1355,1069 1356,1069 1356,1070 1357,1070 1357,1076 1358,1076 1358,1079 1359,1079 1359,1081 1360,1081 1360,1086 1361,1086 
1361,1087 1362,1087 1362,1088 1363,1088 1363,1090 1364,1090 1364,1092 1365,1092 1365,1093 1366,1093 1366,1095 1367,1095 1367,1096 1368,1096 1368,1097 1369,1097 1369,1098 1370,1098 1370,1099 1371,1099 1371,1100 1372,1100 1372,1101 1373,1101 1373,1102 1374,1102 1374,1104 1375,1104 1375,1105 1377,1105 1377,1107 1378,1107 1378,1108 1379,1108 1379,1110 1380,1110 1380,1111 1381,1111 1381,1114 1382,1114 1382,1115 1384,1115 1384,1117 1385,1117 1385,1118 1386,1118 1386,1122 1387,1122 1387,1124 1388,1124 1388,1130 1389,1130 1389,1132 1390,1132 1390,1134 1391,1134 1391,1141 1392,1141 1392,1143 1393,1143 1393,1144 1394,1144 1394,1147 1395,1147 1395,1149 1396,1149 1396,1150 1397,1150 1397,1153 1398,1153 1398,1154 1402,1154 1402,1165 1403,1165 1403,1193 1404,1193 1404,1276 1405,1276 1405,1279 1406,1279 1406,1282 1407,1282 1407,1296 1408,1296 1408,1301 1409,1301 1409,1314 1410,1314 1410,1350 1411,1350 1411,1351 1413,1351 1413,1352 1418,1352 1418,1353 1420,1353 1420,1354 1421,1354 1421,1355 1422,1355 1422,1356 1427,1356 1427,1357 1428,1357 1428,1358 1429,1358 1429,1359 1431,1359 1431,1360 1433,1360 1433,1361 1437,1361 1437,1362 1438,1362 1438,1363 1441,1363 1441,1364 1444,1364 1444,1365 1446,1365 1446,1366 1449,1366 1449,1367 1454,1367 1454,1368 1456,1368 1456,1369 1459,1369 1459,1370 1464,1370 1464,1371 1467,1371 1467,1372 1470,1372 1470,1373 1472,1373 1472,1374 1475,1374 1475,1375 1479,1375 1479,1376 1481,1376 1481,1377 1483,1377 1483,1378 1491,1378 1491,1379 1499,1379 1499,1380 1544,1380 1544,1381 1546,1381 1546,1383 1547,1383 1547,1425 1555,1425 1555,1426 1557,1426 1557,1524 1562,1524 1562,1525 1563,1525 1563,1527 1564,1527 1564,1533 1566,1533 1566,1534 1567,1534 1567,1538 1570,1538 1570,1540 1571,1540 1571,1555 1579,1555 1579,1557 1580,1557 1580,1558 1584,1558 1584,1559 1585,1559 1585,1573 1586,1573 1586,1574 1588,1574 1588,1580 1604,1580 1604,1581 1605,1581 1605,1583 1606,1583 1606,1586 1607,1586 1607,1588 1608,1588 1608,1592 1609,1592 1609,1593 1610,1593 1610,1596 1611,1596 
1611,1597 1612,1597 1612,1663 1615,1663 1615,1664 1616,1664 1616,1665 1617,1665 1617,1666 1618,1666 1618,1668 1619,1668 1619,1680 1620,1680 1620,1683 1619,1683 1619,1684 1594,1684 1594,1693 1593,1693 1593,1694 1587,1694 1587,1695 1582,1695 1582,1711 1581,1711 1581,1712 1580,1712 1580,1713 1577,1713 1577,1719 1566,1719 1566,1726 1565,1726 1565,1727 1558,1727 1558,1730 1557,1730 1557,1731 1556,1731 1556,1732 1554,1732 1554,1737 1553,1737 1553,1738 1547,1738 1547,1742 1544,1742 1544,1743 1543,1743 1543,1744 1530,1744 1530,1748 1529,1748 1529,1749 1519,1749 1519,1755 1506,1755 1506,1772 1504,1772 1504,1773 1493,1773 1493,1774 1492,1774 1492,1777 1491,1777 1491,1778 1488,1778 1488,1784 1481,1784 1481,1792 1480,1792 1480,1793 1457,1793 1457,1818 1429,1818 1429,1819 1417,1819 1417,1824 1408,1824 1408,1831 1407,1831 1407,1832 1396,1832 1396,1833 1386,1833 1386,1834 1381,1834 1381,1833 1357,1833 1357,1832 1333,1832 1333,1831 1305,1831 1305,1829 1181,1829 1181,1830 1160,1830 1160,1829 1159,1829 1159,1824 1117,1824 1117,1823 1113,1823 1113,1822 1110,1822 1110,1821 1105,1821 1105,1815 1089,1815 1089,1814 976,1814 976,1816 973,1816 973,1814 847,1814 847,1820 821,1820 821,1825 757,1825 757,1824 692,1824 692,1823 689,1823 689,1822 687,1822 687,1821 686,1821 686,1820 665,1820 665,1819 629,1819 629,1818 628,1818 628,1817 626,1817 626,1816 621,1816 621,1815 617,1815 617,1814 615,1814 615,1813 612,1813 612,1812 609,1812 609,1811 595,1811 595,1810 594,1810 594,1803 547,1803 547,1802 546,1802 546,1801 544,1801 544,1800 543,1800 543,1786 522,1786 522,1785 521,1785 521,1784 520,1784 520,1770 498,1770 498,1758 493,1758 493,1727 472,1727 472,1726 470,1726 470,1725 469,1725 469,1723 468,1723 468,1722 467,1722 467,1714 452,1714 452,1707 451,1707 451,1706 449,1706 449,1705 435,1705 435,1704 433,1704 433,1699 405,1699 405,1673 404,1673 404,1672 402,1672 402,1671 400,1671 400,1670 390,1670 390,1669 389,1669 389,1667 390,1667 390,1665 391,1665 391,1564 390,1564 390,1562 389,1562 389,1558 
390,1558 390,1556 391,1556 391,1543 392,1543 392,1541 393,1541 393,1539 394,1539 394,1537 395,1537 395,1536 396,1536 396,1535 397,1535 397,1534 398,1534 398,1533 400,1533 400,1532 402,1532 402,1531 403,1531 403,1530 405,1530 405,1529 408,1529 408,1528 411,1528 411,1527 414,1527 414,1526 415,1526 415,1525 416,1525 416,1524 417,1524 417,1522 418,1522 418,1521 419,1521 419,1519 420,1519 420,1518 421,1518 421,1516 422,1516 422,1490 423,1490 423,1389 421,1389 421,1386 420,1386 420,1382 418,1382 418,1380 417,1380 417,1377 416,1377 416,1368 415,1368 415,1362 414,1362 414,1359 413,1359 413,1347 412,1347 412,1341 411,1341 411,1338 410,1338 410,1315 411,1315 411,1303 412,1303 412,1300 413,1300 413,1299 414,1299 414,1296 415,1296 415,1293 416,1293 416,1291 417,1291 417,1290 418,1290 418,1266 419,1266 419,1261 420,1261 420,1254 421,1254 421,1245 422,1245 422,1243 423,1243 423,1242 424,1242 424,1238 425,1238 425,1232 426,1232 426,1230 427,1230 427,1227 428,1227 428,1225 429,1225 429,1223 430,1223 430,1221 431,1221 431,1220 432,1220 432,1218 433,1218 433,1217 434,1217 434,1216 435,1216 435,1215 437,1215 437,1214 438,1214 438,1213 439,1213 439,1212 440,1212 440,1211 441,1211 441,1210 442,1210 442,1209 444,1209 444,1198 445,1198 445,1192 446,1192 446,1189 447,1189 447,1187 448,1187 448,1185 449,1185 449,1183 450,1183 450,1181 451,1181 451,1180 452,1180 452,1177 453,1177 453,1176 454,1176 454,1175 455,1175 455,1174 456,1174 456,1173 457,1173 457,1171 459,1171 459,1170 463,1170 463,1168 464,1168 464,1166 466,1166 466,1165 467,1165 467,1164 470,1164 470,1163 471,1163 471,1162 472,1162 472,1154 473,1154 473,1152 474,1152 474,1150 475,1150 475,1149 476,1149 476,1148 477,1148 477,1147 479,1147 479,1146 480,1146 480,1145 482,1145 482,1144 483,1144 483,1143 486,1143 486,1142 489,1142 489,1141 493,1141 493,1140 495,1140 495,1139 496,1139 496,1138 497,1138 497,1136 498,1136 498,1133 499,1133 499,1129 501,1129 501,1128 502,1128 502,1126 503,1126 503,1125 504,1125 504,1124 505,1124 505,1118 
506,1118 506,1117 507,1117 507,1115 508,1115 508,1113 509,1113 509,1110 510,1110 510,1107 511,1107 511,1105 512,1105 512,1103 513,1103 513,1102 514,1102 514,1100 515,1100 515,1098 516,1098 516,1096 517,1096 517,1095 518,1095 518,1094 519,1094 519,1093 520,1093 520,1092 521,1092 521,1091 522,1091 522,1090 523,1090 523,1088 524,1088 524,1087 525,1087 525,1086 526,1086 526,1085 529,1085 529,1084 530,1084 530,1083 533,1083 533,1082 534,1082 534,1079 535,1079 535,1078 536,1078 536,1077 537,1077 537,1076 538,1076 538,1075 540,1075 540,1072 541,1072 541,1071 542,1071 542,1070 543,1070 543,1069 544,1069 544,1067 545,1067 545,1066 546,1066 546,1065 547,1065 547,1061 548,1061 548,1059 549,1059 549,1058 550,1058 550,1057 552,1057 552,1056 553,1056 553,1054 554,1054 554,1053 555,1053 555,1052 556,1052 556,1051 557,1051 557,1050 558,1050 558,1049 560,1049 560,1048 562,1048 562,1047 563,1047 563,1046 566,1046 566,1045 567,1045 567,1040 568,1040 568,1038 569,1038 569,1037 570,1037 570,1036 572,1036 572,1035 573,1035 573,1034 574,1034 574,1033 575,1033 575,1032 577,1032 577,1031 579,1031 579,1030 580,1030 580,1029 581,1029 581,1028 582,1028 582,1027 584,1027 584,1026 586,1026 586,1025 588,1025 588,1024 591,1024 591,1023 592,1023 592,1022 593,1022 593,1021 595,1021 595,1020 597,1020 597,1019 601,1019 601,1018 605,1018 605,1017 612,1017 612,1016 613,1016 613,1015 617,1015 617,1014 620,1014 620,1013 622,1013 622,1012 626,1012 626,1011 627,1011 627,1010 637,1010 637,1009 640,1009 640,1008 643,1008 643,1007 647,1007 647,1006 651,1006 651,1005 653,1005 653,1004 656,1004 656,1002 658,1002 658,1001 662,1001 662,1000 664,1000 664,999 666,999 666,998 667,998 667,997 671,997 671,996 672,996 672,995 674,995 674,994 675,994 675,993 676,993 676,992 677,992 677,991 678,991 678,990 681,990 681,989 683,989 683,988 684,988 684,987 686,987 686,986 689,986 689,985 697,985 697,984 699,984 699,983 706,983 706,982 708,982 708,981 714,981 714,980 718,980 718,979 723,979 723,978 725,978 725,977 730,977 
730,976 734,976 734,975 742,975 742,974 748,974 748,973 754,973 754,972 780,972 780,971"/></ImageRegion>
    <TextRegion id="r6" type="paragraph">
    <Coords points="909,1854 909,1855 1041,1855 1041,1854 1046,1854 1046,1855 1047,1855 1047,1863 1053,1863 1053,1864 1099,1864 1099,1865 1100,1865 1100,1866 1101,1866 1101,1890 1112,1890 1112,1891 1114,1891 1114,1892 1115,1892 1115,1894 1116,1894 1116,1952 1153,1952 1153,1953 1156,1953 1156,1954 1158,1954 1158,1955 1159,1955 1159,1956 1161,1956 1161,1957 1162,1957 1162,1958 1163,1958 1163,1961 1263,1961 1263,1960 1264,1960 1264,1959 1267,1959 1267,1958 1270,1958 1270,1957 1273,1957 1273,1956 1274,1956 1274,1955 1276,1955 1276,1954 1280,1954 1280,1955 1283,1955 1283,1956 1285,1956 1285,1957 1286,1957 1286,1958 1287,1958 1287,1966 1296,1966 1296,1967 1330,1967 1330,1968 1331,1968 1331,1969 1375,1969 1375,1970 1377,1970 1377,1971 1382,1971 1382,1972 1414,1972 1414,1974 1415,1974 1415,2005 1430,2005 1430,2006 1432,2006 1432,2009 1433,2009 1433,2082 1480,2082 1480,2083 1482,2083 1482,2084 1484,2084 1484,2085 1485,2085 1485,2086 1486,2086 1486,2087 1487,2087 1487,2089 1488,2089 1488,2090 1490,2090 1490,2091 1492,2091 1492,2092 1493,2092 1493,2096 1524,2096 1524,2097 1644,2097 1644,2094 1645,2094 1645,2093 1646,2093 1646,2092 1650,2092 1650,2093 1651,2093 1651,2094 1652,2094 1652,2096 1653,2096 1653,2098 1655,2098 1655,2099 1659,2099 1659,2100 1660,2100 1660,2101 1661,2101 1661,2105 1660,2105 1660,2106 1657,2106 1657,2107 1656,2107 1656,2111 1651,2111 1651,2139 1650,2139 1650,2140 1648,2140 1648,2257 1649,2257 1649,2318 1651,2318 1651,2319 1653,2319 1653,2399 1654,2399 1654,2404 1653,2404 1653,2405 1651,2405 1651,2504 1650,2504 1650,2505 1640,2505 1640,2551 1634,2551 1634,2550 1633,2550 1633,2549 1632,2549 1632,2548 1630,2548 1630,2547 1628,2547 1628,2546 1627,2546 1627,2545 1626,2545 1626,2544 1624,2544 1624,2542 1623,2542 1623,2541 1622,2541 1622,2540 1621,2540 1621,2539 1470,2539 1470,2556 1469,2556 1469,2557 1467,2557 1467,2556 1466,2556 1466,2555 1464,2555 1464,2554 1463,2554 1463,2552 1461,2552 1461,2551 1460,2551 1460,2550 1448,2550 1448,2549 1447,2549 1447,2548 
1446,2548 1446,2547 1445,2547 1445,2534 1318,2534 1318,2544 1314,2544 1314,2543 1312,2543 1312,2542 1311,2542 1311,2541 1310,2541 1310,2540 1309,2540 1309,2539 1308,2539 1308,2538 1306,2538 1306,2537 1269,2537 1269,2536 1268,2536 1268,2535 1267,2535 1267,2534 1266,2534 1266,2533 1265,2533 1265,2532 1264,2532 1264,2531 1263,2531 1263,2530 1159,2530 1159,2531 1158,2531 1158,2533 1155,2533 1155,2534 1132,2534 1132,2535 1078,2535 1078,2538 1074,2538 1074,2537 1073,2537 1073,2536 1072,2536 1072,2534 1071,2534 1071,2533 1069,2533 1069,2532 1068,2532 1068,2531 1067,2531 1067,2530 1065,2530 1065,2529 1064,2529 1064,2528 930,2528 930,2677 958,2677 958,2678 1022,2678 1022,2679 1024,2679 1024,2680 1025,2680 1025,2681 1026,2681 1026,2682 1028,2682 1028,2683 1029,2683 1029,2684 1030,2684 1030,2692 1050,2692 1050,2693 1091,2693 1091,2694 1224,2694 1224,2695 1266,2695 1266,2696 1426,2696 1426,2691 1427,2691 1427,2690 1433,2690 1433,2691 1441,2691 1441,2692 1442,2692 1442,2694 1542,2694 1542,2693 1546,2693 1546,2695 1547,2695 1547,2700 1647,2700 1647,2699 1650,2699 1650,2700 1651,2700 1651,2738 1652,2738 1652,2807 1653,2807 1653,2808 1654,2808 1654,2810 1655,2810 1655,2812 1654,2812 1654,2914 1651,2914 1651,2946 1649,2946 1649,2947 1635,2947 1635,2948 1634,2948 1634,2949 1633,2949 1633,2950 1632,2950 1632,2951 1596,2951 1596,2954 1589,2954 1589,2953 1588,2953 1588,2952 1587,2952 1587,2951 1586,2951 1586,2949 1584,2949 1584,2948 1583,2948 1583,2947 1582,2947 1582,2946 1581,2946 1581,2945 1580,2945 1580,2944 1546,2944 1546,2943 1529,2943 1529,2942 1420,2942 1420,2943 1419,2943 1419,2944 1418,2944 1418,2945 1384,2945 1384,2949 1383,2949 1383,2950 1380,2950 1380,2949 1378,2949 1378,2947 1377,2947 1377,2946 1376,2946 1376,2945 1375,2945 1375,2944 1374,2944 1374,2943 1373,2943 1373,2942 1371,2942 1371,2941 1370,2941 1370,2940 1336,2940 1336,2939 1248,2939 1248,2938 1247,2938 1247,2936 1126,2936 1126,2937 1120,2937 1120,2936 1119,2936 1119,2935 1096,2935 1096,2932 1040,2932 1040,2931 
895,2931 895,2932 870,2932 870,2931 869,2931 869,2930 868,2930 868,2929 734,2929 734,2930 728,2930 728,2929 683,2929 683,2928 681,2928 681,2927 680,2927 680,2926 544,2926 544,2925 542,2925 542,2924 364,2924 364,2937 363,2937 363,2938 359,2938 359,2937 357,2937 357,2936 356,2936 356,2935 355,2935 355,2934 332,2934 332,2933 331,2933 331,2932 330,2932 330,2931 329,2931 329,2929 328,2929 328,2918 327,2918 327,2917 326,2917 326,2915 325,2915 325,2913 313,2913 313,2912 312,2912 312,2911 306,2911 306,2910 305,2910 305,2909 304,2909 304,2906 303,2906 303,2888 302,2888 302,2887 301,2887 301,2784 302,2784 302,2783 303,2783 303,2782 304,2782 304,2781 337,2781 337,2780 338,2780 338,2779 345,2779 345,2773 346,2773 346,2772 347,2772 347,2771 370,2771 370,2646 365,2646 365,2645 363,2645 363,2644 361,2644 361,2643 360,2643 360,2642 359,2642 359,2640 358,2640 358,2622 334,2622 334,2621 332,2621 332,2620 331,2620 331,2619 330,2619 330,2618 309,2618 309,2617 308,2617 308,2616 307,2616 307,2614 306,2614 306,2613 305,2613 305,2591 301,2591 301,2590 300,2590 300,2589 299,2589 299,2585 300,2585 300,2584 301,2584 301,2583 303,2583 303,2482 304,2482 304,2481 305,2481 305,2480 306,2480 306,2401 307,2401 307,2397 308,2397 308,2296 307,2296 307,2294 306,2294 306,2291 305,2291 305,2290 304,2290 304,2285 305,2285 305,2185 304,2185 304,2181 305,2181 305,2180 307,2180 307,2179 311,2179 311,2178 353,2178 353,2175 354,2175 354,2174 356,2174 356,2173 357,2173 357,2172 358,2172 358,2171 360,2171 360,2170 362,2170 362,2169 365,2169 365,2168 368,2168 368,2167 371,2167 371,2081 372,2081 372,2079 373,2079 373,2078 379,2078 379,2079 526,2079 526,1967 523,1967 523,1966 522,1966 522,1964 521,1964 521,1960 522,1960 522,1959 523,1959 523,1958 524,1958 524,1957 527,1957 527,1956 529,1956 529,1955 567,1955 567,1954 568,1954 568,1953 569,1953 569,1951 570,1951 570,1950 571,1950 571,1949 573,1949 573,1948 576,1948 576,1947 578,1947 578,1946 582,1946 582,1945 583,1945 583,1944 586,1944 586,1943 590,1943 590,1944 
592,1944 592,1945 594,1945 594,1946 596,1946 596,1954 612,1954 612,1955 618,1955 618,1956 748,1956 748,1952 749,1952 749,1950 837,1950 837,1872 838,1872 838,1871 839,1871 839,1870 840,1870 840,1869 841,1869 841,1867 842,1867 842,1866 843,1866 843,1865 845,1865 845,1864 847,1864 847,1863 869,1863 869,1858 870,1858 870,1857 905,1857 905,1855 906,1855 906,1854"/>
    <TextEquiv>
    <Unicode>                        প্রথমাধ্যায়.
         হস্তির আকার ও স্বভাবাদির বিবরণ.
  সকল পশুর মধ্যে হস্তির বিষয় লোকে নানা বিবেচনা
করিয়াছে, তাহার কারণ এই, যে হস্তি শক্তিমান ও তীক্ষ্ণ
ও শ্রমী, ও কোমলতাতে ও জ্ঞানে পরিপূর্ন হইয়া ও মনু-
ষ্যের এমত বশ্য হয়, যে মনুষ্য হস্তিকে যাহা ইচ্ছা করে
তাহাই করিতেপারে; ইহাতে হস্তির জন্মস্থান উষ্ণ দেশস্থ
মনুষ্যের অতিশয় উপকার.
   আসিয়া ও আফ্রিকার নিবিড় বনে বন হস্তী পাওয়া
যায়. তাহারা অনেকে একত্র হইয়া বাস করে, শাক ও বৃক্ষের
নূতন ডাল ও শস্য নানা জাতীয় ফল তাহারা আহার </Unicode></TextEquiv></TextRegion>
    <TextRegion id="r7" type="paragraph">
    <Coords points="961,2987 961,2988 989,2988 989,2989 993,2989 993,2990 995,2990 995,2994 994,2994 994,2996 993,2996 993,3012 992,3012 992,3014 991,3014 991,3015 989,3015 989,3016 985,3016 985,3017 984,3017 984,3028 983,3028 983,3029 980,3029 980,3028 977,3028 977,3027 975,3027 975,3026 974,3026 974,3025 973,3025 973,3024 971,3024 971,3023 970,3023 970,3022 968,3022 968,3021 965,3021 965,3020 963,3020 963,3019 961,3019 961,3018 959,3018 959,3017 958,3017 958,3016 956,3016 956,3015 954,3015 954,3014 953,3014 953,2995 952,2995 952,2994 951,2994 951,2990 952,2990 952,2988 953,2988 953,2987"/>
    <TextEquiv>
    <Unicode>ক</Unicode></TextEquiv></TextRegion></Page></PcGts>

This is the full set of files JHOVE could not validate due to that error:

REID2019/14125_c_2_1_0011.xml
REID2019/14127_d_15_4_0007.xml
REID2019/14127-e-19_vol1_Arabian-Nights_0003.xml
REID2019/14129.b. 12_0009.xml
REID2019/14129-c-3_table-of-content_0002.xml
REID2019/279_23_D_7_0002.xml
REID2019/279_2_A_28_0002.xml
REID2019/279_32_B_34_0002.xml
REID2019/279_32_B_34_0010.xml
REID2019/279_32_C_25_0004.xml
REID2019/T_11539__0023.xml
REID2019/VT_1397__0005.xml
REID2019/VT_1625_f_0003.xml
REID2019/vt1712_Schoolbook_0005.xml
REID2019/vt1712_Schoolbook_0009.xml
REID2019/VT_1752_b_0002.xml
REID2019/VT_1892_e_0003.xml
REID2019/VT_222_0004.xml

Some of these files contain bidirectional (bidi) text, which we first thought was the cause, but others do not, e.g. 14127-e-19_vol1_Arabian-Nights_0003.xml.
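
With bidi text ruled out, one characteristic that may be worth comparing across the failing files (a hypothesis, not something established in this thread) is the length of the Coords/@points values. In PAGE files these can run to thousands of characters, as in ImageRegion r5 of the sample above, and, assuming the PAGE schema constrains points with an xs:pattern facet, each one passes through the regex matcher seen in the stack trace. The minimal Java sketch below uses the standard DOM API to report the longest points value in a document; the class PointsLength and the method longestPoints are hypothetical names.

```java
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class PointsLength {

    /**
     * Returns the length, in characters, of the longest points attribute on any
     * Coords element in a PAGE document, or 0 if there are none. The document is
     * only parsed, not validated, so the schema is never loaded.
     */
    static int longestPoints(InputSource page) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // PAGE elements live in a default namespace
            Document doc = dbf.newDocumentBuilder().parse(page);
            // "*" matches Coords in any namespace (PAGE versions use different URIs)
            NodeList coords = doc.getElementsByTagNameNS("*", "Coords");
            int longest = 0;
            for (int i = 0; i < coords.getLength(); i++) {
                Element c = (Element) coords.item(i);
                longest = Math.max(longest, c.getAttribute("points").length());
            }
            return longest;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse PAGE document", e);
        }
    }
}
```

Calling it as `PointsLength.longestPoints(new InputSource(new File("REID2019/vt1712_Schoolbook_0009.xml").toURI().toString()))` for each listed file, and for files that validate cleanly, would show whether the failures correlate with unusually long points strings.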

carlwilson commented 1 year ago

Hi @UmbrellaDish, unfortunately this has just missed the 1.28 RC, so it will have to be triaged and scheduled for the 1.30 release in Q4 2023. We will try to reproduce it and get back to you here in the next month or two once we know more.