Open myrmoteras opened 3 years ago
Is there a reason that almost all of the hyphenated words had to be corrected in the QC?
"Tools > Check Text Flow Breaks" might offer some help here, especially if there were problems in the page structure right after decoding. Otherwise hard to tell ... there might well have been respective errors in earlier documents as well, but we've only been able to check for this type of error since the generalization of the QC infrastructure this summer/fall.
In this article, I can't get the table marked. Is there anything we can do about it?
Yes, we've had the (rotated) sub window editing since September ... just click in the page edge, select "Edit Page in Sub Window", and select "90° Clockwise Rotation". Then, you can mark the table in the rotated sub window, and it writes back to the main window after closing the rotated dialog with "OK" ... see also https://github.com/plazi/ggi/issues/102
In the upright page orientation, the columns (actually the rows) are just too dense for the "Mark Table" macro to find any viable column splits.
In general, Table 2 is a bit of a nightmare, though ... the portion on Page 6 (number 28) actually has two parts, the bottom one continuing on Page 7 (number 29), and those two parts together continue on the right of the top part on Page 6 ... This constitutes a pretty tricky layout, as tiling tables together right now only works if the tiles exhibit some regularity, in particular one tile on top of exactly one tile, and/or one tile to the left of exactly one tile, as otherwise the logic for overall assembly becomes prohibitively complex ... and here we have two tiles in top-bottom arrangement to the right of one tile, which we cannot resolve into an overall table at this point ...
Not sure whether more complex logic capable of handling the above makes sense to pursue, as such somewhat asymmetric cases seem to be extremely rare at this point, and you can still copy the individual tables and patch them together in Excel.
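To make the regularity constraint a bit more concrete, here is a minimal sketch in Python. It is not the actual assembly logic of the tool; the `Tile` class, the abstract grid coordinates, and the `is_regular` function are all hypothetical and only illustrate the rule "at most one tile adjacent on each side".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    # Hypothetical model: a table tile placed on an abstract layout grid.
    name: str
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def _overlaps(a_start, a_end, b_start, b_end):
    """True if the half-open ranges [a_start, a_end) and [b_start, b_end) overlap."""
    return max(a_start, b_start) < min(a_end, b_end)


def is_regular(tiles):
    """Check that every tile touches at most one tile on each of its four sides."""
    for t in tiles:
        others = [o for o in tiles if o is not t]
        right = [o for o in others
                 if o.left == t.right and _overlaps(t.top, t.bottom, o.top, o.bottom)]
        left = [o for o in others
                if o.right == t.left and _overlaps(t.top, t.bottom, o.top, o.bottom)]
        below = [o for o in others
                 if o.top == t.bottom and _overlaps(t.left, t.right, o.left, o.right)]
        above = [o for o in others
                 if o.bottom == t.top and _overlaps(t.left, t.right, o.left, o.right)]
        if any(len(side) > 1 for side in (right, left, below, above)):
            return False
    return True


# Layout like Table 2: one tall tile on the left, two stacked tiles on its right.
table_2_like = [
    Tile("top part, Page 6", 0, 0, 1, 2),
    Tile("bottom part, Page 6", 1, 0, 2, 1),
    Tile("continuation, Page 7", 1, 1, 2, 2),
]

# A plain two-tile layout: one tile on top of exactly one tile.
stacked = [Tile("upper", 0, 0, 1, 1), Tile("lower", 0, 1, 1, 2)]
```

In this model, `is_regular(stacked)` holds while `is_regular(table_2_like)` does not, because the left tile has two neighbors on its right side.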
Is there anything that could be done with these consecutive images?
Multipart images are in the works ... still have to figure out cases where some of the tiles are rotated while others are not, but in general the next build should be able to connect multipart images and handle them as one.
I processed this today, so it should not have anything to do with the old QC infrastructure. Will try out the tool.
Could it be a decoding issue, i.e., the "-" was decoded to a different symbol that is not recognized as a hyphen?
I processed this today, so it should not have anything to do with the old QC infrastructure. Will try out the tool.
Sure it does not have anything to do with the old infrastructure ... was merely explaining why we've been seeing hyphenation errors only rather recently, namely because the old infrastructure did not report them.
Could it be a decoding issue, i.e., the "-" was decoded to a different symbol that is not recognized as a hyphen?
That's very likely, yes ... what was the dash/hyphen decoded to? There is a good bit of normalization going on in this regard already, but there can always be some Unicode point that's still lacking on the list of possible hyphens ...
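For finding out what the dash was decoded to, a small Python sketch like the following might help. It is only an illustration: the `HYPHEN_LIKE` set is an assumed list, not the tool's actual normalization list, and `describe_line_end` and `join_hyphenated` are hypothetical helper names.

```python
import unicodedata

# Assumed set of hyphen-like code points (NOT the tool's actual list).
HYPHEN_LIKE = {
    "\u002D",  # HYPHEN-MINUS
    "\u00AD",  # SOFT HYPHEN
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH
    "\u2212",  # MINUS SIGN
}


def describe_line_end(line):
    """Return (code point, Unicode name) of the last non-space character, or None."""
    stripped = line.rstrip()
    if not stripped:
        return None
    ch = stripped[-1]
    return ("U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))


def join_hyphenated(lines):
    """Join lines into one string, merging words split by a hyphen-like character.

    Simplification: the trailing hyphen is always dropped, so genuinely
    hyphenated compounds broken at a line end lose their hyphen.
    """
    parts = []
    carry = ""
    for line in lines:
        text = line.strip()
        if not text:
            continue
        text = carry + text
        carry = ""
        if text[-1] in HYPHEN_LIKE:
            carry = text[:-1]  # hold the line back and glue it to the next one
        else:
            parts.append(text)
    if carry:
        parts.append(carry)
    return " ".join(parts)
```

Calling `describe_line_end` on a line that was not recognized as hyphenated shows the code point in question; if it is missing from the list of possible hyphens, that would explain the corrections needed in QC.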
In this article, I can't get the table marked. Is there anything we can do about it?
I get an error message that it can't create proper HTML.