maxaalexeeva closed 3 years ago
Merging #263 (da8e534) into master (25f6ca1) will not change coverage. The diff coverage is n/a.
@@ Coverage Diff @@
## master #263 +/- ##
=======================================
Coverage 66.84% 66.84%
=======================================
Files 93 93
Lines 17674 17674
=======================================
Hits 11814 11814
Misses 5860 5860
Thanks for the heads-up, @dpdicken. Let's discuss this early next week so I understand how to adapt to it.
Thank you @maxaalexeeva for addressing this issue. This looks great to me. I'll make sure to read through the code so that I understand how this works. Thanks!
@dpdicken thanks for the quick review! I would like to merge this because the change can be helpful for @alicekwak, but I am also wondering whether @cl4yton and I should first discuss if this is the output we want from me before you, @dpdicken and @cl4yton, proceed with the upstream changes. @cl4yton, we can hold the discussion until our next meeting or try to meet earlier, before your next meeting with @dpdicken.
Hi @maxaalexeeva, I will try to catch you tomorrow (Fri) to see if we can chat briefly about this. Thanks!
Combining cosmos blocks that are likely parts of the same paragraph. Two consecutive blocks are combined if:
Removing "- " to rejoin words that were split at line breaks (e.g., "representa- tion" in "two additional parameters enter through the representa- tion of final consumption")
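For illustration, the "- " removal can be sketched in a few lines of Python. This is not the code from this PR; the function name `remove_line_break_hyphens` is hypothetical, and the sketch assumes a plain substring removal, which would also strip a legitimate "- " such as the one in "pre- and post-processing".

```python
def remove_line_break_hyphens(text: str) -> str:
    """Rejoin words split at a line break by deleting every "- " sequence.

    Assumption: a hyphen followed by a space always marks a split word.
    This is a deliberately naive rule and will also remove suspended
    hyphens ("pre- and post-processing").
    """
    return text.replace("- ", "")


if __name__ == "__main__":
    sample = "two additional parameters enter through the representa- tion of final consumption"
    print(remove_line_break_hyphens(sample))
```

A stricter variant could require lowercase letters on both sides of the "- " before joining, at the cost of missing some split words.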
More info on combining blocks:
More than two blocks can end up being combined.
The location of a mention in the PDF is now a sequence of page ids and a sequence of block ids instead of a single Int for each. The two sequences are aligned by position: p1 and b1 are the page and block ids of the first combined block, p2 and b2 are the page and block ids of the second combined block, etc.:
page: [p1, p2, p3] block: [b1, b2, b3]
The contents of the blocks are concatenated into one string. The character offsets of the mention and mention arguments are given with respect to the concatenated string from all the combined blocks.
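To make the new location format concrete, here is a minimal Python sketch of combining blocks and of mapping a character offset in the concatenated string back to its source block. It is not the implementation in this PR: the `Block` class, the function names, and the single-space separator between blocks are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical stand-in for a cosmos block."""
    page: int
    block: int
    content: str


def combine_blocks(blocks: List[Block], sep: str = " ") -> Tuple[List[int], List[int], str]:
    """Return aligned page ids, block ids, and the concatenated content.

    Assumption: block contents are joined with `sep`; the separator
    actually used by the pipeline is not stated in this thread.
    """
    pages = [b.page for b in blocks]
    block_ids = [b.block for b in blocks]
    text = sep.join(b.content for b in blocks)
    return pages, block_ids, text


def locate_offset(blocks: List[Block], offset: int, sep: str = " ") -> Tuple[int, int, int]:
    """Map an offset in the concatenated string to (page, block, local offset)."""
    start = 0
    for b in blocks:
        end = start + len(b.content)
        if start <= offset < end:
            return b.page, b.block, offset - start
        start = end + len(sep)  # skip the separator before the next block
    # Reached when the offset is out of range or falls on a separator.
    raise ValueError(f"offset {offset} does not fall inside any block")


if __name__ == "__main__":
    parts = [Block(3, 7, "two additional parameters"), Block(4, 0, "enter through")]
    pages, block_ids, text = combine_blocks(parts)
    print({"page": pages, "block": block_ids, "text": text})
    print(locate_offset(parts, text.index("enter")))
```

Downstream scripts that previously read a single page and block id per mention would need a lookup along these lines to recover which original block a given offset belongs to.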
Resulting output from align:
Attn @dpdicken: the change in the align output may impact some downstream scripts.