harvard-lil / capstone

CAP database scripts.
MIT License

Add bounding rectangles from PDF to each paragraph in exported HTML #2151

Closed: jcushman closed this 1 year ago

jcushman commented 1 year ago

Add the data-blocks attribute to exported HTML. For each rectangle in the PDF that contributed to a paragraph, the attribute lists the block ID (arbitrarily assigned by the vendor), the PDF page order, and the bounding box. Example output:

<p id="b415-10" data-blocks='[["BL_415.10",415,[249,2172,1145,146]],["BL_416.1",416,[314,411,1150,756]]]'>Chief Justice Fuller in <em>United States v. Lacher ...
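A minimal sketch of writing and reading the attribute with only the Python standard library. It assumes the value is plain compact JSON (a list of [block ID, page, bounding box] triples), as the example output suggests. encode_blocks and DataBlocksParser are hypothetical names for illustration, not capstone's actual export code. The PR also does not say how the four bounding-box numbers are laid out, so the sketch keeps them as an opaque 4-tuple.

```python
import json
from html.parser import HTMLParser


def encode_blocks(blocks):
    """Hypothetical helper: serialize [block_id, page, bbox] triples as
    compact JSON (no spaces), matching the example output's formatting."""
    return json.dumps(blocks, separators=(",", ":"))


class DataBlocksParser(HTMLParser):
    """Collect the decoded data-blocks attribute of every <p> that has one."""

    def __init__(self):
        super().__init__()
        # paragraph id -> list of (block_id, page, bbox) tuples
        self.paragraphs = {}

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        attrs = dict(attrs)
        raw = attrs.get("data-blocks")
        if raw is None:
            return
        # bbox is kept as an uninterpreted 4-tuple (layout not specified).
        self.paragraphs[attrs.get("id")] = [
            (block_id, page, tuple(bbox))
            for block_id, page, bbox in json.loads(raw)
        ]


if __name__ == "__main__":
    blocks = [
        ["BL_415.10", 415, [249, 2172, 1145, 146]],
        ["BL_416.1", 416, [314, 411, 1150, 756]],
    ]
    # Single-quoted attribute wrapping double-quoted JSON, as in the example.
    html = f"<p id=\"b415-10\" data-blocks='{encode_blocks(blocks)}'>...</p>"
    parser = DataBlocksParser()
    parser.feed(html)
    for block_id, page, bbox in parser.paragraphs["b415-10"]:
        print(block_id, page, bbox)
```

A paragraph that spans a page break (here 415 to 416) simply carries more than one triple, so consumers should not assume one rectangle per paragraph.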

Bex will pick up stewarding this PR from here; once we're happy with it, we'd like to run fab refresh_case_body_cache on prod and see if all the volumes work.

codecov[bot] commented 1 year ago

Codecov Report

Merging #2151 (ff6b39c) into develop (8c692d3) will increase coverage by 0.01%. The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #2151      +/-   ##
===========================================
+ Coverage    63.19%   63.21%   +0.01%     
===========================================
  Files          106      106              
  Lines        11518    11524       +6     
===========================================
+ Hits          7279     7285       +6     
  Misses        4239     4239              

see 2 files with indirect coverage changes
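The report's percentages follow from coverage = hits / lines × 100, with misses = lines − hits. A short sketch reproducing them, assuming Codecov's default settings of two-decimal precision rounded down, and assuming the change is computed from the unrounded totals before rounding (this matches the +0.01% shown; subtracting the already-rounded totals would give 0.02). The function names are illustrative only.

```python
from decimal import Decimal, ROUND_DOWN

# Assumption: Codecov defaults of 2-decimal precision, rounding "down".
CENT = Decimal("0.01")


def coverage_pct(hits, lines):
    """Unrounded line coverage as a percentage: hits / lines * 100."""
    return Decimal(hits) * 100 / Decimal(lines)


def report(base, head):
    """base/head are (hits, lines) pairs; returns rounded base, head, delta."""
    old = coverage_pct(*base)
    new = coverage_pct(*head)
    return (
        old.quantize(CENT, rounding=ROUND_DOWN),
        new.quantize(CENT, rounding=ROUND_DOWN),
        # Delta taken from unrounded values, then truncated like the totals.
        (new - old).quantize(CENT, rounding=ROUND_DOWN),
    )


base = (7279, 11518)  # develop (8c692d3)
head = (7285, 11524)  # #2151 (ff6b39c)
misses = (base[1] - base[0], head[1] - head[0])

if __name__ == "__main__":
    print(report(base, head), misses)
```

The six new lines are all covered (hits also rise by six), so misses stay at 4239 and the percentage moves only slightly.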