ml4ai / automates

AutoMATES: Automated Model Assembly from Text, Equations, and Software
https://ml4ai.github.io/automates

Combining cosmos blocks #263

Closed maxaalexeeva closed 3 years ago

maxaalexeeva commented 3 years ago

More info on combining blocks:

More than two blocks can end up being combined.

The location of the mention in the PDF is now a sequence of page ids and a sequence of block ids instead of a single Int for each. The two sequences are aligned: p1 and b1 are the page and block ids of the first combined block, p2 and b2 are those of the second combined block, and so on:

page: [p1, p2, p3] block: [b1, b2, b3]
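For downstream scripts, the aligned sequences can be paired up as in the minimal sketch below. The helper name `block_locations` is hypothetical (not part of the automates code base), and accepting a bare int for older output is an assumption about what consumers may still encounter.

```python
from typing import List, Tuple, Union

# Either the old single-id form (int) or the new sequence form (list of ints).
IdField = Union[int, List[int]]


def block_locations(page: IdField, block: IdField) -> List[Tuple[int, int]]:
    """Return one (page_id, block_id) pair per combined block.

    Hypothetical helper: a bare int is wrapped in a list so that old-style
    and new-style locations can be handled by the same code path.
    """
    pages = [page] if isinstance(page, int) else list(page)
    blocks = [block] if isinstance(block, int) else list(block)
    if len(pages) != len(blocks):
        raise ValueError("page and block sequences must have the same length")
    return list(zip(pages, blocks))


print(block_locations([11, 11], [5, 6]))  # [(11, 5), (11, 6)]
```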

The contents of the blocks are concatenated into one string. The character offsets of the mention and mention arguments are given with respect to the concatenated string from all the combined blocks.
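If a consumer needs to know which of the combined blocks an offset falls in, something like the sketch below could work. It is only an illustration: the function name `locate_offset` is hypothetical, and the separator used when the blocks are concatenated is not stated in this PR, so it is left as a parameter with an assumed default of a single space.

```python
from typing import List, Tuple


def locate_offset(block_texts: List[str], offset: int, sep: str = " ") -> Tuple[int, int]:
    """Map an offset in the concatenated string to (block_index, local_offset).

    Assumption: the blocks were joined with `sep`; the real separator
    should be checked against the actual concatenation code.
    """
    start = 0
    for index, text in enumerate(block_texts):
        end = start + len(text)
        if start <= offset < end:
            return index, offset - start
        # Skip past this block and the separator that follows it.
        start = end + len(sep)
    raise ValueError(f"offset {offset} does not fall inside any block")


# Toy data, not taken from a real PDF: "abc" + " " + "defg" -> "abc defg"
print(locate_offset(["abc", "defg"], 4))  # (1, 0)
```

An offset that lands on the separator itself raises `ValueError`, since it belongs to neither block.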

Resulting output from align:

{
      "uid": "d1cedcc9-8920-4815-9f98-d0ef8d400a4c",
      "source": "2021.06.30.21259782v1.full.pdf",
      "original_sentence": "Using the fixed parameters in Table .9 and a contact rate of beta 1 = 0.24 to account for transmission in the absence of nonpharmaceutical control measures ( face mask usage , social distancing policies ) , the herd immunity threshold for the wild-type ( strain 1 ) SARS-CoV-2 strain is 0.61 for the Pfizer or Moderna vaccines ( epsilon v = 0.94 ) , and 0.86 using the Johnson & Johnson vaccine ( epsilon v = 0.67 ) .",
      "content": "0.67",
      "spans": {
        "page": [11, 11],
        "block": [5, 6],
        "spans": [{
          "char_begin": 376,
          "char_end": 385
        }]
      },
      "arguments": [{
        "name": "identifier",
        "text": "εv",
        "spans": {
          "page": [11, 11],
          "block": [5, 6],
          "spans": [{
            "char_begin": 376,
            "char_end": 378
          }]
        }
      }, {
        "name": "parameter_setting",
        "text": "0.67",
        "spans": {
          "page": [11, 11],
          "block": [5, 6],
          "spans": [{
            "char_begin": 381,
            "char_end": 385
          }]
        }
      }]
    }
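As a quick check of how the offsets in the output above relate to each other, the sketch below re-expresses each argument span relative to the start of the mention span. The dictionary is a trimmed copy of the output above, and `relative_argument_spans` is a hypothetical helper name, not something defined in this PR.

```python
# Trimmed copy of the align output above: only the offset fields are kept.
mention = {
    "spans": {"page": [11, 11], "block": [5, 6],
              "spans": [{"char_begin": 376, "char_end": 385}]},
    "arguments": [
        {"name": "identifier",
         "spans": {"spans": [{"char_begin": 376, "char_end": 378}]}},
        {"name": "parameter_setting",
         "spans": {"spans": [{"char_begin": 381, "char_end": 385}]}},
    ],
}


def relative_argument_spans(mention):
    """Return (name, begin, end) triples relative to the mention's first span."""
    base = mention["spans"]["spans"][0]["char_begin"]
    return [
        (arg["name"], span["char_begin"] - base, span["char_end"] - base)
        for arg in mention["arguments"]
        for span in arg["spans"]["spans"]
    ]


print(relative_argument_spans(mention))
# [('identifier', 0, 2), ('parameter_setting', 5, 9)]
```

Both argument spans fall inside the mention span, and the 4-character width of the `parameter_setting` span matches the length of "0.67", which suggests (but does not confirm) that `char_end` is exclusive.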

Attn @dpdicken: the change in the align output may impact some downstream scripts.

codecov[bot] commented 3 years ago

Codecov Report

Merging #263 (da8e534) into master (25f6ca1) will not change coverage. The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #263   +/-   ##
=======================================
  Coverage   66.84%   66.84%           
=======================================
  Files          93       93           
  Lines       17674    17674           
=======================================
  Hits        11814    11814           
  Misses       5860     5860           


cl4yton commented 3 years ago

Thanks for the heads-up, @dpdicken. Let's discuss this early next week so I understand how to adapt to it.

alicekwak commented 3 years ago

Thank you @maxaalexeeva for addressing this issue. This looks great to me. I'll make sure to read through the code so that I understand how this works. Thanks!

maxaalexeeva commented 3 years ago

@dpdicken thanks for the quick review! I'd like to merge this because the change can be helpful for @alicekwak, but I also wonder whether @cl4yton and I should first discuss whether this is the output we want from me before you, @dpdicken and @cl4yton, proceed with the upstream changes. @cl4yton, we can hold the discussion until our next meeting or try to meet earlier, before your next meeting with @dpdicken.

cl4yton commented 3 years ago

Hi @maxaalexeeva, I will try to catch you tomorrow (Fri) to see if we can chat briefly about this. Thanks!