guardian / flexible-octopus-converter

An AWS Lambda for converting JSON from Octopus into Thrift.

Adjust filtering for body text #6

Closed jennygrahamjones closed 4 years ago

jennygrahamjones commented 4 years ago

What does this change?

This PR changes how we select the most important body text OctopusArticle in an OctopusBundle, and consequently which body text becomes part of the Thrift StoryBundle that is passed to Composer and/or Workflow. The rules we ought to follow, as outlined on the Trello card by @blishen and @hoyla, are:

This is how we pick which one we care about.

We identify "source" components as those having a component type of Body Text, Panel Text, or Tabular Text, discarding any information that might appear in square brackets afterwards. We then take the list of "text" elements and apply these rules in order:

  1. If some are print and some are web/both, discard the print ones.
  2. If we still have more than one and the types are mixed, discard all tabular components.
  3. If we still have more than one and the types are mixed, discard all panel components.
  4. If we still have more than one, pick the one with the lowest number.
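
As an illustration only, the rules above could be sketched in Python roughly as follows. This is not the ArticleFinder code from this PR: the `Component` class, its field names, the `"print"`/`"web"`/`"both"` values, and the reading of "lowest number" as a numeric field are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

SOURCE_TYPES = ("Body Text", "Panel Text", "Tabular Text")


@dataclass
class Component:
    # All field names are hypothetical, not taken from the converter.
    number: int            # assumed numeric identifier used for the final tie-break
    component_type: str    # e.g. "Body Text [Ruled]"
    for_publication: str   # assumed values: "print", "web" or "both"


def base_type(component_type: str) -> str:
    """Drop any trailing square-bracketed information from the type."""
    return re.sub(r"\s*\[.*\]\s*$", "", component_type).strip()


def pick_body_text(components: List[Component]) -> Optional[Component]:
    # Keep only the "source" components.
    candidates = [c for c in components if base_type(c.component_type) in SOURCE_TYPES]

    # Rule 1: if some are print and some are web/both, discard the print ones.
    non_print = [c for c in candidates if c.for_publication != "print"]
    if non_print and len(non_print) < len(candidates):
        candidates = non_print

    # Rules 2 and 3: while the types are still mixed, discard tabular, then panel.
    for unwanted in ("Tabular Text", "Panel Text"):
        types = {base_type(c.component_type) for c in candidates}
        if len(candidates) > 1 and len(types) > 1:
            candidates = [c for c in candidates if base_type(c.component_type) != unwanted]

    # Rule 4: of whatever is left, pick the one with the lowest number.
    return min(candidates, key=lambda c: c.number, default=None)
```

Note that in this sketch a bundle containing only print components keeps them all at rule 1, which matches the intent of no longer discarding print articles outright.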

The key changes to the ArticleFinder implementation are that: 1) we no longer discard articles that are marked for_publication in print, and 2) we establish that Panel Text is preferable to Tabular Text.

How can we measure success?

We should select the most appropriate body text article in any given situation.