localgovdrupal / localgov_publications_importer

PDF importer for LocalGov Drupal publications module.
Define the plugins we'll need and the data structures that we'll pass between them. #5

Open rupertj opened 1 week ago

rupertj commented 1 week ago

We want to make each step in the process customisable via plugins:

We'll also need a representation of the PDF being imported that we can pass between these plugins. (Currently, the prototype just uses what it gets back from smalot/pdfparser, but we don't want to be tied to that library.)

rupertj commented 1 week ago

Let's call the thing being passed between the plugins an "Import" for the moment. The Import interface will need:

  1. The original file that was uploaded.
  2. A title for the whole document.
  3. Pages. As each page will have content and a title, we'll need an ImportPage type to store that.
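
As a rough illustration of the three items above, here is a minimal sketch of the data structure. It is written in Python purely to show the shape; the module itself is PHP, so the real thing would presumably be a pair of PHP interfaces. All names apart from Import and ImportPage (the field names `file`, `title`, `pages`, `content`, and the `add_page` helper) are assumptions for illustration, not anything decided in this issue.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class ImportPage:
    """One page of the imported document (field names are hypothetical)."""
    title: str
    content: str


@dataclass
class Import:
    """The object passed between plugins (field names are hypothetical)."""
    file: Path   # 1. The original file that was uploaded.
    title: str   # 2. A title for the whole document.
    pages: List[ImportPage] = field(default_factory=list)  # 3. Pages.

    def add_page(self, title: str, content: str) -> ImportPage:
        """Append a page and return it (hypothetical convenience helper)."""
        page = ImportPage(title=title, content=content)
        self.pages.append(page)
        return page


# Example: a plugin that extracts pages could fill in an Import like this.
doc = Import(file=Path("uploads/report.pdf"), title="Annual report")
doc.add_page("Introduction", "Text of the first page.")
doc.add_page("Summary", "Text of the second page.")
```

Keeping the structure this plain means no plugin needs to know which PDF library produced the data, which is the decoupling from smalot/pdfparser mentioned earlier in the thread.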

And that's it for the current functionality. In time, we might want Imports to be a content entity, so we can persist them. Then we can move processing to cron, upload them in big batches, etc.