robindemourat commented 7 years ago

Currently, storage-expensive data such as base64-encoded images and table data are stored directly in their related objects (e.g. story metadata for covers, resources for table data, resources or contextualizers for base64-encoded images).

This initial choice makes sense when the primary goal is the sustainability and robustness of peritext story files, for use cases such as articles or student work. It is less realistic for data-intensive or image-intensive book-length stories.

It should be possible to handle data differently. Ideas:

store these expensive field values as references resolved elsewhere (an assets field at the story level?), so that retrieving them is separated from the story contents themselves (the story could then be loaded without them, for instance)
store a URL instead of the data itself (or, more advanced, store HTTP query parameters used to call an API and retrieve the data)
store a reference to a local file stored elsewhere

robindemourat commented 7 years ago

Best solution so far:

{
  resources: {
    // ... resources
    'uuid': {
      type: 'image',
      dataset: 'uuid'
    }
  },
  // ...story
  datasets: {
    'uuid': {
      uri: '',
      format: 'base64', // or 'json', 'xml', 'csv' ...
      method: 'raw', // or 'get', 'put', ...
      options: {}, // depending on the method
      data: 'lkjmlkj' // present when method is 'raw' (no query needed)
    }
  }
}
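To make the indirection concrete, here is a minimal sketch of how a consumer might look up the dataset behind a resource under the structure above. The function name `getResourceDataset`, the `status` values, and the example ids and URL are all hypothetical and not part of peritext. The sketch only decides whether the payload is inline or has to be fetched; it performs no request itself.

```javascript
// Hypothetical helper (not part of peritext): look up the dataset a resource
// points to, assuming the story shape proposed above.
function getResourceDataset(story, resourceId) {
  const resource = story.resources && story.resources[resourceId];
  if (!resource || resource.dataset === undefined) {
    return { status: 'missing' };
  }
  const dataset = story.datasets && story.datasets[resource.dataset];
  if (!dataset) {
    return { status: 'missing' };
  }
  // 'raw' datasets carry their payload inline: nothing to fetch.
  if (dataset.method === 'raw') {
    return { status: 'inline', format: dataset.format, data: dataset.data };
  }
  // Any other method means the payload lives elsewhere. Return what a
  // fetcher would need instead of performing the request here.
  return {
    status: 'remote',
    format: dataset.format,
    request: {
      uri: dataset.uri,
      method: dataset.method,
      options: dataset.options || {}
    }
  };
}

// Example story with made-up ids and an illustrative URL.
const exampleStory = {
  resources: {
    'res-1': { type: 'image', dataset: 'ds-1' },
    'res-2': { type: 'table', dataset: 'ds-2' }
  },
  datasets: {
    'ds-1': { uri: '', format: 'base64', method: 'raw', options: {}, data: 'lkjmlkj' },
    'ds-2': { uri: 'https://example.com/table.csv', format: 'csv', method: 'get', options: {} }
  }
};

const inlineResult = getResourceDataset(exampleStory, 'res-1');
const remoteResult = getResourceDataset(exampleStory, 'res-2');
```

Returning a request description rather than fetching keeps the lookup synchronous and leaves the choice of HTTP client to the caller.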

robindemourat commented 6 years ago

Reflections on implementation

There are two goals for the implementation here:

reduce the weight of a story object so that it is compatible with long-form, data-intensive works.
speed up page loading in implementations by allowing progressive loading of heavy assets such as images or datasets.

Find a way to mark dataset-related fields in contextualizations and resources so that a data fetcher can parse them easily when assembling or archiving the story (resolve all datasets by downloading their content locally, then find all dataset mentions and update them with the proper data path or URL).
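As a rough illustration of the second half of that archiving step, the sketch below assumes the downloads have already happened and that the story keeps a top-level `datasets` map with a `uri` field per entry, as in the earlier proposal. `relocateDatasets` and `localPaths` are hypothetical names. Whether `method` should also change once a dataset is local is left open here.

```javascript
// Hypothetical archiving step: point each dataset at its local copy.
// `localPaths` maps dataset id -> path where the content was saved (assumed
// to be produced by a separate download step, not shown).
function relocateDatasets(story, localPaths) {
  const datasets = {};
  for (const [id, dataset] of Object.entries(story.datasets || {})) {
    const hasLocalCopy = Object.prototype.hasOwnProperty.call(localPaths, id);
    // Copy the entry instead of mutating it, so the original story is untouched.
    datasets[id] = hasLocalCopy ? { ...dataset, uri: localPaths[id] } : dataset;
  }
  return { ...story, datasets };
}

// Example with made-up ids, URL and path.
const storyToArchive = {
  title: 'demo',
  datasets: {
    a: { uri: 'https://example.com/a.png', format: 'base64', method: 'get', options: {} },
    b: { uri: '', format: 'json', method: 'raw', options: {}, data: '{}' }
  }
};
const archivedStory = relocateDatasets(storyToArchive, { a: './assets/a.png' });
```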

Idea: represent all dataset-related fields with an object containing a "dataset" field.

{
  // ...resource
  thumbnail: {
    dataset: 'lkjmlj-dfglkmj-3343'
  }
}
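With that convention, a fetcher could discover dataset mentions by walking the object tree and collecting every object that has a string `dataset` field. The following is only a sketch of that idea; `findDatasetRefs` and the dotted-path output format are assumptions made for illustration.

```javascript
// Hypothetical walker: collects every { dataset: '<id>' } object in a tree,
// together with the dotted path at which it was found.
function findDatasetRefs(node, path = []) {
  if (node === null || typeof node !== 'object') return [];
  const refs = [];
  if (!Array.isArray(node) && typeof node.dataset === 'string') {
    refs.push({ path: path.join('.'), id: node.dataset });
  }
  // Recurse into every child (object properties and array items alike).
  for (const key of Object.keys(node)) {
    refs.push(...findDatasetRefs(node[key], path.concat(key)));
  }
  return refs;
}

const exampleResource = {
  type: 'video',
  thumbnail: { dataset: 'lkjmlj-dfglkmj-3343' },
  metadata: { title: 'demo' }
};
const datasetRefs = findDatasetRefs(exampleResource);
```

One consequence of this design is that `dataset` becomes a reserved field name: any object in a resource or contextualization that has a string `dataset` property is treated as a reference.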

Or manage it with a regex strategy, matching on the field name:

{
  // ...resource
  thumbnailDataset: 'lmkjml-mlkj-mlkj'
}
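A sketch of what the regex strategy could look like, assuming the convention is "any key ending in `Dataset` holds a dataset id". The pattern and the `findDatasetFields` name are illustrative only, and this version inspects only the top-level keys of the object it is given.

```javascript
// Assumed naming convention: a key ending in "Dataset" holds a dataset id.
const DATASET_KEY = /^(.+)Dataset$/;

// Hypothetical helper: returns { <fieldName>: <datasetId> } for matching keys.
function findDatasetFields(obj) {
  const fields = {};
  for (const [key, value] of Object.entries(obj)) {
    const match = DATASET_KEY.exec(key);
    if (match && typeof value === 'string') {
      // match[1] is the field name without the "Dataset" suffix.
      fields[match[1]] = value;
    }
  }
  return fields;
}

const flatResource = { type: 'video', thumbnailDataset: 'lmkjml-mlkj-mlkj' };
const datasetFields = findDatasetFields(flatResource);
```

Compared with the nested-object approach, this keeps resources flat, but it ties detection to field naming, so a key that merely happens to end in `Dataset` would be picked up as well.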

Question: should the dataset info be resolved upstream of contextualizers, or passed through context?

Contextualizers concerned

[ ] image
[ ] video (thumbnail)
[ ] dicto (thumbnail)
[ ] webpage (thumbnail)
[ ] vega (thumbnail)
[ ] p5 (thumbnail)
[ ] embed (thumbnail)
[ ] data-presentation (thumbnail)

peritext / peritext-core

handle data-related fields in a robust way #6
