are.na filtering and validation lib

g-a-v-i-n commented 6 years ago

macarena and arentv heavily rely on block type filtering and url validation/sanitization. assuming tools in the toolkit will as well, I'm starting to write a general purpose 'library' to make this less of a headache.

here's an example config:

const config = {
  whitelist: {
    source: ['youtube', 'vimeo', 'upload', 'soundcloud'],
    fileType: ['mp3', 'flac', 'wav'],
    blockType: ['attachment', 'media'],
  },
  validation: {
    internalValidators: {
      isValidHref: true,
      HTTPSonly: false,
    },
    externalValidators: {
      reactPlayer: (item) => reactPlayer.canPlay(item),
      imageIntegrity: (item) => validateImageIntegrity(item),
    }
  },
  sanitization: {
    forceHttps: false, 
    fillTitle: true,
  }
}

the pipeline goes something like this. ideally this is lazy and only runs validation / regex etc on items that cannot be easily rejected according to blockType

get channel contents |> 
does blockType pass? append with message |> 
getURL |>
decide if the URL is valid and sanitize, append with message |>
any other external validators (reactPlayer), append with message |>
superficial sanitization (fill untitled blocks etc) |>
return copy of contents with messages
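
A minimal sketch of the lazy pipeline described above, assuming each block exposes `class`, `title`, and a flat `url` field (the real are.na block shape may differ). `checkBlockType`, `checkUrl`, and `validateContents` are hypothetical names, not part of any existing library. The URL check only runs for blocks that already passed the cheap block type check.

```javascript
// Hypothetical whitelist; mirrors `blockType` in the example config.
const BLOCK_TYPES = ['attachment', 'media'];

// Cheap check: a plain array lookup, no URL parsing or regex.
function checkBlockType(block) {
  const type = String(block.class || '').toLowerCase();
  const pass = BLOCK_TYPES.includes(type);
  return {
    pass,
    isOfType: type.toUpperCase() || 'UNKNOWN',
    message: pass ? '' : `block type "${type}" is not whitelisted`,
  };
}

// More expensive check: only called for blocks that passed the type check.
// Assumes a flat `url` field on the block, which is an illustration only.
function checkUrl(block) {
  try {
    const url = new URL(block.url);
    const pass = url.protocol === 'http:' || url.protocol === 'https:';
    return {
      pass,
      isOfType: url.protocol.replace(':', '').toUpperCase(),
      message: pass ? '' : 'unsupported protocol',
    };
  } catch (err) {
    return { pass: false, isOfType: 'NO_URL', message: 'URL is missing or malformed' };
  }
}

// Returns a copy of the contents with a `validation` object on every block.
function validateContents(contents) {
  return contents.map((block) => {
    const validation = { blockType: checkBlockType(block) };
    if (validation.blockType.pass) {
      validation.url = checkUrl(block); // skipped for rejected blocks
    }
    const title = block.title || 'Untitled'; // superficial sanitization
    return { ...block, title, validation };
  });
}
```

Rejected blocks are still returned, so the calling tool decides whether to hide them or show the message.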

i propose it is appropriate to append messages about this process to the block itself in the following fashion:

{
  ...contents,
  validation: {
    source: {
      pass: true,
      isOfType: 'YOUTUBE',
      message: '',
    },
    fileType: {
      pass: false,
      isOfType: 'NO_FILE',
      message: '',
    },
    blockType: {
      pass: true,
      isOfType: 'MEDIA',
      message: '',
    },
// external validators ( idk ? )
    reactPlayer: {
      pass: true,
      isOfType: 'CANPLAY',
      message: '',
    }
  }
}
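
For illustration, here is how a tool might consume the appended `validation` object. This is a sketch only: `passesAll` and `failures` are hypothetical helper names, and it assumes every entry under `validation` carries a boolean `pass` as in the shape above.

```javascript
// True only when every recorded check passed.
// A block with no `validation` object counts as passing (nothing was checked).
function passesAll(block) {
  return Object.values(block.validation || {}).every((result) => result.pass);
}

// Lists the checks that failed, with whatever type and message they recorded.
function failures(block) {
  return Object.entries(block.validation || {})
    .filter(([, result]) => !result.pass)
    .map(([name, result]) => ({
      name,
      isOfType: result.isOfType,
      message: result.message,
    }));
}

// Example block using the shape proposed above.
const exampleBlock = {
  title: 'some video',
  validation: {
    source: { pass: true, isOfType: 'YOUTUBE', message: '' },
    fileType: { pass: false, isOfType: 'NO_FILE', message: '' },
    blockType: { pass: true, isOfType: 'MEDIA', message: '' },
  },
};

const playable = passesAll(exampleBlock);
const failedChecks = failures(exampleBlock);
```

Whether a failed `fileType` check should disqualify a block that has a valid `source` is up to the tool; this sketch treats all checks as required.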

g-a-v-i-n commented 6 years ago

The final Q is – should this be a node module?

hxrts commented 6 years ago

So great. Within the Toolkit, I was thinking modules like this could be loaded once, and then provide a global interface for other tools. As a node module, this model would imply one import, rather than a separate import for each tool. I'm open to other strategies, but I think this will reduce page weight and complexity in the long run.

hxrts commented 6 years ago

One piece of related functionality that's also worth mentioning is a "block fetch" module. Once block data has been validated and the block type determined, it would be quite useful to retrieve specific data from those objects.

Examples

mp3:  parse ID3 metadata & return non-empty fields
jpeg: parse embedded metadata + determine file size
http: navigate to url, return <title>, attempt return of body text
pdf:  title / author / number of pages

Presumably this would be a separate module, unless you're trying to make a swiss army knife.

g-a-v-i-n commented 6 years ago

single load sounds good to me. each validator should just be plug and play, so each tool can provide its own custom validation methods if necessary.
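
A rough sketch of what plug and play could look like, assuming a validator is just a function that takes a block and returns something truthy or falsy. `runValidators` is a hypothetical name; a validator that throws is recorded as a failure instead of breaking the whole run.

```javascript
// Runs every validator in the map against one block and records the outcome.
// `validators` is an object of name -> function(block) supplied by the tool.
function runValidators(block, validators) {
  const results = {};
  for (const [name, validate] of Object.entries(validators)) {
    try {
      const pass = Boolean(validate(block));
      results[name] = { pass, message: pass ? '' : `${name} rejected the block` };
    } catch (err) {
      // A broken custom validator should not take the other checks down with it.
      results[name] = { pass: false, message: `${name} threw: ${err.message}` };
    }
  }
  return results;
}

// A tool brings its own validators (both are made up for this example).
const toolValidators = {
  hasUrl: (block) => typeof block.url === 'string' && block.url.length > 0,
  hasTitle: (block) => Boolean(block.title),
};

const results = runValidators({ url: 'https://example.com', title: '' }, toolValidators);
```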

and yes, i think this is where the tinyAPI parser from mac.are.na would be a good fit

g-a-v-i-n commented 6 years ago

https://github.com/gavinpatkinson/validate-arena here's a first pass

g-a-v-i-n commented 6 years ago

ok after building /prototyping this a little i have a thought about how general this can/should/wants to be:

rn we have a config as follows. Note the prescribed block attributes.

const validatorConfig = {
    whitelists: {
      class: ['Attachment', 'Media'],
      providerName: ['YouTube', 'Vimeo', 'SoundCloud'],
      extension: ['mp3', 'flac', 'wav'],
      state: ['available'],
    },
    sanitizers: {
      cleanURL: block => cleanURL(block),
      fillTitle: block => fillTitle(block.title),
    },
    validators: {
      reactPlayerValidator: block => reactPlayerValidator(block),
    },
}

BUT what we could do instead is:

const validatorConfig = {
    whitelists: {
      'block.class': ['Attachment', 'Media'],
      'block.source.provider.name': ['YouTube', 'Vimeo', 'SoundCloud'],
      'block.attachment.extension': ['mp3', 'flac', 'wav'],
      'block.state': ['available'],
      'any.sharedKey': ['something'],
      'channel.whatever.something.heck': ['yadda'],
    },
    sanitizers: {
      cleanURL: block => cleanURL(block),
      fillTitle: block => fillTitle(block.title),
    },
    validators: {
      reactPlayerValidator: block => reactPlayerValidator(block),
    },
}

this way the lib becomes more of a general purpose object validator which is p cool.
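
A sketch of how the path-based whitelists could be resolved, assuming the keys are dotted paths relative to the object being checked (so `class` here, not `block.class`; that is a simplification for the example). `getPath` and `checkWhitelists` are hypothetical names.

```javascript
// Walks a dotted path such as 'source.provider.name' through an object.
// Returns undefined as soon as a segment is missing instead of throwing.
function getPath(obj, path) {
  return path
    .split('.')
    .reduce((value, key) => (value == null ? undefined : value[key]), obj);
}

// Checks each whitelisted path against its list of allowed values.
function checkWhitelists(target, whitelists) {
  const results = {};
  for (const [path, allowed] of Object.entries(whitelists)) {
    const value = getPath(target, path);
    const pass = allowed.includes(value);
    results[path] = {
      pass,
      value,
      message: pass ? '' : `${path} is not in the whitelist`,
    };
  }
  return results;
}

const block = {
  class: 'Media',
  state: 'available',
  source: { provider: { name: 'YouTube' } },
};

const whitelistResults = checkWhitelists(block, {
  class: ['Attachment', 'Media'],
  'source.provider.name': ['YouTube', 'Vimeo', 'SoundCloud'],
  'attachment.extension': ['mp3', 'flac', 'wav'],
  state: ['available'],
});
```

Nothing in this is specific to are.na blocks, which is the point: any object with nested keys can be checked the same way.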

The other q is: should each process (i.e. whitelisting, sanitization, validation) be a separate method?
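
One way to picture the separate-methods option, as a sketch rather than a recommendation: each process becomes a stage that takes a block and returns a new block, and a tool composes only the stages it needs. `pipe`, `whitelistStage`, `sanitizeStage`, and `validateStage` are all hypothetical names.

```javascript
// Composes stages left to right; each stage is block -> block.
const pipe = (...stages) => (block) =>
  stages.reduce((current, stage) => stage(current), block);

// Whitelist stage: records whether the block class is allowed.
const whitelistStage = (allowedClasses) => (block) => ({
  ...block,
  validation: {
    ...block.validation,
    class: { pass: allowedClasses.includes(block.class), message: '' },
  },
});

// Sanitization stage: fills in untitled blocks.
const sanitizeStage = (block) => ({ ...block, title: block.title || 'Untitled' });

// Validation stage: runs tool-supplied validators and records the results.
const validateStage = (validators) => (block) => {
  const validation = { ...block.validation };
  for (const [name, validate] of Object.entries(validators)) {
    validation[name] = { pass: Boolean(validate(block)), message: '' };
  }
  return { ...block, validation };
};

const processBlock = pipe(
  whitelistStage(['Attachment', 'Media']),
  sanitizeStage,
  validateStage({ hasUrl: (block) => typeof block.url === 'string' })
);

const processed = processBlock({ class: 'Media', title: '', url: 'https://example.com' });
```

A tool that only wants sanitization could call `sanitizeStage` directly, which is the main argument for keeping the three processes separate.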

new-computers / arena-toolkit

are.na filtering and validation lib #7
