verygoodsecurity / starlarky

VGS edition of Google's safe and hermetically sealed Starlark language - a non-Turing complete subset of Python 3.
https://vgs.dev
Apache License 2.0

Proposed Parallel Processing Interface #47

Open mjallday opened 3 years ago

mjallday commented 3 years ago

This issue discusses how to expose an interface in Larky for parallel processing of data. Larky is currently single-threaded, but many batch-processing files lend themselves to parallelism.

The interface for multiprocessing.map would be something like multiprocessing.map(iterator, transformer), where transformer is a lambda that takes each element along with the ctx and returns the output of the transform.

operations:
- Script:
  lang: starlarky
  script: |
    load("@vgs/multiprocessing", "multiprocessing")
    def process(input, ctx):
      result = '\n'.join(multiprocessing.map(input.split('\n'), lambda x, ctx: vault.put(x[1])))
      return result, ctx

Assume input is a stream-like object for SFTP files, or an HTTP object for HTTP requests.

multiprocessing.map would be an interface to an execution framework such as Spark, which would execute the lambda using the number of processes the customer has provisioned.
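To make the proposed semantics concrete, here is a minimal sketch in plain Python rather than Larky. It assumes a thread pool from the standard library in place of Spark, passes ctx to the map call explicitly (the proposal does not say how ctx reaches the transformer), and uses hypothetical names throughout: parallel_map, fake_vault_put, and the "workers" key in ctx are illustrative only and are not part of starlarky.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def parallel_map(items, transformer, ctx, max_workers=4):
    """Apply transformer(element, ctx) to every element, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order,
        # regardless of which worker finishes first.
        return list(pool.map(lambda element: transformer(element, ctx), items))


def fake_vault_put(value):
    # Hypothetical stand-in for vault.put: returns a deterministic "token".
    return "tok_" + value[::-1]


def process(input_stream, ctx):
    # Treat the input as a stream-like object and iterate it line by line.
    lines = (line.rstrip("\n") for line in input_stream)
    tokens = parallel_map(
        lines,
        lambda x, ctx: fake_vault_put(x),
        ctx,
        # Assumed: the provisioned worker count is carried in ctx.
        max_workers=ctx.get("workers", 2),
    )
    return "\n".join(tokens), ctx


result, ctx = process(io.StringIO("alice\nbob\ncarol\n"), {"workers": 2})
print(result)
```

Because the results come back in input order, the joined output lines up with the input file line for line, which is probably what a batch file rewrite needs.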

mjallday commented 3 years ago

[Attached image]

Here's an example of a similar interface using Apache Beam.

mjallday commented 3 years ago

Here's an example from Dask: https://docs.dask.org/en/latest/futures.html

[Attached image]
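The linked Dask page describes a futures interface: submit work, get a future back immediately, and collect the result later. As a rough illustration of that pattern only, the sketch below uses Python's standard-library concurrent.futures as a stand-in. It is not Dask's API, and the transform function and sample records are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def transform(record):
    # Hypothetical per-record transform; stands in for the Larky lambda.
    return record.upper()


records = ["a,1", "b,2", "c,3"]

with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns immediately with a Future for each record;
    # the dict remembers each record's original position.
    futures = {pool.submit(transform, r): i for i, r in enumerate(records)}
    results = [None] * len(records)
    # as_completed() yields futures as they finish, in no particular order,
    # so the stored index is used to put each result back in place.
    for future in as_completed(futures):
        results[futures[future]] = future.result()

print(results)
```

Compared with a blocking map call, a futures-style interface lets a script start work on early results before the whole batch is done, at the cost of having to track ordering itself.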