scramjetorg / scramjet

Public tracker for Scramjet Cloud Platform, a platform that brings data from many environments together.
https://www.scramjet.org
MIT License
253 stars 20 forks

Question: JSON Array File - Process records in parallel batches. #34

paulmowat closed 5 years ago

paulmowat commented 5 years ago

Hi,

Looking for a bit of advice. I have a file that contains a JSON array of data. This is held on AWS S3.

Format similar to [{"name": "paul"}, {"name": "bob"}, {"name": "stuart"}] but much more data/properties.

I want to stream that data and process a number of records in parallel, e.g. 100 at a time.

I've tried the following:

    return DataStream
      .pipeline(
        stream,
        JSONStream.parse('*')
      )
      .setOptions({
        'maxParallel': 100
      })
      .each(async (record) => {
        return importer(record)
      })
      .run()
      .then(() => {
        console.log('Processed entire file')
      })
      .catch((err) => {
        console.error(err)
      })

The importer routine runs each record through a lot of validation/processing before finally updating the database.

I need to do this in parallel for performance/efficiency, e.g. 100 at a time: as soon as one record is finished, processing of the next available record should start, until all are finished, but never going above that limit of 100.

Once all records have been processed, the "Processed entire file" message should be displayed.
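For reference, the behaviour described above (a fixed ceiling of concurrent calls, with the next record starting as soon as one finishes) can be sketched in plain JavaScript without any stream library. This is only an illustration of the requirement, not how scramjet implements maxParallel; `processWithLimit`, the sample `records` and the stand-in `importer` are hypothetical names made up for this sketch.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
// Resolves with the highest number of simultaneous calls that was observed.
async function processWithLimit(items, limit, worker) {
  const iterator = items[Symbol.iterator]();
  let inFlight = 0;
  let peak = 0;

  // Each lane pulls the next item as soon as its previous item has finished.
  async function lane() {
    for (let next = iterator.next(); !next.done; next = iterator.next()) {
      inFlight += 1;
      peak = Math.max(peak, inFlight);
      try {
        await worker(next.value);
      } finally {
        inFlight -= 1;
      }
    }
  }

  // Start `limit` lanes; together they never exceed `limit` concurrent calls.
  const lanes = Array.from({ length: limit }, () => lane());
  await Promise.all(lanes); // rejects on the first worker error
  return peak;
}

// Stand-in data and importer (assumptions, not the real routine from this thread).
const records = Array.from({ length: 250 }, (_, i) => ({ name: `user${i}` }));
const importer = (record) => new Promise((resolve) => setTimeout(resolve, 5));

processWithLimit(records, 100, importer)
  .then(() => console.log('Processed entire file'))
  .catch((err) => console.error(err));
```

Note that results complete in whatever order the workers finish, so this sketch makes no ordering guarantee either.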

Tried a few things but can't quite get it to work. Any help is much appreciated.

Thanks

Paul

MichalCz commented 5 years ago

For one, I think you might be interested in the "unorder" method, added in a recent version. This will keep 100 executions running, but won't take order into account.

This doesn't explain the lack of the final message, though. I think there may be a promise that never settles (neither resolved nor rejected) when an error occurs. Try adding a console.log in another each or do call after your current each call, and see which record was the last one to come through. If you change the code to use unorder, you'll also see some items missing - this may be a good lead.
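One possible way to track down a promise that never settles is to wrap the per-record function in a timeout, so a hung call turns into a visible rejection that names the record. The sketch below is plain JavaScript and makes no claim about scramjet internals; `withTimeout` is a hypothetical helper and the timeout value is an arbitrary assumption.

```javascript
// Hypothetical helper: returns a wrapped version of `task` that rejects
// if `task(record)` has not settled within `ms` milliseconds.
function withTimeout(task, ms) {
  return (record) => {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`Timed out after ${ms} ms: ${JSON.stringify(record)}`)),
        ms
      );
    });
    // Promise.resolve().then(...) also converts a synchronous throw into a rejection.
    return Promise.race([Promise.resolve().then(() => task(record)), timeout])
      .finally(() => clearTimeout(timer));
  };
}
```

With something like this in place, the each call from the question could be given `withTimeout(importer, 30000)` instead of the bare importer, so that a stuck record would be expected to reach the catch handler rather than leave the stream waiting silently.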

paulmowat commented 5 years ago

Thanks for the quick reply, will go give that a try.

MichalCz commented 5 years ago

@paulmowat any new info on this?

paulmowat commented 5 years ago

Hi

I got it working with the unorder method as you mentioned :)

Thanks

Paul

MichalCz commented 5 years ago

Great to hear. :)