portphp / portphp

Data import/export framework for PHP
https://portphp.readthedocs.io
MIT License
267 stars 46 forks source link

[RFC] About Workflow and Reader interactions #54

Open Vinorcola opened 7 years ago

Vinorcola commented 7 years ago

Hi,

Just discovered this lib and it looks promising. I've got a comment to make however on the way readers and workflow interact.

Right now, the reader must be built before the workflow since the workflow need the reader in its constructor. However, the reader is only used into the process() method. It would seems more logical to set the reader when calling the process() method.

One of the most interesting benefits would be the ability to use the same workflow object to handle several data sources. Let's say I have to import a CSV and an Excel file at once. I could build a workflow object (setting up validators, converters, etc.), and then process the 2 readers by calling twice the process() method.

$workflow = new StepAggregator();
// configure workflow here

$excelReader = new ExcelReader(/* ... */);
$csvReader = new CsvReader(/* ... */);

$workflow->process($csvReader);
$workflow->process($excelReader);

To keep BC, StepAggregator constructor could accept nullable reader, and process method could also accept nullable reader. Reader passed to process method has priority over reader passed to constructor. And if no reader it passed at all, an exception is thrown.

ddeboer commented 7 years ago

I like the idea and the BC-thinking! It makes the Workflow better re-usable, and easier to turn into a service.

Would you like to open a PR?

Vinorcola commented 7 years ago

OK, I'll create the pull request.

Vinorcola commented 7 years ago

Humm, after thinking a bit on the issue, it looks like the problem is bigger:

Processing a workflow does prepare() the writers and finish() them. However, if you want to run several readers through the workflow:

  1. Should the writers be prepare()d and finish()ed again?
  2. What if one add a writer to the workflow between 2 process()?
  3. What if writer is updated between 2 process() (e.g. Change Excel sheet)?

Several solutions I can think for now:

  1. Having a ConcatReader that take several readers and concat the data (doesn't solve 3.)
  2. Make the workflow being a step, so we can create a workflow of workflow (this as to be thought deeper regarding the concept of workflow).