snivilised / pixa

Directory tree based bulk image processor (Also serves as a working example of how to use extendio, cobrass and arcadia)
MIT License

design mini navigation framework #9

Closed plastikfan closed 8 months ago

plastikfan commented 11 months ago

The framework needs to be usable across multiple commands and then be made generic enough to be used by other applications. Eventually, this framework will be implemented in arcadia and/or cobrass, depending on the relevant pieces of functionality. Actually, a clear separation should be established between the functionality that is arcadia specific and that which is more generic than that and should therefore be implemented in cobrass.

Legend:

Need the following entities:

It should be easy to switch a command to use the simple or scheduled profiles. Eg, a visit command which is defined to use the simple profile should be able to be switched over to the scheduled profile simply by setting a property.
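As a rough illustration of what "switching by setting a property" could look like, here is a minimal sketch; every name in it (ExecutionProfile, CommandOptions, the profile constants) is hypothetical and does not exist in pixa or its libraries:

```go
package main

import "fmt"

// ExecutionProfile is a hypothetical enumeration of the available profiles.
type ExecutionProfile int

const (
	SimpleProfile ExecutionProfile = iota
	ScheduledProfile
)

// CommandOptions is a hypothetical options struct; the profile is just a property.
type CommandOptions struct {
	Name    string
	Profile ExecutionProfile
}

func main() {
	visit := CommandOptions{Name: "visit", Profile: SimpleProfile}

	// Switching the visit command over to the scheduled profile requires
	// only setting the property; no other wiring changes.
	visit.Profile = ScheduledProfile
	fmt.Printf("%s command uses profile %v\n", visit.Name, visit.Profile)
}
```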

Configuring the navigation and then setting up the appropriate workflow are 2 separate phases. A workflow step will need to execute the navigator, therefore the workflow infrastructure has a dependency on the navigator, but not the other way around.

plastikfan commented 11 months ago

This is over engineered. We don't need a workflow, so get rid of that part. The execution profile can remain with sync and async versions. Calls to either execution profile block and are transparent to the client. Under the covers, the async version will orchestrate all the synchronisation primitives required to execute navigation concurrently, whilst the sync version will simply run in a blocking manner. (See: How to use a Mutex to define critical sections of code and fix race conditions)
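A minimal sketch of the mutex/critical-section idea referenced above, using only the standard library (the counter type here is just an illustration, not anything from extendio):

```go
package main

import (
	"fmt"
	"sync"
)

// counter guards shared state with a mutex so that concurrent goroutines
// cannot race when updating it.
type counter struct {
	mu    sync.Mutex
	value int
}

func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock() // only one goroutine at a time executes this critical section
	c.value++
}

func main() {
	var wg sync.WaitGroup
	c := &counter{}

	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait()
	fmt.Println("final count:", c.value) // always 10, no race
}
```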

We need to handle ui updates separately from the core navigation activity. The cli will need to display what has been executed in the ui. We also need to ensure we capture the output of the external process and report it as appropriate.
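Capturing the external process output could be as simple as the following sketch; the choice of binary here (magick) is only an assumption for illustration, the real external tool may differ:

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// CombinedOutput captures both stdout and stderr of the external process,
	// so the cli can report it separately from the core navigation activity.
	out, err := exec.Command("magick", "-version").CombinedOutput()
	if err != nil {
		fmt.Printf("external process failed: %v\n", err)
	}
	fmt.Printf("captured output:\n%s", out)
}
```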

For the async version, any writes to the ui should be done atomically to ensure the output remains coherent. So for a single task, any writes for that task must be done in a single hit to prevent output from differing tasks interleaving with each other. We can achieve this by using a buffer and then flushing it in a single hit when the individual task has completed its work. We need the Golang equivalent of a critical section in order to write to the console in a safe manner.
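A minimal sketch of the buffer-then-flush approach, assuming nothing beyond the standard library (the console type is illustrative only):

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"sync"
)

// console serialises writes so that output from different tasks never interleaves.
type console struct {
	mu sync.Mutex
}

// flush writes a task's buffered output in a single hit, inside a critical section.
func (c *console) flush(buf *bytes.Buffer) {
	c.mu.Lock()
	defer c.mu.Unlock()
	os.Stdout.Write(buf.Bytes())
}

func main() {
	var wg sync.WaitGroup
	con := &console{}

	for t := 1; t <= 3; t++ {
		wg.Add(1)
		go func(task int) {
			defer wg.Done()

			// each task accumulates its own output in a private buffer ...
			var buf bytes.Buffer
			fmt.Fprintf(&buf, "task %d: started\n", task)
			fmt.Fprintf(&buf, "task %d: completed\n", task)

			// ... and flushes it atomically once the work is done
			con.flush(&buf)
		}(t)
	}
	wg.Wait()
}
```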

plastikfan commented 11 months ago

There is going to be a limited version of the asynchronous model that can be implemented externally to extendio navigation. I've realised that the only concurrent model that can be defined is with the folders with files subscription type. So for a single notification (ie the invocation of the client callback), we can spawn a goroutine for each child file of the current folder traverse item. We define a unit of work that will handle the processing of a single file and keep spawning a goroutine for each unit of work. This can be achieved using the command/unit-of-work pattern. We implement throughput throttling by spawning no more than n goroutines to maximise CPU usage.

We also use a journal style technique where we pre-write the work that is to be completed in the form of a temporary file. This way, if the process is terminated part way through the batch, the user can invoke a resume, at which point we check which target files already exist and therefore do not need to be recomputed. We also check which journal files exist, which defines which files were being processed at the time of the last interrupt. From this analysis, we can reconstruct the remaining workload for this folder and continue where we left off.

So we define a workload first, which is then decomposed into a stream of units of work.
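A sketch of the unit-of-work decomposition with throttling and the journal pre-write; the .journal suffix, the glob pattern and the unitOfWork type are assumptions for illustration only:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// unitOfWork represents the processing of a single child file of the current folder.
type unitOfWork struct {
	path string
}

func (u unitOfWork) run() {
	// journal style technique: pre-write a marker file recording the work about
	// to be done, so a resume can detect interrupted items later
	journal := u.path + ".journal"
	_ = os.WriteFile(journal, nil, 0o644)

	fmt.Println("processing", u.path) // the real work would go here

	_ = os.Remove(journal) // work completed; the journal entry is no longer needed
}

func main() {
	files, _ := filepath.Glob("*.jpg") // stand-in for the child files of a traverse item

	const n = 4 // throughput throttle: no more than n goroutines in flight
	slots := make(chan struct{}, n)
	var wg sync.WaitGroup

	for _, f := range files {
		wg.Add(1)
		slots <- struct{}{} // acquire a slot (blocks once n are in flight)
		go func(u unitOfWork) {
			defer wg.Done()
			defer func() { <-slots }() // release the slot
			u.run()
		}(unitOfWork{path: f})
	}
	wg.Wait()
}
```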

What needs to be researched is the worker pool pattern, eg: the Go concurrency worker pool pattern and the associated github project: workers-pool
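For reference, the conventional worker pool shape looks roughly like this (a generic sketch, not code from the workers-pool project):

```go
package main

import (
	"fmt"
	"sync"
)

// worker drains the jobs channel until it is closed, sending each outcome to results.
func worker(id int, jobs <-chan string, results chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	for j := range jobs {
		results <- fmt.Sprintf("worker %d processed %s", id, j)
	}
}

func main() {
	jobs := make(chan string)
	results := make(chan string)

	// a fixed pool of workers caps concurrency
	const poolSize = 4
	var wg sync.WaitGroup
	for w := 1; w <= poolSize; w++ {
		wg.Add(1)
		go worker(w, jobs, results, &wg)
	}

	// close results once all workers have finished
	go func() {
		wg.Wait()
		close(results)
	}()

	// feed the stream of units of work into the pool
	go func() {
		for _, f := range []string{"a.jpg", "b.jpg", "c.jpg"} {
			jobs <- f
		}
		close(jobs)
	}()

	for r := range results {
		fmt.Println(r)
	}
}
```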

plastikfan commented 11 months ago

Remember to separate actual work from ui updates. This way, the executable functionality can be reused easily. This generic functionality will end up in either extendio or arcadia to aid reuse.
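One way to picture the separation: the work function returns plain results over a channel and a single consumer does all the rendering. A minimal sketch, with all names invented for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// result carries only the outcome of the work; how it is rendered is left
// entirely to the ui consumer, so the work function stays reusable.
type result struct {
	file string
	err  error
}

// doWork is purely the executable functionality; no printing happens here.
func doWork(file string) result {
	return result{file: file}
}

func main() {
	files := []string{"a.jpg", "b.jpg", "c.jpg"}
	results := make(chan result)

	var wg sync.WaitGroup
	for _, f := range files {
		wg.Add(1)
		go func(f string) {
			defer wg.Done()
			results <- doWork(f)
		}(f)
	}
	go func() { wg.Wait(); close(results) }()

	// the ui update happens in one place, decoupled from the work itself
	for r := range results {
		if r.err != nil {
			fmt.Printf("failed %s: %v\n", r.file, r.err)
			continue
		}
		fmt.Printf("done %s\n", r.file)
	}
}
```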

plastikfan commented 11 months ago

The other model for concurrency during directory traversal would need to be built directly into the extendio navigator. The navigator would probably have to return a channel of traverse results, rather than the traverse result itself. When a goroutine completes, it signals its completion by sending the result back through the channel. The navigator will keep track of outstanding requests via the channels and continue navigation whilst there are still available slots. What we mean by available slots is that if we say we want to allow n concurrent goroutines, we keep spawning until we reach n. As we get notified via their associated channels, we can dispatch another unit of work.

A point of complexity would be how to implement the fast forward feature. We can handle this by performing the fast forward phase single threaded. Then, from this trigger point, we can proceed on a concurrent basis.
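A sketch of the channel-of-results idea with a bounded number of slots; traverseResult and navigate are placeholders, not the real extendio types or API:

```go
package main

import (
	"fmt"
	"sync"
)

// traverseResult stands in for extendio's traverse result; the real type differs.
type traverseResult struct {
	item string
	err  error
}

// navigate returns a channel of results instead of a single result. It keeps at
// most `slots` units of work in flight; as each one signals completion by sending
// on the results channel, another slot becomes available.
func navigate(items []string, slots int) <-chan traverseResult {
	results := make(chan traverseResult)
	sem := make(chan struct{}, slots)

	go func() {
		var wg sync.WaitGroup
		for _, it := range items {
			sem <- struct{}{} // wait for an available slot
			wg.Add(1)
			go func(item string) {
				defer wg.Done()
				defer func() { <-sem }() // free the slot for the next unit of work
				results <- traverseResult{item: item}
			}(it)
		}
		wg.Wait()
		close(results)
	}()
	return results
}

func main() {
	for r := range navigate([]string{"a", "b", "c", "d"}, 2) {
		fmt.Println("completed:", r.item)
	}
}
```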