Review of YAMTL solution

georghinkel commented 3 years ago

First, sorry for being late with this review. I was on parental leave and unexpectedly did not have an internet connection.

I have not been able to run the solution yet. I hope to do so later today, though I expect everything to work on that end.

The solution is generally easy to understand and performs very well. I especially like the idea of toMany rules, together with the ability to trace the correct element from other rules.

The only thing I do not understand is the transformation of the next/prev pointers. To me, it looks like the transformation creates next/prev links for all transformed low-level jobs, not just for those that operate on the same samples.

arturboronat commented 3 years ago

Congratulations on the newborn! And thanks for taking the time to write these comments.

Rule job shows how to set the previous jobs in the low-level model:

```xtend
rule('job').isAbstract.toMany
    .in('in_step', LAB.protocolStep)
    .out('out_job', JOB.job) [
        out_job.protocolStepName = in_step.id
        // set container: add the job to the job collection created for the enclosing job request
        val in_jobRequest = in_step.eContainer.eContainer as JobRequest
        val out_jobCollection = in_jobRequest.fetch('out_jobCollection', 'jobRequest_->_jobCollection') as JobCollection
        out_jobCollection.jobs.add(out_job)

        // set previous: link to the job created for the previous protocol step, if there is one
        if (in_step.previous !== null)
            out_job.previous.add(in_step.previous.fetch() as Job)
    ],
```

Specifically, in_step.previous.fetch() fetches the job that corresponds to the previous protocol step, so that jobs preserve the order specified among steps. I assumed that jobs created for the same protocol step can be performed concurrently and may therefore share the same previous/next jobs.

Listing 7 in the paper shows old code. In the version I used while experimenting with toMany rules, the fetch operator returned a list of objects by default (the output objects of all matches). However, returning the output object of the first match when the match occurrence is not specified seems more intuitive.
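To make the difference between the two fetch defaults concrete, here is a minimal Java sketch of a trace store. It is not YAMTL's implementation: TraceStore, record, fetch and fetchAll are hypothetical names, and occurrences are assumed to be numbered from 0 in the order in which matches were created.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trace store illustrating the two fetch semantics (not YAMTL's API).
class TraceStore {
    // Output objects of every match of a toMany rule, keyed by input object,
    // in the order in which the matches (occurrences) were created.
    private final Map<Object, List<Object>> outputsByInput = new HashMap<>();

    void record(Object input, Object output) {
        outputsByInput.computeIfAbsent(input, k -> new ArrayList<>()).add(output);
    }

    // Current default: no occurrence given, so return the output of the first match.
    Object fetch(Object input) {
        return fetch(input, 0);
    }

    // Explicit (0-based) occurrence; null if there is no such match.
    Object fetch(Object input, int occurrence) {
        List<Object> outputs = outputsByInput.get(input);
        if (outputs == null || occurrence < 0 || occurrence >= outputs.size()) return null;
        return outputs.get(occurrence);
    }

    // Earlier default: the outputs of all matches, as an unmodifiable copy.
    List<Object> fetchAll(Object input) {
        return List.copyOf(outputsByInput.getOrDefault(input, List.of()));
    }
}
```

Keeping the list-returning variant under a separate name means callers that really need every occurrence have to ask for it explicitly, while the common case of a single repetition stays a plain object lookup.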

fjouault commented 3 years ago

This declarative YAMTL solution is written in a DSL embedded in Xtend and achieves relatively good scalability.

The fact that Xtend is imperative slightly increases verbosity compared to what an external DSL could achieve. On the other hand, Xtend has the advantage of being directly usable.

When running the solution, we observed that it crashed with an exception on some models from the new_samples set. We also noticed that it did not save target models, but we managed to modify the code to make it do so.

We noticed the following issues in the result models. These were not detected by the NMF-based validation program.

Major issues:
- Chunking for liquid transfer jobs seems to be only performed at the microplate level: all tips are in one job instead of being performed column per column.
- The source of sample distribution jobs seems to be always the same tube runner (probably related to the previous issue).
- It seems that the tips that are removed after a failed job are not the right ones: their index is the expected index + 1.
- Failed samples are not removed from wash jobs.
Minor issues:
- Tip statuses do not seem to be updated after changes are applied.
Remarks:
- Jobs are reordered with respect to the steps in the source model, but next and previous seem fine.

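The behaviour expected by the first two major issues amounts to partitioning the samples of a step into fixed-size groups and producing one low-level job per group, instead of one job for the whole microplate. A minimal Java sketch, assuming the group size is a parameter (for liquid transfer, for instance, the number of tips handled per column; the value 8 in the check below is an assumption, not taken from the case description). Chunk is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split the samples of a step into consecutive chunks,
// so that one low-level job can be created per chunk.
class Chunking {
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the view so that each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

With 20 samples and a chunk size of 8, this yields three jobs covering 8, 8 and 4 samples. The same partitioning with the tube runner capacity as chunk size would give each sample distribution job its own source tube runner.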
Minor comments on the paper:

- The “root” rule in the paper is named “jobRequest_->_jobCollection” in the code.
- Section 2:
  - Paragraph 4: “according to the expression jobRequest.samples.size / TUBE_RUNNER_CAPACITY”
    - In the listing and the code, the expression is “max(jobRequest.samples.size, TUBE_RUNNER_CAPACITY)”. Is this a typo?
  - Paragraph 5: possibly the same typo?
- The rule is named “tipCreation” in the code and the listing, but “tip_creation” in the text and the caption.
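If the expression is meant to give the number of tube runners needed for a job request (a reading of Section 2 that the paper does not state explicitly), the two variants diverge quickly. Writing $n$ for jobRequest.samples.size and $C$ for TUBE_RUNNER_CAPACITY, with values chosen purely for illustration:

```math
\begin{aligned}
\left\lceil \frac{n}{C} \right\rceil &\quad\text{vs.}\quad \max(n, C)\\
n = 40,\ C = 16:\quad \left\lceil \frac{40}{16} \right\rceil = \lceil 2.5 \rceil = 3 &\quad\text{vs.}\quad \max(40, 16) = 40
\end{aligned}
```

Note that with plain integer division, as in Java or Xtend int arithmetic, 40 / 16 evaluates to 2, which would leave the last 8 samples without a tube runner; the quotient would need rounding up.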

This review was written in collaboration with @TheoLeCalvar.

arturboronat commented 3 years ago

Many thanks, @fjouault and @TheoLeCalvar. The solution does indeed crash when new samples are added; I need to look into these issues.

georghinkel commented 3 years ago

@arturboronat Actually, the previous/next references are a bit more sophisticated: a low-level job is a next job of another if its originating high-level protocol step is the next step of the other job's step and both jobs operate on at least one common sample.

That is, if a low-level job was responsible for distributing samples 1 and 2, then the job that adds reagent to samples 3 and 4 is not one of its next jobs, even if the add-reagent step directly follows the distribution step. In reality, this precision is critical because the time constraints cannot be met otherwise.
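The rule above can be stated as a predicate over pairs of low-level jobs. The following Java sketch is illustrative only: the Job record, NextLinks.compute and the nextStep map are hypothetical, and protocol steps are assumed to form a simple chain. Dropping the sharesSample condition links jobs by step order alone, which is exactly what the example above rules out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a low-level job knows the protocol step it was created
// for and the samples it operates on.
record Job(String name, String step, Set<Integer> samples) {}

class NextLinks {
    // Computes job -> next jobs. nextStep maps each protocol step to the step
    // that directly follows it (assumption: steps form a simple chain).
    static Map<Job, List<Job>> compute(List<Job> jobs, Map<String, String> nextStep) {
        Map<Job, List<Job>> next = new LinkedHashMap<>();
        for (Job a : jobs) {
            List<Job> successors = new ArrayList<>();
            for (Job b : jobs) {
                // b must come from the step that follows a's step ...
                boolean stepFollows = b.step().equals(nextStep.get(a.step()));
                // ... and both jobs must operate on at least one common sample.
                boolean sharesSample = !Collections.disjoint(a.samples(), b.samples());
                if (stepFollows && sharesSample) successors.add(b);
            }
            next.put(a, successors);
        }
        return next;
    }
}
```

With distribution jobs for samples {1, 2} and {3, 4} followed by add-reagent jobs for the same pairs, each distribution job gets exactly one next job: the add-reagent job for its own samples.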

arturboronat commented 3 years ago

Thanks all for your comments.

I have uploaded a fix to the solution. The main modifications are:

- toMany rules must now contain a toManyCap expression returning an integer that indicates the number of repetitions to be inserted. When toMany rules participate in an inheritance hierarchy, only concrete rules need such an expression; at present, toManyCap expressions cannot be inherited. This separation of concerns between filter and toManyCap expressions allows YAMTL to decide whether a rule match is valid independently of how many times it needs to be repeated, which matters when an update affects the toManyCap expression. It also means that repetition is no longer part of the matching algorithm: in the previous version, each occurrence of a match involved a separate rule match, whereas with toManyCap the rule is matched once and repetitions are handled independently of matching. I will show the impact on performance tomorrow at the workshop.
- I removed the change specification. To my surprise, this has not affected performance negatively, except in one scenario.
- Rule priorities are no longer needed, which makes the transformation more declarative.
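The separation between matching and repetition described in the first point can be sketched as follows. This is not YAMTL's implementation: ToManyMatch, refresh and the cap function are hypothetical, and the cap used in the check below (one job per 8 samples, rounded up) is an arbitrary assumption. The point is that a match is established once, and a later change to the cap only adds or removes repetitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical illustration: a rule match is found once; its repetitions are
// kept in line with the current value of a toManyCap-like expression.
final class ToManyMatch<I, O> {
    private final I input;
    private final Function<I, Integer> toManyCap; // number of repetitions this match needs
    private final Supplier<O> createOutput;        // creates the output of one repetition
    private final List<O> outputs = new ArrayList<>();

    ToManyMatch(I input, Function<I, Integer> toManyCap, Supplier<O> createOutput) {
        this.input = input;
        this.toManyCap = toManyCap;
        this.createOutput = createOutput;
        refresh();
    }

    // Called initially and after any update that may affect the cap:
    // adds or removes repetitions without matching the rule again.
    void refresh() {
        int cap = Math.max(0, toManyCap.apply(input));
        while (outputs.size() < cap) outputs.add(createOutput.get());
        while (outputs.size() > cap) outputs.remove(outputs.size() - 1);
    }

    List<O> outputs() {
        return List.copyOf(outputs);
    }
}
```

Because existing repetitions are kept when the cap grows, their outputs (and any traces pointing to them) stay stable across updates; only the surplus or missing repetitions are touched.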

Regarding major issues:

Chunking for liquid transfer jobs seems to be only performed at the microplate level: all tips are in one job instead of being performed column per column.
- This should be fixed now.
The source of sample distribution jobs seems to be always the same tube runner (probably related to the previous issue).
- This should be fixed now.
It seems that the tips that are removed after a failed job are not the right ones: their index is the expected index + 1.
- Note that positions of jobs start at 0 whereas names start at 1.
Failed samples are not removed from wash jobs.
- Yes, this is a point to be improved. I'll comment on this tomorrow.
Previous and next references preserve the order, but they may not be optimized. It would be ideal to have a systematic check for this.
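One possible shape for such a systematic check, sketched in Java under the rule described earlier in this thread (a job's next job must come from the following protocol step and share at least one sample with it). LinkCheck, the Job record and the nextStep map are hypothetical; the check compares the links found in a result model against the links the rule requires and lists every difference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a low-level job and its originating protocol step.
record Job(String name, String step, Set<Integer> samples) {}

class LinkCheck {
    // next: the next links found in a result model.
    // nextStep: each protocol step mapped to the step that directly follows it.
    static List<String> check(List<Job> jobs, Map<Job, Set<Job>> next, Map<String, String> nextStep) {
        List<String> problems = new ArrayList<>();
        for (Job a : jobs) {
            Set<Job> actual = next.getOrDefault(a, Set.of());
            for (Job b : jobs) {
                boolean required = b.step().equals(nextStep.get(a.step()))
                        && !Collections.disjoint(a.samples(), b.samples());
                boolean present = actual.contains(b);
                // A link that the rule does not require over-constrains scheduling.
                if (present && !required) problems.add("unnecessary: " + a.name() + " -> " + b.name());
                // A required link that is absent loses the ordering between the jobs.
                if (!present && required) problems.add("missing: " + a.name() + " -> " + b.name());
            }
        }
        return problems;
    }
}
```

An empty result means the links are both order-preserving and minimal under this rule; an "unnecessary" entry corresponds to the coarser linking discussed in the first review comment.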

I have also included a Docker image.

The commit is here.

See you all tomorrow.

tecan / ttc21incrementalLabWorkflows

Review of YAMTL solution #7