papaemmelab / isabl_cli

🤖 isabl Command Line Client and SDK.

Project Merge Analyses runs for each experiment when running `isabl process-finished` #46

Open nickp60 opened 1 year ago

nickp60 commented 1 year ago

Description

When running `isabl process-finished` to complete a group of analyses, the project-level merge logic runs after each analysis is completed. This is very time-consuming; it would probably be better to run it once, after the command has finished, to avoid wasting resources.

What I Did

`isabl process-finished -fi projects 20`

Isabl runs the merge analysis after processing each analysis.

New feature

Perhaps a fix could be to disable trigger_analyses_merge when executing patch_instance from the process-finished command. Instead, the process-finished command could keep a list of the analysis keys, individuals, and projects affected by the query and run the merges for those after updating the statuses.

# not actual code: run_triggers and the two merge helpers are hypothetical
analyses_processed = {"pks": [], "indvs": [], "projects": []}
for analysis in analyses:
    # mark the analysis as succeeded without triggering the merge logic
    patch_instance(status="SUCCEEDED", run_triggers=False)
    analyses_processed["pks"].append(analysis.pk)
    analyses_processed["indvs"].append(analysis.individual)
    analyses_processed["projects"].extend(analysis.projects)
# run each merge once per affected project and individual
for proj in set(analyses_processed["projects"]):
    run_project_merge_analyses(proj)
for indv in set(analyses_processed["indvs"]):
    run_individual_merge_analyses(indv)

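A self-contained toy version of the same idea, in case it helps the discussion. Nothing here calls Isabl: the `Analysis` class, `process_finished_deferred`, and the merge callbacks are hypothetical stand-ins, and the status assignment only simulates what a real status patch would do.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """Hypothetical stand-in for an Isabl analysis record."""

    pk: int
    individual: str
    projects: list = field(default_factory=list)
    status: str = "FINISHED"


def process_finished_deferred(analyses, merge_project, merge_individual):
    """Update every status first, then run each merge once per target."""
    projects, individuals = set(), set()
    for analysis in analyses:
        analysis.status = "SUCCEEDED"  # simulated patch, no triggers fired
        projects.update(analysis.projects)
        individuals.add(analysis.individual)
    # Deferred merges: one call per affected project and per individual.
    for project in sorted(projects):
        merge_project(project)
    for individual in sorted(individuals):
        merge_individual(individual)
    return projects, individuals


analyses = [
    Analysis(1, "A", [20]),
    Analysis(2, "A", [20]),
    Analysis(3, "B", [20, 21]),
]
project_calls, individual_calls = [], []
process_finished_deferred(analyses, project_calls.append, individual_calls.append)
```

With three analyses spread over two projects and two individuals, each merge callback is invoked twice in total rather than once per analysis.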
juanesarango commented 3 weeks ago

@nickp60 my comment on this issue is long overdue.

The project-level merge is triggered after each analysis completes (FAILED or SUCCEEDED), but it is only submitted when no other analyses in the project are still pending (SUBMITTED or STARTED). If any analyses are pending, the merge is skipped.

https://github.com/papaemmelab/isabl_cli/blob/f13b9956753ef595c0aa91884fbe011205614cde/isabl_cli/data.py#L135-L167

I believe what you're seeing is that the merge runs while other analyses are still FINISHED, because that status is not counted as pending. We should add FINISHED to the pending statuses: `status__in="STARTED,SUBMITTED,FINISHED"`.
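To illustrate the effect of that change, here is a small simulation of the gating rule described above. It is only a sketch and does not use the Isabl API: `should_submit_merge` and `count_merges` are hypothetical helpers, and the statuses are plain strings.

```python
PENDING_CURRENT = {"SUBMITTED", "STARTED"}
PENDING_PROPOSED = PENDING_CURRENT | {"FINISHED"}


def should_submit_merge(other_statuses, pending):
    """Return True when no other analysis in the project is still pending."""
    return not any(status in pending for status in other_statuses)


def count_merges(n_analyses, pending):
    """Simulate process-finished over n FINISHED analyses of one project."""
    statuses = ["FINISHED"] * n_analyses
    merges = 0
    for i in range(n_analyses):
        statuses[i] = "SUCCEEDED"  # this analysis has just been processed
        others = statuses[:i] + statuses[i + 1:]
        if should_submit_merge(others, pending):
            merges += 1
    return merges


merges_current = count_merges(5, PENDING_CURRENT)
merges_proposed = count_merges(5, PENDING_PROPOSED)
```

In this toy model, five FINISHED analyses trigger five merges under the current pending set and a single merge once FINISHED is treated as pending.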