monarch-initiative / pheval.exomiser

Exomiser plugin for PhEval.
4 stars 0 forks source link

pheval-exomiser prepare-exomiser-batch command bugs #61

Open AO33 opened 1 month ago

AO33 commented 1 month ago

prepare-exomiser-batch bugs

Ive created a local branch that fixes or implements the features listed below





yaseminbridges commented 1 month ago

Hi @AO33!

I just have a few comments/questions 🙂

poetry install failure Error

  • Was an issue regarding lxml (4.9.2) not supporting PEP 517 builds.
  • Changed the poetry.lock file to the latest version of lxml 5.2.2 and reran poetry install to get around

I believe it is generally discouraged to manually change the poetry.lock as this can break the reproducibility. Perhaps another workaround can be to update the dependencies. I have had this error before but after fixing my poetry installation I've not encountered it.

  • pheval-exomiser prepare-exomiser-batch Error

    • TypeError: prepare_exomiser_batch() got an unexpected keyword argument 'results_dir'
    • The prepare_exomiser_batch and create_batch_files aren't quite synced up with input arguments
    • Inconsistency with the @click options that are made available

Sounds good!

  • pheval-exomiser prepare-exomiser-batch Bug

    • phenotype-only setting is getting written to batch files even when vcf / analysis mode is desired.
    • The CommandsWriter.write_local_commands will only write analysis commands if variant_analysis (phenotype_only) is set to True. This can never be the case because it will always be set to False because of the vcf_dir and command options that are set.
    • I replaced --phenotype-only with --variant-analysis for tool input

Sounds good! I had removed the phenotype_only a while ago so using --variant analysis makes perfect sense.

  • pheval-exomiser prepare-exomiser-batch slight refactor

    • Originally output number_of_samples/max_jobs number of batch files
    • Refactored the BatchFileWriter.create_split_batch_files files function so that max_jobs argument outputs max_jobs number of batch files instead (User specified number of jobs without having to perform manual calculation beforehand)

I am a bit confused by this - can you explain this a bit more? Has this changed so that max_jobs is the number of files rather than the number of samples? If so I think I would prefer to keep it as the number of samples ran in a single job.

  • pheval-exomiser post_process_results Added --from-batch-file optional option

    • --from-batch-file option allows user to pass in file/path/to/exomiser_batch_commands.txt file
    • This allows the user to only post process results from samples found in the input batch_commands.txt argument (So if multiple jobs are running they won't try and process incomplete or same results files)

What is the use case of this (just so I can understand a bit better)? What we have set out in PhEval is that a each job will have it's own output directory so this type of thing shouldn't happen