editorialbot commented 1 year ago

Submitting author: !--author-handle-->@ManavalanG@lpantano<!--end-editor-- Reviewers: @brentp, @Redmar-van-den-Berg Archive: 10.5281/zenodo.10002036

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/29354a9f9b0e02ec16fdc08c7c7648a6"><img src="https://joss.theoj.org/papers/29354a9f9b0e02ec16fdc08c7c7648a6/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/29354a9f9b0e02ec16fdc08c7c7648a6/status.svg)](https://joss.theoj.org/papers/29354a9f9b0e02ec16fdc08c7c7648a6)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@brentp & @Redmar-van-den-Berg, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @lpantano know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @brentp

📝 Checklist for @Redmar-van-den-Berg

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 1 year ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.11 s (526.9 files/s, 47955.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
YAML                            21             71             29           1213
Python                          14            318            306           1183
Markdown                        17            324              0            874
Jinja Template                   1             19              0            591
TeX                              1             24              0            208
Bourne Shell                     1             17             10             39
JSON                             1              0              0             23
INI                              1              5              0             17
TOML                             1              1              0              7
-------------------------------------------------------------------------------
SUM:                            58            779            345           4155
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 1 year ago

Wordcount for paper.md is 1877

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1093/bioinformatics/btw354 is OK
- 10.1038/s41436-020-01084-8 is OK
- 10.1038/s41525-020-00154-9 is OK
- 10.1093/bioinformatics/btv566 is OK
- 10.1093/bioinformatics/btx699 is OK
- 10.1093/gigascience/gix090 is OK
- 10.1093/gigascience/giab008 is OK
- 10.1101/gr.246934.118 is OK
- 10.1186/s13073-020-00761-2 is OK
- 10.12688/f1000research.15931.2 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

lpantano commented 1 year ago

From @Redmar-van-den-Berg: I'm trying to follow along with https://quac.readthedocs.io/en/stable/system_testing/, and I've run into 2 issues

The pipeline does not create the output and log folders automatically, so the user has to create the folders manually before the pipeline is started
There is no automatic way to fetch the required reference files that are described in configs/workflow.yaml

ManavalanG commented 1 year ago

@Redmar-van-den-Berg Thanks for accepting to review our pipeline and providing us an initial feedback.

The pipeline does not create the output and log folders automatically, so the user has to create the folders manually before the pipeline is started

We chose this setup deliberately as this acts as a sanity check for us to be certain that the output and log directories were chosen correctly by the user.

There is no automatic way to fetch the required reference files that are described in configs/workflow.yaml

Good point. I have now automated retrieval and processing of the necessary datasets. Docs are updated to reflect this change, and workflow-config is now updated to utilize the datasets downloaded using above script.

PS- Please be sure to use latest version of the docs. This can be chosen by clicking version button on the bottom right in the docs website.

Redmar-van-den-Berg commented 1 year ago

We chose this setup deliberately as this acts as a sanity check for us to be certain that the output and log directories were chosen correctly by the user.

I don't think this adds a lot of safety, and it does make it more difficult to run the pipeline.

Good point. I have now automated retrieval and processing of the necessary datasets. Docs are updated to reflect this change, and workflow-config is now updated to utilize the datasets downloaded using above script.

The .test/setup_test_datasets.sh does not run on my system. It uses the module system, which is not available in the quac conda environment. It also refers to /data/project/worthey_lab/samples/NA12878, which will not exist on most users systems. Would it be possible to include a small example bam file in the repository for testing purposes? That way, QuaC could also use automated tests on github, which is one of the recommendations from JOSS.

ManavalanG commented 1 year ago

The .test/setup_test_datasets.sh does not run on my system.

We make output of this script readily available in the repo. Please see .test/ngs-data/test_project/analysis for the output. This includes bam, vcf, etc. .test/setup_test_datasets.sh was included to show the steps taken to create the files used in system testing, and we do not expect most users to execute the script :)

ManavalanG commented 1 year ago

Please note that, once the requirements and configs listed here are satisfied, one should be able to run system testing without any further setup. Please let me know if you run into any issues :)

brentp commented 1 year ago

Review checklist for @brentp

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/uab-cgds-worthey/quac?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@ManavalanG) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Redmar-van-den-Berg commented 1 year ago

Please note that, once the requirements and configs listed here are satisfied, one should be able to run system testing without any further setup. Please let me know if you run into any issues :)

Maybe we are talking about two different things. The file configs/workflow.yaml refers to a number of reference files, but it is not clear how to get them. Ideally, QuaC would include either a reduced test version of these files, or a script to fetch them automatically.

ManavalanG commented 1 year ago

Ideally, QuaC would include either a reduced test version of these files, or a script to fetch them automatically.

QuaC includes script src/setup_dependency_datasets.sh to fetch all the datasets listed under datasets in configs/workflow.yaml. If those are not the files you were referring to, please let me know.

I cleaned up the docs a bit for clarity on this topic. I hope this helps.

Redmar-van-den-Berg commented 1 year ago

Thanks for pointing me to the script, I was able to download the required reference files!

However, I run into a crash when I try to run the pipeline itself, following along to the documentation.

 python src/run_quac.py       --project_name test_project       --projects_path ".test/ngs-data/"       --pedigree ".test/configs/${PRIOR_QC_STATUS}/${PROJECT_CONFIG}.ped"       --outdir "$USER_SCRATCH/tmp/quac/results/test_${PROJECT_CONFIG}_exome-${PRIOR_QC_STATUS}/analysis"       --quac_watch_config "configs/quac_watch/exome_quac_watch_config.yaml"       --exome       $USE_SLURM       -e="--conda-create-envs-only"
########################################
Command to run the pipeline:
snakemake \
    --snakefile '/home/username/devel/quac/workflow/Snakefile' \
    --config project_name='test_project' projects_path='/home/username/devel/quac/.test/ngs-data' ped='/home/username/devel/quac/.test/configs/no_priorQC/project_2samples.ped' quac_watch_config='/home/username/devel/quac/configs/quac_watch/exome_quac_watch_config.yaml' workflow_config='/home/username/devel/quac/configs/workflow.yaml' unique_id='2ad58efa-8557-4807-b85d-ed1d4752ff2f' out_dir='/home/username/devel/quac/scratch/tmp/quac/results/test_project_2samples_exome-no_priorQC/analysis' log_dir='/home/username/devel/quac/scratch/tmp/quac/logs' exome='True' include_prior_qc_data='False' allow_sample_renaming='False' \
    --restart-times 1 \
    --use-conda \
    --use-singularity \
    --singularity-args '--cleanenv --bind /home/username/devel/quac/scratch/tmp/quac/tmp:/tmp --bind /home/username/devel/quac/.test/ngs-data/test_project/analysis,/home/username/devel/quac/scratch/tmp/quac/logs,/home/username/devel/quac/scratch/tmp/quac/results/test_project_2samples_exome-no_priorQC/analysis,data/external/dependency_datasets/somalier,data/external/dependency_datasets/verifyBamID,data/external/dependency_datasets/reference_genome,/home/username/devel/quac/.test/configs/no_priorQC,/home/username/devel/quac/configs/quac_watch/exome_quac_watch_config.yaml,data/external/dependency_datasets/verifyBamID/exome' \
    --profile '/home/username/devel/quac/src/slurm/slurm_profile' \
    --conda-create-envs-only
########################################

Slurm job name   : "quac-2023-04-05T10.10.34.732874"
Slurm job script : "/home/username/devel/quac/scratch/tmp/quac/logs/quac-2023-04-05T10.10.34.732874.sh"
// Processing project: test_project
// Project path: "/home/username/devel/quac/.test/ngs-data/test_project/analysis"
// Exome mode: True
// Include prior QC data: False
// WARNING: '.test' present in the path supplied via --projects_path. So testing mode is used.
Building DAG of jobs...
Pulling singularity image docker://continuumio/miniconda3:4.7.12.
Pulling singularity image docker://brentp/somalier:v0.2.13.
Creating conda environment configs/env/quac_watch.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/username/devel/quac/configs/env/quac_watch.yaml:
FATAL:   container creation failed: unable to add /home/username/devel/quac/data/external/dependency_datasets/somalier to mount list: destination must be an absolute path

  File "/home/username/miniconda3/envs/quac/lib/python3.6/site-packages/snakemake/deployment/conda.py", line 389, in create
Traceback (most recent call last):
  File "src/run_quac.py", line 372, in <module>
    main(ARGS)
  File "src/run_quac.py", line 202, in main
    submit_slurm_job(pipeline_cmd, job_dict)
  File "/home/username/devel/quac/src/slurm/submit_slurm_job.py", line 60, in submit_slurm_job
    job_id = slurm_job.run(**params_dict)
  File "/home/username/miniconda3/envs/quac/lib/python3.6/site-packages/slurmpy/slurmpy.py", line 171, in run
    res = subprocess.check_output(args).strip()
  File "/home/username/miniconda3/envs/quac/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/home/username/miniconda3/envs/quac/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bash', '/home/username/devel/quac/scratch/tmp/quac/logs/quac-2023-04-05T10.10.34.732874.sh']' returned non-zero exit status 1.

ManavalanG commented 1 year ago

@Redmar-van-den-Berg I created an issue (https://github.com/uab-cgds-worthey/quac/issues/68) to troubleshoot this error. I hope this works for you.

ManavalanG commented 1 year ago

@Redmar-van-den-Berg

The pipeline does not create the output and log folders automatically, so the user has to create the folders manually before the pipeline is started

We have now refactored QuaC to auto-create these folders (https://github.com/uab-cgds-worthey/quac/pull/67). Please see here for updated docs - https://quac.readthedocs.io/en/latest/

ManavalanG commented 1 year ago

@Redmar-van-den-Berg @brentp I would like to inform that we have refactored QuaC to move away from creation of conda environment within singularity containers. QuaC now simply depends on webserver-hosted container images, retrieves them automatically as part of the pipeline and uses them for snakemake-initiated jobs (https://github.com/uab-cgds-worthey/quac/pull/70).

lpantano commented 1 year ago

Hi @brentp, @Redmar-van-den-Berg, could you give an update on the current status of your review. I saw some work done but it is unclear how much work is left. Thanks

Redmar-van-den-Berg commented 1 year ago

@lpantano I'm still having issues running the pipeline, so I haven't been able to review the pipeline any further.

brentp commented 1 year ago

@ManavalanG would you show how to install the software and run on a fresh system?

If possible, I also suggest that you have someone else install and run the example without any guidance from you, perhaps while you watch. Please do this outside of your usual cluster. You have spent much time documenting and building this, but I also am not able to get it running. Currently, I see:

$ python src/run_quac.py       --project_name test_project       --projects_path ".test/ngs-data/"       --pedigree ".test/configs/${PRIOR_QC_STATUS}/${PROJECT_CONFIG}.ped"       --outdir "$USER_SCRATCH/tmp/quac/results/test_${PROJECT_CONFIG}_exome-${PRIOR_QC_STATUS}/analysis"       --quac_watch_config "configs/quac_watch/exome_quac_watch_config.yaml"       --exome       $USE_SLURM
Created directory: /scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/tmp/quac/results/test_project_2samples_exome-no_priorQC/analysis
ERROR: Following directories that are part of your input were not found: 
/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/external/dependency_datasets/somalier
/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/external/dependency_datasets/reference_genome
/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/external/dependency_datasets/verifyBamID
/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/external/dependency_datasets/verifyBamID/exome

It's not clear to me how to resolve this.

ManavalanG commented 1 year ago

@brentp It appears that datasets specified in the workflow config file configs/quac_watch/exome_quac_watch_config.yaml are not available in your system. Please see directions here on using the included script to set up all necessary dataset dependencies. Please let us know if this doesn't resolve the issue.

ManavalanG commented 1 year ago

@brentp

If possible, I also suggest that you have someone else install and run the example without any guidance from you, perhaps while you watch. Please do this outside of your usual cluster.

Thanks for the suggestion! We have had a user test the installation, but they used the same hardware resources as we use.

PS - I modified the error message now and then highlighted the [datasets requirement in the doc](https://quac.readthedocs.io/en/latest/reqts_configs/#set-up-workflow-config-file0. Hopefully it will better guide the user when they run into this issue :)

brentp commented 1 year ago

Hi, now I have the problem that there is, as far as I can tell, no way to set the slurm account. I have modified configs/cluster_config.json to include:

    "__default__": {
        "ntasks": 1,
        "partition": "quinlan-shared-rw",
        "cpus-per-task": "{threads}",
        "account": "quinlan-rw",
        "mem-per-cpu": "8G",
        "jobname": "QuaC.{rule}.{jobid}",
        "output": "{RULE_LOGS_PATH}/{rule}-%j.log"
    },

so a default parition and account is specified. However, the account is not propagated to the snakemake command that looks like this:

snakemake \
    --snakefile '/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/workflow/Snakefile' \
    --config project_name='test_project' projects_path='/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/.test/ngs-data' ped='/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/.test/configs/no_priorQC/project_2samples.ped' quac_watch_config='/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/configs/quac_watch/exome_quac_watch_config.yaml' workflow_config='/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/configs/workflow.yaml' unique_id='b27284c4-bf26-43df-9e5d-a05ac4001623' out_dir='/tmp/quac/results/test_project_2samples_exome-no_priorQC/analysis' log_dir='/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/quac/logs' exome='True' include_prior_qc_data='False' allow_sample_renaming='False' \
    --restart-times 1 \
    --use-singularity \
    --singularity-args '--cleanenv --bind /scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/quac/tmp:/tmp --bind /scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/.test/configs/no_priorQC,/tmp/quac/results/test_project_2samples_exome-no_priorQC/analysis,/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/configs/quac_watch/exome_quac_watch_config.yaml,/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/external/dependency_datasets/verifyBamID,/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/external/dependency_datasets/reference_genome,/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/quac/logs,/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/external/dependency_datasets/verifyBamID/exome,/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/.test/ngs-data/test_project/analysis,/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/external/dependency_datasets/somalier' \
    --profile '/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/src/slurm/slurm_profile' \
    --cluster-config '/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/configs/cluster_config.json' \
    --cluster 'sbatch --ntasks {cluster.ntasks} --partition {cluster.partition} --cpus-per-task {cluster.cpus-per-task} --mem-per-cpu {cluster.mem-per-cpu} --job-name {cluster.jobname} --output {cluster.output} --parsable'

ManavalanG commented 1 year ago

@brentp This issue was caused by hardcoded slurm's sbatch args in the wrapper script. We have now refactored it to construct the sbatch command based on args supplied in the cluster config file (uab-cgds-worthey/quac/pull/73). We expect this would resolve the issue. Sorry about the trouble!

We have also updated the docs to include cluster config step as part of the requirements.

brentp commented 1 year ago

I continue to try this and continue to have troubles. Now I get an error:

Traceback (most recent call last):
  File "/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/src/run_quac.py", line 437, in <module>
    main(ARGS)
  File "/scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/src/run_quac.py", line 239, in main
    "time": slurm_partition_times[args.slurm_partition],
KeyError: 'quinlan-shared-rw'

So I printed out slurm_partition_times before that error and see:

{'express': '02:00:00', 'short': '12:00:00', 'medium': '50:00:00', 'long': '150:00:00'}

then I try to find where to modify that:

$ git grep express
LICENSE:  You may not propagate or modify a covered work except as expressly
LICENSE:  In the following three paragraphs, a "patent license" is any express
LICENSE:(such as an express permission to practice a patent or covenant not to
configs/workflow.yaml:  express: "02:00:00"
docs/visualize_pipeline.md:srun --ntasks=1 --cpus-per-task=1 --mem-per-cpu=4096 --partition=express --pty /bin/bash

so I modify configs/workflow.yaml to include a time-limit for my partition. Then I see:

sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
Traceback (most recent call last):

So I see in the script that it's not specifying the account only the partition. Even though I have this in configs/cluster_config.json

    "__default__": {
        "ntasks": 1,
        "partition": "quinlan-shared-rw",
        "cpus-per-task": "{threads}",
        "account": "quinlan-rw",
        "mem-per-cpu": "8G",
        "jobname": "QuaC.{rule}.{jobid}",
        "output": "{RULE_LOGS_PATH}/{rule}-%j.log"
    },

The header in the .sh file contains no --account:


#SBATCH -e /scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/quac/logs/quac-2023-06-06T09.34.54.561465.%J.err
#SBATCH -o /scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/quac/logs/quac-2023-06-06T09.34.54.561465.%J.out
#SBATCH -J quac-2023-06-06T09.34.54.561465

#SBATCH --partition=quinlan-shared-rw
#SBATCH --ntasks=1
#SBATCH --time=120:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=8G

It seems it should be present from this PR: https://github.com/uab-cgds-worthey/quac/pull/73/files and I am on commit: 91894f5 in the joss_manuscript branch.

I tried adding:

--cluster_config configs/cluster_config.json

to the run_quac.py args but still account is not propagated. I also tried removing --slurm_partition from the run_quac.py args, but then it doesn't grab anything from cluster_config.json which results in this header for the sbatch script:

#SBATCH -e /scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/quac/logs/quac-2023-06-06T09.37.58.214455.%J.err
#SBATCH -o /scratch/ucgd/lustre-work/quinlan/u6000771/src/quac/data/quac/logs/quac-2023-06-06T09.37.58.214455.%J.out
#SBATCH -J quac-2023-06-06T09.37.58.214455

#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --time=12:00:00
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=8G

So it has ignored the cluster_config.json.

Again, I request that you have someone test this on a different cluster outside your lab. Note that they must do everything themselves, including finding the joss_manuscript branch, modifying the cluster_config, etc. without prompting.

Likely I am doing something very simple that I shouldn't, but I don't know what that is and have tried to find a solution without success.

lpantano commented 1 year ago

Hi @ManavalanG, any update on the last comment from @brentp?

ManavalanG commented 1 year ago

Hi @lpantano! We are working on resolving the issue that @brentp came across (https://github.com/uab-cgds-worthey/quac/pull/80). And we are also working on the suggestion to get folks try out the pipeline outside of our institution. We hope to have them completed by next week.

ManavalanG commented 1 year ago

@brentp @Redmar-van-den-Berg We have modified how slurm configs are provided to QuaC (https://github.com/uab-cgds-worthey/quac/pull/80), and we also made several other changes to improve user experience (checking dependencies exist, singularity works as expected, etc.). Also, as per @brentp's recommendation, we had three users install quac, configure it, run system testing on another Slurm HPC environment and provide feedback. The configuration updates as well as updates to the documentation are based on their feedback (https://github.com/uab-cgds-worthey/quac/pull/82). Please let me know if you have any questions.

lpantano commented 1 year ago

Hi @brentp and @Redmar-van-den-Berg, Does the last update fix your current issues so you can continue with the review? Thanks!

brentp commented 1 year ago

Hi, sorry for the delay. I have just moved continents and finally now set up to try this.

I was able to get this running. I had some trouble with setup dependencies as my gzip gave an error an the trailing "garbage" from the razip/bgzip'ed fasta. I changed it like this and it works

diff --git a/src/setup_dependency_datasets.sh b/src/setup_dependency_datasets.sh
index 0208e62..2f5f22f 100755
--- a/src/setup_dependency_datasets.sh
+++ b/src/setup_dependency_datasets.sh
@@ -16,7 +16,7 @@ mkdir -p $REF_GENOME_DIR && cd $REF_GENOME_DIR
 REF_GENOME_FNAME="GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz"
 REF_GENOME_URL=ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/${REF_GENOME_FNAME}
 curl -L $REF_GENOME_URL -o $REF_GENOME_FNAME
-gzip -d $REF_GENOME_FNAME
+gzip -c $REF_GENOME_FNAME > $(basename $REF_GENOME_FNAME .gz)

 ##### retrieve somalier tool dependencies #####
 echo "setting up somalier dependency datasets..."

I will update my review shortly.

brentp commented 1 year ago

I have completed the checkboxes in the review @lpantano please let me know if you need anything else.

lpantano commented 1 year ago

@Redmar-van-den-Berg, how is going your side of the review?

Redmar-van-den-Berg commented 1 year ago

Review checklist for @Redmar-van-den-Berg

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/uab-cgds-worthey/quac?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@ManavalanG) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
[x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
[x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
[x] Human and animal research: If the paper contains original data research on humans subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

lpantano commented 1 year ago

Hi @Redmar-van-den-Berg, do you think you can finish this in the next week or so? Let me know, and thanks!

lpantano commented 1 year ago

Hi @Redmar-van-den-Berg, any updates on this? do you think you can work on this in the next week, it seems it is almost done.

Redmar-van-den-Berg commented 1 year ago

@lpantano Apologies for the delay, I've now completed the checklist.

There are still two open issues:

A request to improve the user documentation (https://github.com/uab-cgds-worthey/quac/issues/85)
A feature request to no longer rely on a fixed project structure to identify samples and files (https://github.com/uab-cgds-worthey/quac/issues/86)

ManavalanG commented 1 year ago

@Redmar-van-den-Berg Thanks for taking the time to review our submission and provide feedback. We have made the requested changes.

Users can now provide samples input info in a config file via --sample_config. Documentation is available here. (https://github.com/uab-cgds-worthey/quac/issues/86)
We have updated docs to include a section on editing quac-watch configs. (https://github.com/uab-cgds-worthey/quac/issues/85)

Note: Please be sure to use the latest version in readthedocs at the bottom right.

I hope this resolves the issues. Please let me know if you have any questions :)

Redmar-van-den-Berg commented 1 year ago

@ManavalanG Thanks for adding the requested features so quickly!

@lpantano I've completed the review checklist

lpantano commented 1 year ago

@editorialbot generate pdf

lpantano commented 1 year ago

Post-Review Checklist for Editor and Authors

Additional Author Tasks After Review is Complete

[x] Double check authors and affiliations (including ORCIDs)
[x] Make a release of the software with the latest changes from the review and post the version number here. This is the version that will be used in the JOSS paper.
[x] Archive the release on Zenodo/figshare/etc and post the DOI here.
[x] Make sure that the title and author list (including ORCIDs) in the archive match those in the JOSS paper.
[x] Make sure that the license listed for the archive is the same as the software license.

Editor Tasks Prior to Acceptance

[x] Read the text of the paper and offer comments/corrections (as either a list or a PR)
[x] Check the references in the paper for corrections (e.g. capitalization)
[x] Check that the archive title, author list, version tag, and the license are correct
[x] Set archive DOI with @editorialbot set <DOI here> as archive
[x] Set version with @editorialbot set <version here> as version
[x] Double check rendering of paper with @editorialbot generate pdf
[x] Specifically check the references with @editorialbot check references and ask author(s) to update as needed
[x] Recommend acceptance with @editorialbot recommend-accept

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

lpantano commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1093/bioinformatics/btw354 is OK
- 10.1038/s41436-020-01084-8 is OK
- 10.1038/s41525-020-00154-9 is OK
- 10.1093/bioinformatics/btv566 is OK
- 10.1093/bioinformatics/btx699 is OK
- 10.1093/gigascience/gix090 is OK
- 10.1093/gigascience/giab008 is OK
- 10.1101/gr.246934.118 is OK
- 10.1186/s13073-020-00761-2 is OK
- 10.12688/f1000research.15931.2 is OK

MISSING DOIs

- None

INVALID DOIs

- None

ManavalanG commented 1 year ago

Post-Review Checklist for Editor and Authors

Additional Author Tasks After Review is Complete

[x] Double check authors and affiliations (including ORCIDs)

[x] Make a release of the software with the latest changes from the review and post the version number here. This is the version that e used in the JOSS paper. - 1.6

[x] Archive the release on Zenodo/figshare/etc and post the DOI here. - https://doi.org/10.5281/zenodo.10002036

[x] Make sure that the title and author list (including ORCIDs) in the archive match those in the JOSS paper.

[x] Make sure that the license listed for the archive is the same as the software license.

ManavalanG commented 1 year ago

@lpantano Zenodo is going through a major upgrade, and it is not working fully as expected atm. QuaC's repo is already part of zenodo, but it doesn't let me update it to the recent QuaC version 1.6. As soon as zenodo is fully working again, I will update the version and will follow up here. Sorry about the delay!

ManavalanG commented 1 year ago

@lpantano I have created a new release 1.6 and archived it in Zenodo (DOI - https://doi.org/10.5281/zenodo.10002036). With this, I believe I have completed all the necessary tasks. Please let me know if I missed any :)

lpantano commented 1 year ago

@editorialbot set v1.6 as version

editorialbot commented 1 year ago

Done! version is now v1.6

lpantano commented 1 year ago

@editorialbot set 10.5281/zenodo.10002036 as archive

editorialbot commented 1 year ago

Done! archive is now 10.5281/zenodo.10002036

openjournals / joss-reviews

[REVIEW]: QuaC: A Pipeline Implementing Quality Control Best Practices for Genome Sequencing and Exome Sequencing Data #5313

Status

Reviewer instructions & questions

Checklists

Review checklist for @brentp

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @Redmar-van-den-Berg

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Post-Review Checklist for Editor and Authors

Additional Author Tasks After Review is Complete

Editor Tasks Prior to Acceptance

Post-Review Checklist for Editor and Authors

Additional Author Tasks After Review is Complete