matsengrp / phip-flow

A Nextflow pipeline to align, merge, and organize large PhIP-Seq datasets
MIT License

phip-flow command error when running merge-counts-stats.py #50

Closed dbayles closed 1 year ago

dbayles commented 1 year ago

I'm working on getting a test of your analysis suite running using nextflow and a Singularity container. I'm running the container in a virtual environment that also has phippery installed. When the container runs using the defaults (which I think should run the test data set for validation), the first several steps run fine; however, the nextflow analysis errors out when trying to run the merge-counts-stats.py script. A phippery module is not being found. The error message from nextflow is:

Error executing process > 'ALIGN:collect_phip_data (1)'

Caused by: Process ALIGN:collect_phip_data (1) terminated with an error exit status (1)

Command executed:

merge-counts-stats.py -st validated_sample_table.csv -pt validated_peptide_table.csv -cfp ".counts" -sfp ".stats" -o data.phip

Command exit status: 1

Command output: (empty)

Command error:
  Traceback (most recent call last):
    File "/home/dob/.nextflow/assets/matsengrp/phip-flow/bin/merge-counts-stats.py", line 8, in <module>
      import phippery.phipdata as phipdata
  ModuleNotFoundError: No module named 'phippery.phipdata'

I was unsuccessful in manually locating the phippery.phipdata module that the script is trying to import. Is the module currently included in the package? Has the module been renamed to something else? What would you suggest to either fix or troubleshoot the problem?

Thanks.

jgallowa07 commented 1 year ago

Hello @dbayles !

Yes, you are correct: in the latest versions of phippery, the phipdata module no longer exists.

I'm sorry, the phippery package is in its last iteration and some of the modules have been moved around. However, you'll notice the process containers are pinned to versions of phippery that align with the code on any one version of the pipeline.

So the problem is most likely that the virtual environment has a different (newer) version that is being used instead. We strongly suggest keeping pipeline processes contained in the container, i.e., when I run

nextflow run matsengrp/phip-flow -r main -profile docker

everything runs fine for me.

Can you try running on the docker profile and let me know if that works for you? If this is still a problem, note that the pipeline and phippery package are currently being finalized, so this shouldn't be an issue for much longer. Sorry for the inconvenience and thanks for the feedback!
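A quick way to test the "wrong phippery is being picked up" hypothesis is to ask Python where it would load the package from. The following is a minimal sketch, not part of phip-flow or phippery: `describe_module` is a hypothetical helper name, and it relies only on the standard-library `importlib.util.find_spec`. Running it with the same interpreter the pipeline process uses shows whether `phippery.phipdata` is importable and which file path wins.

```python
import importlib.util


def describe_module(name):
    """Report whether `name` is importable and which file it would load from.

    Hypothetical helper for troubleshooting; not part of phip-flow or phippery.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # find_spec raises (rather than returning None) when a parent
        # package such as "phippery" is itself missing.
        spec = None
    return {
        "module": name,
        "found": spec is not None,
        # The origin is the file path that an import would actually use.
        "origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # Module names taken from the traceback in this thread.
    for name in ("phippery", "phippery.phipdata"):
        print(describe_module(name))
```

If the reported origin points into a virtual environment rather than the container's own site-packages, the host installation is probably shadowing the pinned one.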

dbayles commented 1 year ago

Jared,

Thanks for the information. I'm running on an HPC without the option of running using Docker; however, I used your Docker recipe as the basis for building my Singularity container. That said, I think the container is working as expected. One thing I noticed was that I was using the "-r V1.0" option you currently have in the example walkthrough. Based on the information you provided in your response, I switched that to "-r main". I also ran the test constrained to the container, and it ended with a different error this time, so switching that parameter alleviated the missing module error. The error I'm getting now is:

Error executing process > 'ALIGN:collect_phip_data (1)'

Caused by: Process ALIGN:collect_phip_data (1) terminated with an error exit status (1)

Command executed:

merge-counts-stats.py -st validated_sample_table.csv -pt validated_peptide_table.csv -cfp ".counts" -sfp ".stats" -o data.phip

Command exit status: 1

Command output: (empty)

Command error:
  Traceback (most recent call last):
    File "/home/darrell.bayles/.nextflow/assets/matsengrp/phip-flow/bin/merge-counts-stats.py", line 110, in <module>
      load_from_counts_tsv(
    File "/home/dob/.nextflow/assets/matsengrp/phip-flow/bin/merge-counts-stats.py", line 73, in load_from_counts_tsv
      peptide_table = collect_peptide_table(peptide_table)
  NameError: name 'collect_peptide_table' is not defined

I looked at "phippery/utils.py" inside the container, and I think the error is due to a naming difference in the definition of the collect_peptide_table function. In utils.py, inside the container, the function is named "_collect_peptide_table" (i.e. the function name is prefixed by the "_" underscore).
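The naming mismatch described above can be confirmed without editing the pipeline script. This is a sketch under stated assumptions: `resolve_function` is a hypothetical helper, and the demo uses a stand-in module object rather than the real `phippery.utils`, so it runs without phippery installed. It simply looks for a function under its public name first and then under the underscore-prefixed name.

```python
import types


def resolve_function(module, name):
    """Return (actual_name, function), trying `name` and then `_name`.

    Hypothetical troubleshooting helper; not part of phippery.
    """
    for candidate in (name, "_" + name):
        func = getattr(module, candidate, None)
        if callable(func):
            return candidate, func
    raise AttributeError(
        f"{module.__name__} has neither {name!r} nor {'_' + name!r}"
    )


if __name__ == "__main__":
    # Stand-in for a utils module that only defines the private name.
    fake_utils = types.ModuleType("fake_utils")
    fake_utils._collect_peptide_table = lambda table: table

    found_name, _ = resolve_function(fake_utils, "collect_peptide_table")
    print(found_name)
```

Against a real installation, passing the imported `phippery.utils` module instead of the stand-in would show which of the two names that version defines. A private (underscore-prefixed) name can change between releases, which is why running the pinned container is the more robust fix.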

jgallowa07 commented 1 year ago

Yes, the documentation is being updated with the rest of things, sorry again!

We can only guarantee that our images specified in the config (for any given version of the pipeline) are pinned to a specific version of phippery that should work with that pipeline code -- unfortunately, we did not specify those versions in the Dockerfile, so it is most likely installing the latest phippery, which has changes that the pipeline has not accounted for -- I'm updating things today and will be sure to update the Dockerfile accordingly. This is helpful feedback!
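When rebuilding an image from a recipe, it can help to verify that the installed package really matches the intended pin. Below is a minimal sketch assuming a `name==version` style pin; the helper names are hypothetical and the version string `1.0.0` is a placeholder, not the version phip-flow actually pins. It uses the standard-library `importlib.metadata`.

```python
from importlib import metadata


def parse_pin(requirement):
    """Split an exact pin such as 'pkg==1.0.0' into (name, version)."""
    name, sep, version = requirement.partition("==")
    if not sep or not name.strip() or not version.strip():
        raise ValueError(f"not an exact pin: {requirement!r}")
    return name.strip(), version.strip()


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_satisfied(requirement, installed):
    """True only when the installed version equals the pinned version."""
    _, wanted = parse_pin(requirement)
    return installed == wanted


if __name__ == "__main__":
    # Placeholder pin for illustration; substitute the version in the Dockerfile.
    pin = "phippery==1.0.0"
    name, _ = parse_pin(pin)
    have = installed_version(name)
    print(pin, "installed:", have, "ok:", pin_satisfied(pin, have))
```

Running this inside the built container gives a fast signal that an unpinned install pulled in a newer release than the pipeline code expects.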

Currently, the working image for main branch is here. This image should be agnostic to Docker and Singularity.

We also provide a -profile cluster here, which uses Singularity and the same image for running the process containers. Have you tried using that?

dbayles commented 1 year ago

Jared,

Would you be willing to share the Dockerfile that you currently use for creating your container from the working image for the main branch that you indicated in your previous message? I can try building my Singularity container based on a definition file created from what you are using in your Dockerfile. (I did a quick test and it failed at the ALIGN:short_read_alignment step when trying to run the short_read_alignment.sh script. I'd guess it's something I didn't configure correctly when the container was built. It's giving me the bowtie options help text in the nextflow Command error output.)

I did try the "-profile cluster" option, and I can see I'll need to rewrite some of the code so that the sbatch scripts will run on our cluster. A look at one of the submitted sbatch scripts told me there are user accounts and partition requests that will cause the scripts to be rejected on the HPC. If you can point me to the script where those parameters are pulled from, I might have a go at modifying the relevant parameters to get the software to run using the "-profile cluster" option. I guess before going down that road, I'd like to get the example data to run to completion using a simpler environment.

Thanks --Darrell

jgallowa07 commented 1 year ago

Hello!

So I just pushed branch V1.04 which has been updated to work with the latest version of phippery. The Dockerfile has been updated to pin the specific version of phippery necessary for running. Could you give this a try?

  1. Build the container image using the dockerfile, or pull from here
  2. Then, either clone the repo, switch to the V1.04 branch, and run nextflow run main.nf, or, if using Nextflow's git-aware invocation, run nextflow run matsengrp/phip-flow -r V1.04 with the correct container.

Thank you for your patience!

dbayles commented 1 year ago

Jared,

It looks like I'm able to get the nextflow pipeline to run in the Singularity environment with some tweaks to your Dockerfile. (Those tweaks may be specific to Singularity, but I didn't do a lot of testing to pin it down unequivocally.) The output in the results folder appears correct. I'll email you some notes in case you have others interested in using Singularity. I'd like to replicate some of your R example tertiary analyses and then see if I can get your pipeline to run on some phipseq data we've generated. Thanks for the help in getting me going with your current working version.