phac-nml / snvphyl-galaxy-cli

A command line interface for the SNVPhyl Galaxy pipeline.
Apache License 2.0

Internal server error in galaxy using over 30 samples in SNVPhyl pipeline #5

Open rszymkiewicz opened 3 years ago

rszymkiewicz commented 3 years ago

Hello,

I am attempting to run a data set of over 100 paired-end WGS samples. I have successfully installed the Docker container and run the example data provided on GitHub. I have also successfully run a smaller subset of my data (10 paired-end samples in total).

I have come across an internal server error when attempting to run my full data set. After 30 samples, the connection fails during the fastq upload to Galaxy. Below are the details of my failed run:

Command: python ./Software/snvphyl-galaxy-cli/bin/snvphyl.py --deploy-docker --copy-fastq-files-to-docker --fastq-dir ./Sequences/ONT_2019 --reference-file ./Sequences/Reference_Fasta/Ngo_FA1090.fasta --min-coverage 2 --output-dir rawReads_ONT2019

Standard Output with Error at bottom:

Deploying Docker Container
==========================
Running 'docker run --detach --publish 48888:80 phacnml/snvphyl-galaxy-1.0.1:1.0.1b'
Docker id 01dc62e7bc8c69d8a3d14bdc47713a9c7802915d216ba8e3b83fb52ef7da2f66
Waiting for docker to complete launching Galaxy on port 48888 .....finished.
Galaxy in Docker has (hopefully) started successfully.
Took 0.26 minutes to deploy docker

Examining input fastq files
===========================
Structuring data in directory './Sequences/ONT_2019' like:
19-NG0027: paired {forward: ./Sequences/ONT_2019/19-NG0027_R1.fastq.gz, reverse: ./Sequences/ONT_2019/19-NG0027_R2.fastq.gz}
...
19-NG0160: paired {forward: /NetDrive/Users/rtullio/NIH_Grant/PilotA/Sequences/ONT_2019/19-NG0160_R1.fastq.gz, reverse: /NetDrive/Users/rtullio/NIH_Grant/PilotA/Sequences/ONT_2019/19-NG0160_R2.fastq.gz}

Set up workflow input
=====================
setting parameter {irida.corefacility.ca/galaxy-shed/repos/nml/consolidate_vcfs/consolidate_vcfs/1.8.0, snv_abundance_ratio, 0.75}
setting parameter {irida.corefacility.ca/galaxy-shed/repos/nml/consolidate_vcfs/consolidate_vcfs/1.8.0, coverage, 2}
setting parameter {irida.corefacility.ca/galaxy-shed/repos/nml/verify_map/verify_map/1.8.0, mindepth, 2}
setting parameter {irida.corefacility.ca/galaxy-shed/repos/nml/consolidate_vcfs/consolidate_vcfs/1.8.0, mean_mapping, 30}
setting parameter {irida.corefacility.ca/galaxy-shed/repos/nml/find_repeats/findrepeat/1.8.0, length, 150}
setting parameter {irida.corefacility.ca/galaxy-shed/repos/nml/find_repeats/findrepeat/1.8.0, pid, 90}
setting parameter {irida.corefacility.ca/galaxy-shed/repos/nml/consolidate_vcfs/consolidate_vcfs/1.8.0, use_density_filter.threshold, 2}
setting parameter {irida.corefacility.ca/galaxy-shed/repos/nml/consolidate_vcfs/consolidate_vcfs/1.8.0, use_density_filter.window_size, 500}

Upload files to Galaxy
======================
Creating history in Galaxy name 'snvphyl-Ngo_FA1090-2020-12-22-run'
Uploading reference file /NetDrive/Users/rtullio/NIH_Grant/PilotA/Sequences/Reference_Fasta/Ngo_FA1090.fasta
Uploading fastq files in history 'snvphyl-Ngo_FA1090-2020-12-22-run'
Uploading as copy ./Sequences/ONT_2019/19-NG0027_R1.fastq.gz
...
Uploading as copy ./Sequences/ONT_2019/19-NG0058_R2.fastq.gz

Undeploying and cleaning up Docker Container
=============================================
Running 'docker rm -f -v 01dc62e7bc8c69d8a3d14bdc47713a9c7802915d216ba8e3b83fb52ef7da2f66'
01dc62e7bc8c69d8a3d14bdc47713a9c7802915d216ba8e3b83fb52ef7da2f66
Traceback (most recent call last):
  File "./Software/snvphyl-galaxy-cli/bin/snvphyl.py", line 1302, in <module>
    main(snvphyl_version_settings, **dic)
  File "./Software/snvphyl-galaxy-cli/bin/snvphyl.py", line 974, in main
    repeat_minimum_length, repeat_minimum_pid, filter_density_window, filter_density_threshold, invalid_positions_file, output_dir)
  File "./Software/snvphyl-galaxy-cli/bin/snvphyl.py", line 1084, in main_galaxy
    fastq_id=upload_fastq_collection_paired(gi,history_id,fastq_paired)
  File "./Software/snvphyl-galaxy-cli/bin/snvphyl.py", line 576, in upload_fastq_collection_paired
    paired_elements=upload_fastq_history_paired(gi,history_id,fastq_paired)
  File "./Software/snvphyl-galaxy-cli/bin/snvphyl.py", line 545, in upload_fastq_history_paired
    reverse_galaxy=gi.tools.upload_file(reverse,history_id, file_type='fastqsanger')
  File "./Software/miniconda3/envs/PilotA/lib/python3.6/site-packages/bioblend/galaxy/tools/__init__.py", line 214, in upload_file
    return self._post(payload, files_attached=True)
  File "./Software/miniconda3/envs/PilotA/lib/python3.6/site-packages/bioblend/galaxy/client.py", line 173, in _post
    files_attached=files_attached)
  File "./Software/miniconda3/envs/PilotA/lib/python3.6/site-packages/bioblend/galaxyclient.py", line 132, in make_post_request
    body=r.text, status_code=r.status_code)
bioblend.ConnectionError: Unexpected HTTP status code: 500:

Since the run fails after uploading 30 paired-end samples to Galaxy, I also tried the same command on a subset of my data: just 30 paired-end samples (the number at which it originally failed). It still fails, but I receive a more detailed error (see below).

Error:
bioblend.ConnectionError: Unexpected HTTP status code: 500:
Internal Server Error
Internal Server Error
Galaxy was unable to successfully complete your request
An error occurred. This may be an intermittent problem due to load or other unpredictable factors, reloading the page may address the problem.
The error has been logged to our team.

I am reaching out to ask whether there is a limit on the total number of samples I can use in the SNVPhyl pipeline, and whether you have any recommendations for resolving this problem.

Thank you in advance for your help. Have a great day!

apetkau commented 3 years ago

Thanks for the issue report.

I'm not exactly sure of the cause of the issue, but a few things I can think of:

  1. Since you are uploading copies of the fastq files into Docker, you will need a lot more space on your hard drive. Is it possible you are running out of space and this is causing the error? You could try getting rid of --copy-fastq-files-to-docker so that it doesn't create complete copies of the fastq files in Docker (but this requires the original files to have the correct permissions, the easiest being world-readable so they are visible within Docker).
  2. There could be a timeout issue in Galaxy when uploading so many files, but it's hard to debug such an issue without looking at the Galaxy log files. To monitor the Galaxy logs, once you see the message Docker id 01dc62e7bc8c69d8a3d14bdc47713a9c7802915d216ba8e3b83fb52ef7da2f66 in the output, you can run the command docker logs -f 01dc... in a separate terminal. This will print out all the information recorded by Galaxy, which can help with further diagnosis.
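
As a rough illustration of the first suggestion, the sketch below adds up the size of the fastq files in a directory, compares it with the free space on the filesystem that would receive the copies, and lists any files that are not world-readable (relevant if --copy-fastq-files-to-docker is dropped). It is not part of snvphyl-galaxy-cli: the function names, the safety_factor margin, and the choice of target directory are all assumptions made for illustration.

```python
import shutil
import stat
from pathlib import Path

# File name endings treated as fastq input (an assumption for this sketch).
FASTQ_SUFFIXES = (".fastq", ".fastq.gz")


def _fastq_files(fastq_dir):
    """Return the fastq files located directly inside fastq_dir."""
    return [
        p for p in Path(fastq_dir).iterdir()
        if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)
    ]


def fastq_bytes(fastq_dir):
    """Total size in bytes of the fastq files directly inside fastq_dir."""
    return sum(p.stat().st_size for p in _fastq_files(fastq_dir))


def enough_space_for_copy(fastq_dir, target_dir, safety_factor=2.0):
    """Check whether target_dir's filesystem could hold a copy of the input.

    safety_factor is a hypothetical margin for intermediate pipeline files;
    the real space requirement of a run is not known from this thread.
    """
    needed = fastq_bytes(fastq_dir) * safety_factor
    free = shutil.disk_usage(target_dir).free
    return free >= needed


def not_world_readable(fastq_dir):
    """Names of fastq files that lack the 'other users can read' permission bit."""
    return sorted(
        p.name for p in _fastq_files(fastq_dir)
        if not (p.stat().st_mode & stat.S_IROTH)
    )
```

For the run in this issue, one might call enough_space_for_copy("./Sequences/ONT_2019", "/var/lib/docker"), assuming Docker keeps its data under /var/lib/docker (a common default on Linux, but not confirmed here), and not_world_readable("./Sequences/ONT_2019") before retrying without the copy option.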

rszymkiewicz commented 3 years ago

Thanks @apetkau for the quick response! I will look into your suggestions and reach out again, hopefully with an update on the solution.