griffithlab / pVACtools

http://www.pvactools.org
BSD 3-Clause Clear License

pVACseq submits too big of a file to NetMHCstab server? #1035

Closed mantczakaus closed 10 months ago

mantczakaus commented 10 months ago

Installation Type

Standalone

pVACtools Version / Docker Image

3.0.2

Python Version

No response

Operating System

No response

Describe the bug

I'm running pvacseq on one patient's data separately for each MHC allele. It fails on one particular allele, HLA-C*03:03, but passes on many others, e.g. DRA*01:01 or HLA-A*68:01. The error message says that a file is too big (see the output). I ran this previously, on 14/10/2023 (same patient, same allele, maybe a slightly different set of mutations), and it worked. All of this makes me think that there is no problem with the server itself, but maybe its behaviour changed?

How to reproduce this bug

#!/bin/bash -ue
mkdir -p /scratch/project_mnt/S0091/mantczak/.tmp
export TMPDIR=/scratch/project_mnt/S0091/mantczak/.tmp
   pvacseq run \
       --iedb-install-directory /opt/iedb \
       -t 10 \
       -p patient1_vep_phased.vcf.gz \
       -e1 8,9,10,11 \
       -e2 15,16,17,18,19,20,21,22,23,24,25 \
       --normal-sample-name patient1_normal \
        \
       --netmhc-stab \
       --binding-threshold 500 --top-score-metric lowest --minimum-fold-change 0.0 --normal-cov 0 --tdna-cov 0 --trna-cov 0 --normal-vaf 1 --trna-vaf 0.0 --tdna-vaf 0.0 --expn-val 0 --maximum-transcript-support-level 5 \
       patient1_vep_somatic_gx.vcf.gz patient1_tumor HLA-C*03:03 NetMHCpan MHCflurry NetMHCIIpan ./

   if [ -e ./MHC_Class_I/patient1_tumor.filtered.tsv ]; then
       mv ./MHC_Class_I/patient1_tumor.filtered.tsv ./MHC_Class_I/patient1_tumor_HLA-C*03:03.filtered.tsv
   fi
   if [ -e ./MHC_Class_I/patient1_tumor.all_epitopes.tsv ]; then
       mv ./MHC_Class_I/patient1_tumor.all_epitopes.tsv ./MHC_Class_I/patient1_tumor_HLA-C*03:03.all_epitopes.tsv
   fi
   if [ -e ./MHC_Class_II/patient1_tumor.filtered.tsv ]; then
       mv ./MHC_Class_II/patient1_tumor.filtered.tsv ./MHC_Class_II/patient1_tumor_HLA-C*03:03.filtered.tsv
   fi
   if [ -e ./MHC_Class_II/patient1_tumor.all_epitopes.tsv ]; then
       mv ./MHC_Class_II/patient1_tumor.all_epitopes.tsv ./MHC_Class_II/patient1_tumor_HLA-C*03:03.all_epitopes.tsv
   fi

Input files

patient1_vep_phased.vcf.gz patient1_vep_somatic_gx.vcf.gz indices.zip

Log output

... Running NetMHCStabPan
Traceback (most recent call last):
  File "/opt/conda/bin/pvacseq", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/pvactools/tools/pvacseq/main.py", line 116, in main
    args[0].func.main(args[1])
  File "/opt/conda/lib/python3.8/site-packages/pvactools/tools/pvacseq/run.py", line 131, in main
    pipeline.execute()
  File "/opt/conda/lib/python3.8/site-packages/pvactools/lib/pipeline.py", line 506, in execute
    PostProcessor(**post_processing_params).execute()
  File "/opt/conda/lib/python3.8/site-packages/pvactools/lib/post_processor.py", line 38, in execute
    self.call_netmhc_stab()
  File "/opt/conda/lib/python3.8/site-packages/pvactools/lib/post_processor.py", line 121, in call_netmhc_stab
    NetMHCStab(self.net_chop_fh.name, self.netmhc_stab_fh.name, self.file_type, self.top_score_metric).execute()
  File "/opt/conda/lib/python3.8/site-packages/pvactools/lib/netmhc_stab.py", line 83, in execute
    response = self.query_netmhcstabpan_server(staging_file, length, netmhcstabpan_allele, jobid_searcher)
  File "/opt/conda/lib/python3.8/site-packages/pvactools/lib/netmhc_stab.py", line 165, in query_netmhcstabpan_server
    raise Exception("Error posting request to NetMHCstabpan server.\n{}".format(response.content.decode()))
Exception: Error posting request to NetMHCstabpan server.

Internal Server Error

Regrettably, the server encountered an error while processing your request.

If this page was shown after submitting your data to one of our services, then the cause of the error is very likely that the data/file submitted is too large. You simply need to submit a smaller file.
Depending on the service in question. the limit is either 2 MB or 10 GB.
You can contact the service support regarding this, but maybe it would be more productive to download the software (if available) and run it on your own equipment.

Output files

No response

susannasiebert commented 10 months ago

Our automated tests have also been failing, and those run on small samples, so I don't think your particular samples are the cause. There might have been a reconfiguration on their end. I will need to investigate this issue further. For now I suggest turning off this feature (the --netmhc-stab option) so that you can still get a completed run. Once the issue is fixed on our end, you can add NetMHCstabpan data to your outputs by running the standalone pvacseq netmhc_stab command.
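
In practice, the workaround above amounts to rerunning the reporter's command without the --netmhc-stab flag. A minimal sketch, assuming every other argument stays exactly as in the report; the DRY_RUN switch is a hypothetical addition for this example and is not a pVACtools option:

```shell
#!/bin/bash
# Sketch only: the arguments from the report, minus --netmhc-stab.
# The allele is quoted so the shell never treats '*' as a glob.
set -- run \
    --iedb-install-directory /opt/iedb \
    -t 10 \
    -p patient1_vep_phased.vcf.gz \
    -e1 8,9,10,11 \
    -e2 15,16,17,18,19,20,21,22,23,24,25 \
    --normal-sample-name patient1_normal \
    --binding-threshold 500 --top-score-metric lowest \
    --minimum-fold-change 0.0 \
    --normal-cov 0 --tdna-cov 0 --trna-cov 0 \
    --normal-vaf 1 --trna-vaf 0.0 --tdna-vaf 0.0 \
    --expn-val 0 --maximum-transcript-support-level 5 \
    patient1_vep_somatic_gx.vcf.gz patient1_tumor 'HLA-C*03:03' \
    NetMHCpan MHCflurry NetMHCIIpan ./

# DRY_RUN (hypothetical, defaults to 1) prints the command instead of
# running it, so the argument list can be checked first.
if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s ' pvacseq "$@"
    printf '\n'
else
    pvacseq "$@"
fi
```

Setting DRY_RUN=0 would execute the run for real. The stability predictions can then be added afterwards with the standalone command mentioned above; its arguments are not shown in this thread, so they are left out here.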

susannasiebert commented 10 months ago

After looking into this some more, I don't see anything that might've changed about the NetMHCstabpan server. I reran our automated tests and they seem to be passing now. My best guess is that there was a temporary issue with their server over the weekend that is now fixed. Maybe give it another try and see if it works for you now.

mantczakaus commented 10 months ago

Thank you for your swift feedback! I reran it yesterday and got a different error (see below), but today it seems to be working. Best wishes, Magda

Command error:
      self._validate_conn(conn)
    File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
      conn.connect()
    File "/opt/conda/lib/python3.8/site-packages/urllib3/connection.py", line 309, in connect
      conn = self._new_conn()
    File "/opt/conda/lib/python3.8/site-packages/urllib3/connection.py", line 164, in _new_conn
      raise ConnectTimeoutError(
  urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7fdde05cad00>, 'Connection to services.healthtech.dtu.dk timed out. (connect timeout=10)')

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
      resp = conn.urlopen(
    File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 726, in urlopen
      retries = retries.increment(
    File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 446, in increment
      raise MaxRetryError(_pool, url, error or ResponseError(cause))
  urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='services.healthtech.dtu.dk', port=443): Max retries exceeded with url: /cgi-bin/webface2.cgi (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fdde05cad00>, 'Connection to services.healthtech.dtu.dk timed out. (connect timeout=10)'))

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/opt/conda/bin/pvacseq", line 8, in <module>
      sys.exit(main())
    File "/opt/conda/lib/python3.8/site-packages/pvactools/tools/pvacseq/main.py", line 116, in main
      args[0].func.main(args[1])
    File "/opt/conda/lib/python3.8/site-packages/pvactools/tools/pvacseq/run.py", line 131, in main
      pipeline.execute()
    File "/opt/conda/lib/python3.8/site-packages/pvactools/lib/pipeline.py", line 506, in execute
      PostProcessor(**post_processing_params).execute()
    File "/opt/conda/lib/python3.8/site-packages/pvactools/lib/post_processor.py", line 38, in execute
      self.call_netmhc_stab()
    File "/opt/conda/lib/python3.8/site-packages/pvactools/lib/post_processor.py", line 121, in call_netmhc_stab
      NetMHCStab(self.net_chop_fh.name, self.netmhc_stab_fh.name, self.file_type, self.top_score_metric).execute()
    File "/opt/conda/lib/python3.8/site-packages/pvactools/lib/netmhc_stab.py", line 83, in execute
      response = self.query_netmhcstabpan_server(staging_file, length, netmhcstabpan_allele, jobid_searcher)
    File "/opt/conda/lib/python3.8/site-packages/pvactools/lib/netmhc_stab.py", line 145, in query_netmhcstabpan_server
      response = requests.post(
    File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 119, in post
      return request('post', url, data=data, json=json, **kwargs)
    File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
      return session.request(method=method, url=url, **kwargs)
    File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
      resp = self.send(prep, **send_kwargs)
    File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
      r = adapter.send(request, **kwargs)
    File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 504, in send
      raise ConnectTimeout(e, request=request)
  requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='services.healthtech.dtu.dk', port=443): Max retries exceeded with url: /cgi-bin/webface2.cgi (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fdde05cad00>, 'Connection to services.healthtech.dtu.dk timed out. (connect timeout=10)'))

susannasiebert commented 10 months ago

That error seems to be related to whatever outage might've happened with their server. Glad it's working again.
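
Both failures in this thread (the HTTP 500 response and the later connect timeout) turned out to be transient and server-side, so one way to make a pipeline step more tolerant of them is to retry it with an increasing delay. The following is a minimal sketch of such a wrapper; retry_with_backoff is a hypothetical helper written for this example and is not part of pVACtools:

```shell
#!/bin/bash
# Hypothetical helper, not part of pVACtools.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
retry_with_backoff() {
    local max_attempts=$1
    local delay=$2
    shift 2
    local attempt=1

    while true; do
        # Run the command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        # Out of attempts: report and propagate the failure.
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "Attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        # Double the wait before the next attempt.
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

A call such as `retry_with_backoff 3 600 pvacseq run ...` would wait 10 and then 20 minutes between attempts. Because each retry reruns the wrapped command from the start, this is probably better suited to a short step, such as the standalone stability command, than to a full pvacseq run.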