nextstrain / cli

The Nextstrain command-line interface (CLI), a program called `nextstrain`, aims to provide a consistent way to run and visualize pathogen builds and to access Nextstrain components like Augur and Auspice across computing environments such as Docker, Conda, and AWS Batch.
https://docs.nextstrain.org/projects/cli/
MIT License

`Your request was too big` error on stock RSV repo build via aws-batch #271

Status: Open. corneliusroemer opened this issue 1 year ago.

corneliusroemer commented 1 year ago

Current Behavior

When I try to run the nextstrain/rsv workflow with `nextstrain build --aws-batch ...`, I get this error: `botocore.exceptions.ClientError: An error occurred (MaxMessageLengthExceeded) when calling the PutObject operation: Your request was too big.`
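Because the error says the request was too big, one quick check is how large the uploaded archive actually is. According to the full logs further down, the CLI zips the whole working directory and then writes it to S3 through `fsspec.open(..., "wb")`, and the failure happens when s3fs commits that object with `PutObject`. A comparable zip can be built locally; this is a minimal sketch using only the standard library. `zipped_size` is a hypothetical helper and, as an assumption, it includes every file under the directory, so its result only approximates what the CLI uploads.

```python
import io
import zipfile
from pathlib import Path
from typing import Tuple


def zipped_size(root: Path) -> Tuple[int, int]:
    """Zip every regular file under `root` in memory; return (file_count, zip_bytes).

    Approximation only: no include/exclude rules are applied, so the CLI's
    own archive may differ in content and compression.
    """
    root = Path(root)
    buffer = io.BytesIO()
    count = 0
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                archive.write(path, arcname=str(path.relative_to(root)))
                count += 1
    # The zip central directory is written on close, so measure after the with-block.
    return count, len(buffer.getvalue())
```

For the checkout in these logs that would be `zipped_size(Path("/Users/corneliusromer/code/rsv"))`. S3 accepts a single `PutObject` upload of up to 5 GB, so a result far below that would suggest the error is not about the archive's real size.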

Expected behavior

The workflow runs without this error, or at least the CLI recovers and advises me on how to work around the issue.
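As an illustration of the kind of recovery asked for here, the sketch below walks the exception chain seen in the full logs (an `OSError` raised from botocore's `ClientError`) and maps the `MaxMessageLengthExceeded` code to a hint. It is a hypothetical helper, not nextstrain-cli's code: the names `find_aws_error_code` and `advice_for` and the hint wording are assumptions, and the only thing relied on is that `ClientError` exposes `response["Error"]["Code"]`.

```python
from typing import Optional


def find_aws_error_code(exc: BaseException) -> Optional[str]:
    """Return the first AWS error code in the exception chain, else None.

    Hypothetical helper. Duck-typed on purpose: botocore's ClientError carries
    response["Error"]["Code"], so botocore does not need to be imported here.
    """
    seen = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        response = getattr(current, "response", None)
        if isinstance(response, dict):
            error = response.get("Error")
            if isinstance(error, dict) and error.get("Code"):
                return error["Code"]
        # Prefer the explicit cause ("raise ... from ..."), then the implicit context.
        current = current.__cause__ or current.__context__
    return None


def advice_for(exc: BaseException) -> Optional[str]:
    """Map a recognised failure to a user-facing hint (wording is illustrative only)."""
    if find_aws_error_code(exc) == "MaxMessageLengthExceeded":
        return (
            "S3 rejected the upload of the build directory "
            "(MaxMessageLengthExceeded: 'Your request was too big'). "
            "Please report this together with the output of "
            "`nextstrain version --verbose`."
        )
    return None
```

A caller would wrap the S3 upload step in `try`/`except OSError`, print `advice_for(err)` when it returns a string, and then re-raise.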

How to reproduce

Steps to reproduce the current behavior:

  1. gh repo clone nextstrain/rsv
  2. From inside the cloned rsv directory, run:
       nextstrain build \
         --aws-batch \
         --detach \
         --no-download \
         --cpus 16 \
         --memory 64gib \
         --exec env \
         . \
           snakemake \
             --configfiles config/configfile.yaml \
             --printshellcmds
  3. See the error.

Your environment: if running Nextstrain locally

Full logs

```
nextstrain build \
  --aws-batch \
  --detach \
  --no-download \
  --cpus 16 \
  --memory 64gib \
  --exec env \
  . \
    snakemake \
      --configfiles config/configfile.yaml \
      --printshellcmds
Nextstrain Run ID: b1df3dba-e7b9-4336-8eff-754cd2727e17
Uploading /Users/corneliusromer/code/rsv to S3
zipping: /Users/corneliusromer/code/rsv
zipping: /Users/corneliusromer/code/rsv/bin
zipping: /Users/corneliusromer/code/rsv/bin/notify-on-deploy
zipping: /Users/corneliusromer/code/rsv/bin/set-branch-ingest-config
zipping: /Users/corneliusromer/code/rsv/bin/notify-on-error
zipping: /Users/corneliusromer/code/rsv/bin/write-envdir
zipping: /Users/corneliusromer/code/rsv/bin/notify-on-start
zipping: /Users/corneliusromer/code/rsv/bin/notify-on-success
zipping: /Users/corneliusromer/code/rsv/config
zipping: /Users/corneliusromer/code/rsv/config/clades_genome_a.tsv
zipping: /Users/corneliusromer/code/rsv/config/clades_genome_b.tsv
zipping: /Users/corneliusromer/code/rsv/config/color_orderings.tsv
zipping: /Users/corneliusromer/code/rsv/config/clades_G_b.tsv
zipping: /Users/corneliusromer/code/rsv/config/areference.gbk
zipping: /Users/corneliusromer/code/rsv/config/configfile.yaml
zipping: /Users/corneliusromer/code/rsv/config/clades_G_a.tsv
zipping: /Users/corneliusromer/code/rsv/config/description.md
zipping: /Users/corneliusromer/code/rsv/config/outliers.txt
zipping: /Users/corneliusromer/code/rsv/config/breference.gbk
zipping: /Users/corneliusromer/code/rsv/config/areference.fasta
zipping: /Users/corneliusromer/code/rsv/config/color_schemes.tsv
zipping: /Users/corneliusromer/code/rsv/config/nextstrain_automation.yaml
zipping: /Users/corneliusromer/code/rsv/config/breference.fasta
zipping: /Users/corneliusromer/code/rsv/config/auspice_config.json
zipping: /Users/corneliusromer/code/rsv/ingest
zipping: /Users/corneliusromer/code/rsv/ingest/bin
zipping: /Users/corneliusromer/code/rsv/ingest/bin/notify-on-job-start
zipping: /Users/corneliusromer/code/rsv/ingest/bin/transform-authors
zipping: /Users/corneliusromer/code/rsv/ingest/bin/transform-field-names
zipping: /Users/corneliusromer/code/rsv/ingest/bin/upload-to-s3
zipping: /Users/corneliusromer/code/rsv/ingest/bin/gene-coverage.py
zipping: /Users/corneliusromer/code/rsv/ingest/bin/transform-date-fields
zipping: /Users/corneliusromer/code/rsv/ingest/bin/sort.py
zipping: /Users/corneliusromer/code/rsv/ingest/bin/csv-to-ndjson.py
zipping: /Users/corneliusromer/code/rsv/ingest/bin/transform-genbank-location
zipping: /Users/corneliusromer/code/rsv/ingest/bin/ndjson-to-tsv-and-fasta
zipping: /Users/corneliusromer/code/rsv/ingest/bin/notify-slack
zipping: /Users/corneliusromer/code/rsv/ingest/bin/merge-user-metadata
zipping: /Users/corneliusromer/code/rsv/ingest/bin/notify-on-job-fail
zipping: /Users/corneliusromer/code/rsv/ingest/bin/notify-on-record-change
zipping: /Users/corneliusromer/code/rsv/ingest/bin/apply-geolocation-rules
zipping: /Users/corneliusromer/code/rsv/ingest/bin/join-metadata-and-clades.py
zipping: /Users/corneliusromer/code/rsv/ingest/bin/metadata_dedup.py
zipping: /Users/corneliusromer/code/rsv/ingest/bin/fasta-to-ndjson
zipping: /Users/corneliusromer/code/rsv/ingest/bin/sha256sum
zipping: /Users/corneliusromer/code/rsv/ingest/bin/s3-object-exists
zipping: /Users/corneliusromer/code/rsv/ingest/bin/sequencesandmetadata.py
zipping: /Users/corneliusromer/code/rsv/ingest/bin/transform-string-fields
zipping: /Users/corneliusromer/code/rsv/ingest/bin/cloudfront-invalidate
zipping: /Users/corneliusromer/code/rsv/ingest/bin/genbank-url
zipping: /Users/corneliusromer/code/rsv/ingest/bin/transform-strain-names
zipping: /Users/corneliusromer/code/rsv/ingest/config
zipping: /Users/corneliusromer/code/rsv/ingest/config/b_1_reference.fasta
zipping: /Users/corneliusromer/code/rsv/ingest/config/config.yaml
zipping: /Users/corneliusromer/code/rsv/ingest/config/a_1_reference.fasta
zipping: /Users/corneliusromer/code/rsv/ingest/config/a_3_reference.fasta
zipping: /Users/corneliusromer/code/rsv/ingest/config/b_2_reference.fasta
zipping: /Users/corneliusromer/code/rsv/ingest/config/b_3_reference.fasta
zipping: /Users/corneliusromer/code/rsv/ingest/config/a_2_reference.fasta
zipping: /Users/corneliusromer/code/rsv/ingest/config/optional.yaml
zipping: /Users/corneliusromer/code/rsv/ingest/source-data
zipping: /Users/corneliusromer/code/rsv/ingest/source-data/geolocation-rules.tsv
zipping: /Users/corneliusromer/code/rsv/ingest/source-data/annotations.tsv
zipping: /Users/corneliusromer/code/rsv/ingest/Snakefile
zipping: /Users/corneliusromer/code/rsv/ingest/workflow
zipping: /Users/corneliusromer/code/rsv/ingest/workflow/envs
zipping: /Users/corneliusromer/code/rsv/ingest/workflow/envs/nextstrain.yaml
zipping: /Users/corneliusromer/code/rsv/ingest/workflow/snakemake_rules
zipping: /Users/corneliusromer/code/rsv/ingest/workflow/snakemake_rules/fetch_sequences.smk
zipping: /Users/corneliusromer/code/rsv/ingest/workflow/snakemake_rules/upload.smk
zipping: /Users/corneliusromer/code/rsv/ingest/workflow/snakemake_rules/sort.smk
zipping: /Users/corneliusromer/code/rsv/ingest/workflow/snakemake_rules/transform.smk
zipping: /Users/corneliusromer/code/rsv/README.md
zipping: /Users/corneliusromer/code/rsv/env.d
zipping: /Users/corneliusromer/code/rsv/Snakefile
zipping: /Users/corneliusromer/code/rsv/logs
zipping: /Users/corneliusromer/code/rsv/logs/traits_rsv_rsv.txt
zipping: /Users/corneliusromer/code/rsv/workflow
zipping: /Users/corneliusromer/code/rsv/workflow/envs
zipping: /Users/corneliusromer/code/rsv/workflow/envs/nextstrain.yaml
zipping: /Users/corneliusromer/code/rsv/workflow/snakemake_rules
zipping: /Users/corneliusromer/code/rsv/workflow/snakemake_rules/glycosylation.smk
zipping: /Users/corneliusromer/code/rsv/workflow/snakemake_rules/core.smk
zipping: /Users/corneliusromer/code/rsv/workflow/snakemake_rules/clades.smk
zipping: /Users/corneliusromer/code/rsv/workflow/snakemake_rules/export.smk
zipping: /Users/corneliusromer/code/rsv/workflow/snakemake_rules/nextstrain_automation.smk
zipping: /Users/corneliusromer/code/rsv/workflow/snakemake_rules/download.smk
zipping: /Users/corneliusromer/code/rsv/scripts
zipping: /Users/corneliusromer/code/rsv/scripts/clade_names.py
zipping: /Users/corneliusromer/code/rsv/scripts/newreference.py
zipping: /Users/corneliusromer/code/rsv/scripts/glycosylation.py
zipping: /Users/corneliusromer/code/rsv/scripts/cut.py
zipping: /Users/corneliusromer/code/rsv/scripts/align_for_tree.py
zipping: /Users/corneliusromer/code/rsv/scripts/assign-colors.py
zipping: /Users/corneliusromer/code/rsv/scripts/set_final_strain_name.py
zipping: /Users/corneliusromer/code/rsv/scripts/wrangle_metadata.py
zipping: /Users/corneliusromer/code/rsv/scripts/metadatadedup.py
zipping: /Users/corneliusromer/code/rsv/.github
zipping: /Users/corneliusromer/code/rsv/.github/workflows
zipping: /Users/corneliusromer/code/rsv/.github/workflows/fetch-and-ingest.yaml
zipping: /Users/corneliusromer/code/rsv/.github/workflows/rebuild.yaml
/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/botocore/utils.py:1720: FutureWarning: The S3RegionRedirector class has been deprecated for a new internal replacement. A future version of botocore may remove this class.
  warnings.warn(
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/s3fs/core.py", line 112, in _error_wrapper
    return await func(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/aiobotocore/client.py", line 358, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (MaxMessageLengthExceeded) when calling the PutObject operation: Your request was too big.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/bin/nextstrain", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/nextstrain/cli/__main__.py", line 19, in main
    return cli.run( argv[1:] )
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/nextstrain/cli/__init__.py", line 36, in run
    return opts.__command__.run(opts)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/nextstrain/cli/command/build.py", line 195, in run
    return runner.run(opts, working_volume = opts.build, cpus = opts.cpus, memory = opts.memory)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/nextstrain/cli/runner/__init__.py", line 232, in run
    return opts.__runner__.run(opts, argv, working_volume = working_volume, extra_env = extra_env, cpus = cpus, memory = memory)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/nextstrain/cli/runner/aws_batch/__init__.py", line 129, in run
    remote_workdir = s3.upload_workdir(local_workdir, bucket, run_id)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/nextstrain/cli/runner/aws_batch/s3.py", line 68, in upload_workdir
    with fsspec.open(object_url(remote_workdir), "wb", auto_mkdir = False) as remote_file:
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/fsspec/core.py", line 121, in __exit__
    self.close()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/fsspec/core.py", line 141, in close
    f.close()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/fsspec/spec.py", line 1789, in close
    self.flush(force=True)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/fsspec/spec.py", line 1660, in flush
    if self._upload_chunk(final=force) is not False:
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/s3fs/core.py", line 2215, in _upload_chunk
    self.commit()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/s3fs/core.py", line 2230, in commit
    write_result = self._call_s3(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/s3fs/core.py", line 2082, in _call_s3
    return self.fs.call_s3(method, self.s3_additional_kwargs, *kwarglist, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/fsspec/asyn.py", line 115, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/fsspec/asyn.py", line 100, in sync
    raise return_result
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/fsspec/asyn.py", line 55, in _runner
    result[0] = await coro
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/s3fs/core.py", line 347, in _call_s3
    return await _error_wrapper(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/lib/python3.10/site-packages/s3fs/core.py", line 139, in _error_wrapper
    raise err
OSError: [Errno 40] Your request was too big.
```

tsibley commented 1 year ago

@corneliusroemer What's the output of `nextstrain version --verbose`?

tsibley commented 1 year ago

Hm. This error comes from inside the aiobotocore client that fsspec/s3fs use. My first guess (probably wrong) is that the environment in which nextstrain itself runs has an incompatible combination of s3fs and aiobotocore.
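One way to test that guess is to print the installed versions of the packages on the upload path, together with the dependency ranges those packages declare in their metadata, using the same Python interpreter that `nextstrain version --verbose` reports. This is a minimal sketch with the standard library's `importlib.metadata`; the package list (taken from the traceback) and the helper names are assumptions.

```python
from importlib.metadata import PackageNotFoundError, requires, version
from typing import Dict, List, Optional

# Packages that appear on the S3 upload path in the traceback.
STACK = ["fsspec", "s3fs", "aiobotocore", "botocore", "boto3"]


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Installed version of each distribution, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def declared(dist: str, dependency: str) -> List[str]:
    """Requirement strings that `dist` declares for `dependency` (may be empty)."""
    try:
        reqs = requires(dist) or []
    except PackageNotFoundError:
        return []
    return [r for r in reqs if r.lower().startswith(dependency.lower())]


if __name__ == "__main__":
    for name, ver in installed_versions(STACK).items():
        print(f"{name:<12} {ver or 'not installed'}")
    # Compare these declared ranges against the installed versions printed above.
    for dist, dep in [("s3fs", "fsspec"), ("s3fs", "aiobotocore"), ("aiobotocore", "botocore")]:
        print(f"{dist} declares: {declared(dist, dep)}")
```

An installed version that falls outside one of the printed ranges would support the incompatibility hypothesis; matching ranges would point elsewhere.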

corneliusroemer commented 1 year ago
$ nextstrain version --verbose
nextstrain.cli 6.2.1

Python
  /opt/homebrew/Caskroom/miniforge/base/envs/nextstrain/bin/python3.10
  3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:31:57) [Clang 14.0.6 ]

Runners
  docker 
    nextstrain/base:build-20230411T103027Z (bb05df2db0ce, 2023-04-11 13:38:55 +0200 CEST)
    augur 21.1.0
    auspice v2.45.2
    fauna e3ed8e1
    sacra not present

  conda 
    nextstrain-base 20230407T195218Z (h0dc7051_1_locked, nextstrain)
    augur 21.1.0
    auspice 2.45.1

  singularity 
    docker://nextstrain/base (not present)

  ambient (default)
    augur 21.1.0
    auspice 2.45.1

  aws-batch 
    unknown

tsibley commented 1 year ago

I ask because I wonder whether https://github.com/bioconda/bioconda-recipes/pull/39711 is implicated, though I'm not certain whether we'd expect you to have a version of nextstrain-cli from Bioconda from before or after that change…

tsibley commented 1 year ago

Ah, the version is 6.2.1, which is after that Conda packaging change. Hmm.

corneliusroemer commented 1 year ago

This is my pip list:

```
Package                       Version
----------------------------- ---------
aiobotocore                   2.4.2
aioeasywebdav                 2.4.0
aiohttp                       3.8.4
aioitertools                  0.11.0
aiosignal                     1.3.1
amply                         0.1.5
appdirs                       1.4.4
appnope                       0.1.3
asttokens                     2.2.1
async-timeout                 4.0.2
attmap                        0.13.2
attrs                         22.2.0
autopep8                      2.0.2
backcall                      0.2.0
backports.functools-lru-cache 1.6.4
backrefs                      5.2
bcbio-gff                     0.6.9
bcrypt                        3.2.2
biopython                     1.80
black                         23.3.0
boto3                         1.26.111
botocore                      1.29.111
bracex                        2.2.1
brotlipy                      0.7.0
bx-python                     0.9.0
cachetools                    5.3.0
certifi                       2022.12.7
cffi                          1.15.1
charset-normalizer            2.1.1
click                         8.1.3
colorama                      0.4.6
comm                          0.1.3
commonmark                    0.9.1
ConfigArgParse                1.5.3
connection-pool               0.0.3
constellations                0.1.10
contourpy                     1.0.7
crc32c                        2.3.post0
cryptography                  39.0.0
cvxopt                        1.3.0
cycler                        0.11.0
dataclasses                   0.8
datrie                        0.8.2
debugpy                       1.6.7
decorator                     5.1.1
deepdiff                      6.3.0
defusedxml                    0.7.1
distlib                       0.3.6
docutils                      0.19
dpath                         2.1.5
dropbox                       11.36.0
entrypoints                   0.4
epiweeks                      2.1.4
exceptiongroup                1.1.1
executing                     1.2.0
fasteners                     0.17.3
fastjsonschema                2.16.3
filechunkio                   1.8
filelock                      3.11.0
fonttools                     4.39.3
frozenlist                    1.3.3
fsspec                        2023.4.0
ftputil                       5.0.4
future                        0.18.3
gitdb                         4.0.10
GitPython                     3.1.31
google-api-core               2.10.0
google-api-python-client      2.85.0
google-auth                   2.17.2
google-auth-httplib2          0.1.0
google-cloud-core             2.3.2
google-cloud-storage          2.8.0
google-crc32c                 1.1.2
google-resumable-media        2.4.1
googleapis-common-protos      1.57.0
grpcio                        1.46.3
httplib2                      0.22.0
humanfriendly                 10.0
idna                          3.4
importlib-metadata            6.3.0
importlib-resources           5.12.0
iniconfig                     2.0.0
ipdb                          0.13.13
ipykernel                     6.22.0
ipython                       8.7.0
isal                          1.1.0
isodate                       0.6.1
isort                         5.12.0
jedi                          0.18.2
Jinja2                        3.1.2
jmespath                      1.0.1
joblib                        1.2.0
jsonschema                    3.2.0
jupyter_client                8.1.0
jupyter_core                  5.3.0
kiwisolver                    1.4.4
libcst                        0.4.9
llist                         0.7.1
logmuse                       0.2.6
MarkupSafe                    2.1.2
matplotlib                    3.7.1
matplotlib-inline             0.1.6
memory-profiler               0.61.0
MonkeyType                    23.3.0
multidict                     6.0.4
munkres                       1.1.4
mypy                          1.2.0
mypy-extensions               1.0.0
natsort                       8.3.1
nbformat                      5.8.0
nest-asyncio                  1.5.6
networkx                      2.8.8
nextstrain-augur              21.1.0
nextstrain-cli                6.2.1
nodeenv                       1.7.0
numpy                         1.24.2
nwkfmt                        0.1.1
oauth2client                  4.1.3
ordered-set                   4.1.0
orjson                        3.8.10
packaging                     23.0
pandas                        1.5.3
pango-aliasor                 0.3.0
pango-designation             1.19
paramiko                      3.1.0
parso                         0.8.3
pathspec                      0.11.1
peppy                         0.35.5
pexpect                       4.8.0
phylo-treetime                0.9.4
pickleshare                   0.7.5
Pillow                        9.2.0
pip                           23.0.1
pipenv                        2023.3.20
plac                          1.3.5
platformdirs                  3.2.0
pluggy                        1.0.0
ply                           3.11
polars                        0.17.1
pooch                         1.7.0
prettytable                   3.7.0
prompt-toolkit                3.0.38
protobuf                      3.18.3
psutil                        5.9.4
ptpython                      3.0.20
ptyprocess                    0.7.0
PuLP                          2.7.0
pure-eval                     0.2.2
py                            1.11.0
pyasn1                        0.4.8
pyasn1-modules                0.2.7
pycodestyle                   2.10.0
pycparser                     2.21
pyfastx                       0.8.4
Pygments                      2.15.0
pygraphviz                    1.10
PyJWT                         2.6.0
pyllist                       0.3
PyNaCl                        1.5.0
pyOpenSSL                     23.1.1
pyparsing                     3.0.9
pyright                       1.1.302
pyrsistent                    0.19.3
pysam                         0.20.0
pysftp                        0.2.9
PySocks                       1.7.1
pytest                        7.3.0
python-dateutil               2.8.2
python-irodsclient            1.1.6
python-lzo                    1.14
pytz                          2023.3
pyu2f                         0.1.5
PyYAML                        6.0
pyzmq                         25.0.2
ratelimiter                   1.2.0
regex                         2023.3.23
requests                      2.28.2
reretry                       0.11.8
retry                         0.9.2
rich                          12.6.0
rsa                           4.9
s3fs                          2023.3.0
s3transfer                    0.6.0
scikit-learn                  1.2.2
scipy                         1.10.1
setuptools                    67.6.1
setuptools-scm                7.1.0
shellingham                   1.5.1
six                           1.16.0
slacker                       0.14.0
smart-open                    6.3.0
smmap                         3.0.5
snakefmt                      0.8.4
snakemake                     7.25.0
stack-data                    0.6.2
stone                         3.3.1
stopit                        1.1.2
tabulate                      0.9.0
threadpoolctl                 3.1.0
throttler                     1.2.1
toml                          0.10.2
tomli                         2.0.1
toposort                      1.10
tornado                       6.2
tqdm                          4.65.0
traitlets                     5.9.0
typer                         0.7.0
typing_extensions             4.5.0
typing-inspect                0.8.0
ubiquerg                      0.6.2
unicodedata2                  15.0.0
uritemplate                   4.1.1
urllib3                       1.26.15
veracitools                   0.1.3
virtualenv                    20.21.0
virtualenv-clone              0.5.4
wcmatch                       8.3
wcwidth                       0.2.6
wheel                         0.40.0
wrapt                         1.15.0
xopen                         1.7.0
xxhash                        0.0.0
yarl                          1.8.2
yte                           1.5.1
zipp                          3.15.0
zstandard                     0.19.0
```

Is it ok for boto3 and botocore to have different versions?

boto3                         1.26.111
botocore                      1.29.111
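boto3 and botocore are versioned independently (here boto3 is 1.26.x while botocore is 1.29.x), so different numbers are not a problem in themselves; the real question is whether the installed botocore falls within the range that boto3 declares in its package metadata. Below is a simplified sketch of that check. `parse_release`, `satisfies` and `split_requirement` are hypothetical helpers that only understand plain numeric versions and the operators `>=`, `<=`, `==`, `!=`, `>`, `<`; for full PEP 440 semantics the `packaging` library would be the better choice.

```python
import operator
import re
from importlib.metadata import PackageNotFoundError, requires, version
from typing import Tuple

# Simplified: numeric release segments only (e.g. "1.29.111"); pre/post/dev
# tags are ignored and "~=" / "===" are rejected.
_RELEASE = re.compile(r"\d+(?:\.\d+)*")
_CLAUSE = re.compile(r"(>=|<=|==|!=|>|<)\s*(\S+)")
_REQ = re.compile(r"([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*(.*)")
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def parse_release(text: str) -> Tuple[int, ...]:
    match = _RELEASE.match(text.strip())
    if not match:
        raise ValueError(f"unsupported version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(installed: str, spec: str) -> bool:
    """True if `installed` meets every comma-separated clause, e.g. '>=1.29.111,<1.30.0'."""
    have = parse_release(installed)
    for clause in filter(None, (c.strip() for c in spec.split(","))):
        match = _CLAUSE.fullmatch(clause)
        if not match:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        want = parse_release(bound)
        # Pad with zeros so that "1.30" and "1.30.0" compare as equal.
        width = max(len(have), len(want))
        a = have + (0,) * (width - len(have))
        b = want + (0,) * (width - len(want))
        if not _OPS[op](a, b):
            return False
    return True


def split_requirement(req: str) -> Tuple[str, str]:
    """'botocore (<1.30.0,>=1.29.111)' or 'botocore<1.30.0,>=1.29.111' -> (name, spec)."""
    body = req.split(";", 1)[0].strip()  # drop environment markers
    match = _REQ.match(body)
    if not match:
        raise ValueError(f"unsupported requirement: {req!r}")
    name, spec = match.group(1), match.group(2).strip()
    if spec.startswith("(") and spec.endswith(")"):
        spec = spec[1:-1]
    return name.lower(), spec.replace(" ", "")


if __name__ == "__main__":
    try:
        have = version("botocore")
        reqs = requires("boto3") or []
    except PackageNotFoundError as err:
        print(f"not installed: {err}")
    else:
        for req in reqs:
            if "extra ==" in req:  # skip optional extras such as boto3[crt]
                continue
            name, spec = split_requirement(req)
            if name == "botocore":
                verdict = "OK" if satisfies(have, spec) else "MISMATCH"
                print(f"botocore {have} vs boto3 requirement {spec!r}: {verdict}")
```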

tsibley commented 1 year ago

This seems worth digging into, but I'd love if someone else in @nextstrain/core could pick it up.