onnela-lab / beiwe-backend

Beiwe is a smartphone-based digital phenotyping research platform. This is the Beiwe backend code.
https://www.beiwe.org/
BSD 3-Clause "New" or "Revised" License
64 stars, 45 forks

Errors when running eb deploy #280

Closed: eze1981 closed this issue 1 year ago.

eze1981 commented 2 years ago

Hi all,

I'm having some trouble with the cluster deployment instructions (https://github.com/onnela-lab/beiwe-backend/wiki/Deployment-Instructions---Scalable-Deployment), and I would like to share the two issues I ran into during deployment. Both happened when running the eb deploy command:

  1. PyCrypto error when the .ebextension runs python3 manage.py migrate. I got an error when the .ebextension was trying to run the migrate script. I don't have the error message, but I fixed it by patching PyCrypto the same way launch_script.py does.

  2. "SCRAM authentication requires libpq version 10 or above". After patching PyCrypto I got this new error message, which I fixed by installing a newer version of libpq.

These are the modifications I made to the 01.config file:

...
container_commands:
#  01_add_global_wsgi_application_group:
#    command: if ! grep -q 'WSGIApplicationGroup %{GLOBAL}' ../wsgi.conf ; then echo 'WSGIApplicationGroup %{GLOBAL}' >> ../wsgi.conf; fi;
  01_setup_profile:
    command: mv ./cluster_management/pushed_files/eb_profile.sh /home/ec2-user/.bashrc; chmod 644 /home/ec2-user/.bashrc; chown ec2-user /home/ec2-user/.bashrc; chgrp ec2-user /home/ec2-user/.bashrc
  02_setup_reasonable_inputrc:
    command: mv ./cluster_management/pushed_files/.inputrc /home/ec2-user/.inputrc; chmod 664 /home/ec2-user/.inputrc; chown ec2-user /home/ec2-user/.inputrc; chgrp ec2-user /home/ec2-user/.inputrc
  03_patch_pycrypto:
    command: tar -xf ./cluster_management/pushed_files/crypto.tar.gz -C /var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/Crypto
  04_update_pglib:
    command: sudo amazon-linux-extras install epel -y; sudo yum-config-manager --enable epel; sudo amazon-linux-extras install postgresql13 -y
  05_migrate:
    leader_only: true
    command: source /var/app/venv/*/bin/activate && python3 manage.py migrate
...

I don't like the hardcoded path /var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/Crypto in command 03_patch_pycrypto. I can create a PR once I have a better solution for that.
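One way to avoid hardcoding the staging-LQM1lest virtualenv name is to ask the venv's own interpreter where a package lives. The sketch below is a hypothetical helper (package_dir is not part of Beiwe), assuming it is run with python3 after source /var/app/venv/*/bin/activate, the same activation pattern 05_migrate already uses.

```python
"""Locate an installed package's directory instead of hardcoding the venv path.

Hypothetical helper (not part of Beiwe): run it with the virtualenv's python3
so the lookup resolves against that venv's site-packages.
"""
import importlib.util
import os
import sys
from typing import Optional


def package_dir(name: str) -> Optional[str]:
    """Return the directory of package `name`, or None if it cannot be found."""
    try:
        # find_spec locates a top-level package without executing its code.
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised for empty names or dotted names whose parent package is missing.
        return None
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin is the path of its __init__.py.
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # Prints an empty line when the package is missing, so callers should check.
    print(package_dir(sys.argv[1] if len(sys.argv) > 1 else "Crypto") or "")
```

The -C target of the tar command in 03_patch_pycrypto could then come from this script's output rather than a literal path; where the script lives in the bundle would be up to you.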

Now eb deploy completes, but the application is still not working: I'm getting a Bad Request (400) coming from gunicorn. I'll continue troubleshooting...

biblicabeebli commented 2 years ago

Thank you, I appreciate the thorough report! (Hoping to get pycrypto out of here eventually; it is sticking around purely for a legacy operation that I can't work out how to port to pycryptodome.)

The 400 error is probably due to a mismatch between your URL and the DOMAIN_NAME parameter.
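Django answers with a 400 when a request's Host header does not match ALLOWED_HOSTS, which fits this symptom. Assuming the deployment derives ALLOWED_HOSTS from DOMAIN_NAME (an assumption, not confirmed in this thread), the sketch below mirrors Django's documented matching rules (exact name, a leading dot for a domain plus its subdomains, "*" for anything) so you can check whether the URL you browse to would be accepted. host_allowed and url_host are hypothetical helpers; Django's own check is django.http.request.validate_host.

```python
"""Check a browsed URL against ALLOWED_HOSTS-style patterns (sketch of Django's rules)."""
from typing import Iterable
from urllib.parse import urlsplit


def url_host(url: str) -> str:
    """Extract the lowercase hostname, without port, from a URL."""
    return urlsplit(url).hostname or ""


def host_allowed(host: str, allowed_hosts: Iterable[str]) -> bool:
    """Return True if `host` matches any pattern, using Django-like semantics."""
    host = host.lower().rstrip(".")
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*":
            return True
        if pattern.startswith("."):
            # ".example.org" matches example.org itself and any subdomain of it.
            if host == pattern[1:] or host.endswith(pattern):
                return True
        elif host == pattern:
            return True
    return False
```

Under that assumption, opening the application through the Elastic Beanstalk environment URL instead of the configured domain would fail this check and produce the 400.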

I don't know what goes wrong with the pycrypto package that causes the issue that my extremely gross patch fixes.

edit: oh derp, your error message was "SCRAM authentication requires libpq version 10 or above"

biblicabeebli commented 2 years ago

oooooh, the launch script deployed the newest version of postgres, which is postgres13, and that has made a change: the server now asks clients for SCRAM authentication, which libpq versions older than 10 can't do.
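The SCRAM error comes from the PostgreSQL client library (libpq) that the Python driver loads, not from Django itself. Assuming the driver is psycopg2, which exposes psycopg2.extensions.libpq_version() since version 2.7, a sketch like the one below can confirm which libpq a venv actually uses before and after a fix such as 04_update_pglib. libpq_major, supports_scram and loaded_libpq_version are hypothetical helper names.

```python
"""Inspect the libpq version integer reported by psycopg2 (sketch)."""
from typing import Optional

SCRAM_MIN_LIBPQ = 100000  # libpq 10.0; SCRAM-SHA-256 client support starts here.


def libpq_major(version: int) -> str:
    """Turn a libpq version integer (e.g. 130003 or 90605) into a major version."""
    if version >= 100000:
        # From 10 onward the encoding is major * 10000 + minor.
        return str(version // 10000)
    # Before 10 the major version had two parts: 90605 -> "9.6".
    return f"{version // 10000}.{version // 100 % 100}"


def supports_scram(version: int) -> bool:
    """True if this libpq version can answer a SCRAM authentication request."""
    return version >= SCRAM_MIN_LIBPQ


def loaded_libpq_version() -> Optional[int]:
    """Return the libpq version psycopg2 loaded, or None if psycopg2 is absent."""
    try:
        from psycopg2.extensions import libpq_version
    except ImportError:
        return None
    return libpq_version()
```

Running loaded_libpq_version() inside the activated venv and passing the result to supports_scram gives a quick yes/no before attempting python3 manage.py migrate.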

eze1981 commented 2 years ago

Thank you, @biblicabeebli. Working on my domain name and SSL certificate configurations now.

I destroyed my environment to have a fresh start. I'm deploying from the main branch with default settings. This is a summary of the console log output when running eb deploy without any customizations in 01.config:

Command: eb deploy

[ssm-user@ip-172-xxx-yyy-zzz beiwe-backend]$ eb init
Do you wish to continue with CodeCommit? (Y/n): n
[ssm-user@ip-172-xxx-yyy-zzz beiwe-backend]$ eb deploy
Creating application version archive "app-e58a-220225_184721838168".
Uploading beiwe-application/app-e58a-220225_184721838168.zip to S3. This may take a while.
Upload Complete.
2022-02-25 18:47:22    INFO    Environment update is starting.
2022-02-25 18:48:04    INFO    Deploying new version to instance(s).
2022-02-25 18:49:25    INFO    Instance deployment successfully used commands in the 'Procfile' to start your application.
2022-02-25 18:49:27    ERROR   Instance deployment failed. For details, see 'eb-engine.log'.
2022-02-25 18:49:30    ERROR   [Instance: i-0fe746d898f0xxxxx] Command failed on instance. Return code: 1 Output: Engine execution has encountered an error..
2022-02-25 18:49:30    INFO    Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
2022-02-25 18:49:30    ERROR   Unsuccessful command execution on instance id(s) 'i-0fe746d898f0xxxxx'. Aborting the operation.
2022-02-25 18:49:31    ERROR   Failed to deploy application.

ERROR: ServiceError - Failed to deploy application.
[ssm-user@ip-172-xxx-yyy-zzz beiwe-backend]$

Command: cat /var/log/eb-engine.log

2022/02/25 18:49:26.326820 [INFO] Running command /bin/sh -c /opt/aws/bin/cfn-init -s arn:aws:cloudformation:us-east-1:252763820505:stack/awseb-e-hxpmzpaipu-stack/16992330-9669-11ec-8a73-127930ad5b41 -r AWSEBAutoScalingGroup --region us-east-1 --configsets Infra-EmbeddedPostBuild
2022/02/25 18:49:27.471541 [ERROR] An error occurred during execution of command [app-deploy] - [PostBuildEbExtension]. Stop running the command. Error: container commands build failed. Please refer to /var/log/cfn-init.log for more details.

2022/02/25 18:49:27.471560 [INFO] Executing cleanup logic
2022/02/25 18:49:27.471678 [INFO] CommandService Response: {"status":"FAILURE","api_version":"1.0","results":[{"status":"FAILURE","msg":"Engine execution has encountered an error.","returncode":1,"events":[{"msg":"Instance deployment successfully used commands in the 'Procfile' to start your application.","timestamp":1645814965,"severity":"INFO"},{"msg":"Instance deployment failed. For details, see 'eb-engine.log'.","timestamp":1645814967,"severity":"ERROR"}]}]}

2022/02/25 18:49:27.472313 [INFO] Platform Engine finished execution on command: app-deploy

Command: cat /var/log/cfn-init.log

2022-02-25 18:49:26,622 [INFO] Command 02_setup_reasonable_inputrc succeeded
2022-02-25 18:49:27,443 [ERROR] Command 03_migrate (source /var/app/venv/*/bin/activate && python3 manage.py migrate) failed
2022-02-25 18:49:27,443 [ERROR] Error encountered during build of postbuild_0_beiwe_application: Command 03_migrate failed
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/construction.py", line 573, in run_config
    CloudFormationCarpenter(config, self._auth_config).build(worklog)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/construction.py", line 273, in build
    self._config.commands)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/command_tool.py", line 127, in apply
    raise ToolError(u"Command %s failed" % name)
cfnbootstrap.construction_errors.ToolError: Command 03_migrate failed
2022-02-25 18:49:27,445 [ERROR] -----------------------BUILD FAILED!------------------------
2022-02-25 18:49:27,445 [ERROR] Unhandled exception during build: Command 03_migrate failed
Traceback (most recent call last):
  File "/opt/aws/bin/cfn-init", line 176, in <module>
    worklog.build(metadata, configSets)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/construction.py", line 135, in build
    Contractor(metadata).build(configSets, self)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/construction.py", line 561, in build
    self.run_config(config, worklog)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/construction.py", line 573, in run_config
    CloudFormationCarpenter(config, self._auth_config).build(worklog)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/construction.py", line 273, in build
    self._config.commands)
  File "/usr/lib/python3.7/site-packages/cfnbootstrap/command_tool.py", line 127, in apply
    raise ToolError(u"Command %s failed" % name)
cfnbootstrap.construction_errors.ToolError: Command 03_migrate failed
((staging) ) [ec2-user@ip-172-xxx-yyy-zzz current]$

Commands:

source /var/app/venv/*/bin/activate
cd ../staging
python3 manage.py migrate
((staging) ) [ec2-user@ip-172-xxx-yyy-zzz staging]$ python3 manage.py migrate
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/var/app/venv/staging-LQM1lest/lib/python3.8/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/var/app/venv/staging-LQM1lest/lib/python3.8/site-packages/django/core/management/__init__.py", line 357, in execute
    django.setup()
  File "/var/app/venv/staging-LQM1lest/lib/python3.8/site-packages/django/__init__.py", line 24, in setup
    apps.populate(settings.INSTALLED_APPS)
  File "/var/app/venv/staging-LQM1lest/lib/python3.8/site-packages/django/apps/registry.py", line 114, in populate
    app_config.import_models()
  File "/var/app/venv/staging-LQM1lest/lib/python3.8/site-packages/django/apps/config.py", line 211, in import_models
    self.models_module = import_module(models_module_name)
  File "/usr/lib64/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/var/app/staging/database/models.py", line 9, in <module>
    from .data_access_models import *
  File "/var/app/staging/database/data_access_models.py", line 15, in <module>
    from libs.s3 import s3_list_files, s3_retrieve
  File "/var/app/staging/libs/s3.py", line 6, in <module>
    from libs.encryption import (decrypt_server, encrypt_for_server, generate_key_pairing,
  File "/var/app/staging/libs/encryption.py", line 7, in <module>
    from Crypto.PublicKey import RSA as old_RSA
  File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/Crypto/PublicKey/RSA.py", line 585
    except ValueError, IndexError:
                     ^
SyntaxError: invalid syntax

biblicabeebli commented 2 years ago

Apologies for the slow response, I had a refrigerator die on me last week, reviewing this now...

eze1981 commented 2 years ago

Just in case: I fixed the container_commands in my first message.

biblicabeebli commented 2 years ago

So, the insanity at hand is that PyCrypto runs Python's 2to3 on itself at installation time to fix Python 3 syntax compatibility issues. This is... insane. For some reason the 2to3 operation fails, which leaves Python 2 syntax such as except ValueError, IndexError: in the installed files. This happens for you on your Elastic Beanstalk server, and for others (me, on 4 separate instances) on the Ubuntu data processing servers.
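Because a failed 2to3 run only surfaces when Django happens to import the broken module, it can help to scan the installed package up front. The sketch below compiles every .py file under a package directory and reports the ones Python 3 rejects (for example, the except ValueError, IndexError: line from the traceback above, which the Python 3.8 used here rejects). uncompilable_files is a hypothetical helper, and the directory you pass it (the venv's site-packages/Crypto) is an assumption about where pycrypto was installed.

```python
"""Report .py files in an installed package that this Python cannot compile."""
from pathlib import Path
from typing import List, Tuple


def uncompilable_files(package_dir: str) -> List[Tuple[str, int, str]]:
    """Return (path, line number, message) for each file that fails to compile."""
    failures = []
    for path in sorted(Path(package_dir).rglob("*.py")):
        source = path.read_text(encoding="utf-8", errors="replace")
        try:
            # compile() parses and byte-compiles only; the module is not executed.
            compile(source, str(path), "exec")
        except SyntaxError as exc:
            failures.append((str(path), exc.lineno or 0, exc.msg))
    return failures
```

An empty result after installation suggests 2to3 (or a patch like the crypto.tar.gz overwrite) left the package importable; a non-empty one points at exactly which files still carry Python 2 syntax.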

The fixes I can think of are to do the same hack of forcibly overwriting the site-packages files, or for me to create a copy of the pycrypto repo, host it on GitHub, and install it as a git target dependency. The git repo would be cleaner, but it probably requires editing the setup.py file to ensure installation succeeds...

--

Okay, I looked through some forks and issues on the pycrypto GitHub repo. The current master is an unreleased version, 2.7a1, which is functional for our minimal use case; I tried it as an option while debugging this issue.

This pull request https://github.com/pycrypto/pycrypto/pull/296 points to a variant that may be Python 3.8 compatible, so that would be a good place to start.

Try replacing pycrypto==2.6.1 with git+https://github.com/fabiant7t/pycrypto in requirements.txt; that should be all that is needed to test it. Would you mind trying that? (No, I don't know that this will fix the 2to3 insanity, but it's a reasonable thing to try.)

(If we find a repo that works I will probably clone that repo directly for stability.)

(I will definitely be taking the additions you've worked on in this thread. Thank you, and sorry for the insanity here; it worked for me repeatedly...)

eze1981 commented 2 years ago

Thank you! Sure, I'll try the dependency edit in the requirements.txt file.

Command 04_update_pglib is based on this article: https://aws.amazon.com/premiumsupport/knowledge-center/ec2-enable-epel/. I'm not sure whether those three subcommands can be moved to the packages section; I'll try to explore that option.

biblicabeebli commented 2 years ago

any luck with the dependency?

(I'm watching this issue while trying to get a bunch of my own work handled, and I will incorporate this work onto staged-updates once we have a working solution. If there are any other watchers waiting on that work, please comment!)

eze1981 commented 2 years ago

I'm using the workaround from my first comment, and I got it up and running. I'll try to test the new dependency tomorrow or early next week.

Also, I have forked the repo and added support for deploying in a non-default VPC. The fix needs some improvements; I'll create a PR once I have a more polished version of it: https://github.com/ORC-RIS/beiwe-backend.

biblicabeebli commented 2 years ago

Oh wow!

Alright, I'll try and get these items into a branch today.

eze1981 commented 2 years ago

A new branch would be great. I can work on the same branch in my fork and create a pull request once it's ready. Does that work for you? Also, I'm creating a new issue to discuss the details of deploying in a non-default VPC.

biblicabeebli commented 2 years ago

Been fighting an unrelated issue on my production server, I'm behind on everything, sorry for the snail-like pace.

biblicabeebli commented 2 years ago

I just pushed some changes that affect the .elasticbeanstal/01.config file, so please review your changes for conflicts. I should be able to get back to this issue... soon.

biblicabeebli commented 2 years ago

Okay, I'm using the deployment-tweaks branch to centralize these updates. I fully documented the 01.config file; apologies for the inevitable merge conflicts. I haven't deployed to test yet; I'm still getting ready, and, frankly, it's a Friday night and I'm more bored than productive. :b

biblicabeebli commented 2 years ago

@eze1981 I will want to review the non-default VPC details, and I will have to update the official documentation, which currently states that we don't support that.

biblicabeebli commented 2 years ago

aha!

This issue was not occurring for our deployments because we were behind on a minor platform version update, which was blocked by the other stability issues I mentioned I was dealing with. Those issues only appeared under high load, so only some existing deployments encountered them. Once we updated a deployment to the Python 3.8 platform, version 3.3.11, we hit the same errors. Why? Noooo cluuuue.

The updates from the deployment-tweaks branch have been merged into main. Further changes or improvements (fixing pagespeed?) will remain on deployment-tweaks.

zagorsky commented 2 years ago

@eze1981, PyCrypto has been fully removed from the main branch and replaced with PyCryptodomex. It should fix this error you were getting:

  File "/var/app/staging/libs/encryption.py", line 7, in <module>
    from Crypto.PublicKey import RSA as old_RSA
  File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/Crypto/PublicKey/RSA.py", line 585
    except ValueError, IndexError:

I saw the same error while running locally.
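PyCrypto (and its drop-in successor PyCryptodome) install under the Crypto package name, while PyCryptodomex installs the same modules under Cryptodome, so an import like from Crypto.PublicKey import RSA becomes from Cryptodome.PublicKey import RSA after the switch. The sketch below, with hypothetical helper names, checks which namespaces a venv provides without importing them, which is a quick way to confirm that an old pycrypto install is no longer lingering.

```python
"""Check which crypto package namespaces are available in the current environment."""
import importlib.util
from typing import Dict

# PyCrypto and PyCryptodome both install the "Crypto" package;
# PyCryptodomex installs the same modules under "Cryptodome".
NAMESPACES = ("Crypto", "Cryptodome")


def crypto_namespaces() -> Dict[str, bool]:
    """Map each namespace to whether it can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in NAMESPACES}


def rsa_module():
    """Import RSA from the PyCryptodomex namespace, or return None if absent."""
    if importlib.util.find_spec("Cryptodome") is None:
        return None
    from Cryptodome.PublicKey import RSA
    return RSA
```

In a venv built from the updated requirements, "Cryptodome" should report True; a True for "Crypto" would suggest some package (possibly a leftover pycrypto) still provides the old namespace.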

biblicabeebli commented 1 year ago

Josh fixed this several months ago, thanks Josh!