harrystech / arthur-redshift-etl

ELT Code for your Data Warehouse
MIT License

DENG-2548: Bump PyYAML version to hotfix #853

Closed jwisdom-harrys closed 11 months ago

jwisdom-harrys commented 11 months ago

Bump PyYAML from 6.0 to the 6.0.1 hotfix release, which fixes pip installs that build PyYAML from source. See https://github.com/yaml/pyyaml/issues/601.
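
To guard against the pin regressing, something like the following could be run against requirements.txt. This is only a minimal sketch: `pinned_version` is a hypothetical helper (not part of Arthur), and it only understands plain `name==version` lines, not extras or environment markers.

```python
import re


def _normalize(name):
    """Normalize a package name so 'PyYAML', 'pyyaml' and 'Py_YAML' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pinned_version(requirements_text, package):
    """Return the version pinned with '==' for package, or None if it is not pinned."""
    wanted = _normalize(package)
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.match(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;]+)", line)
        if match and _normalize(match.group(1)) == wanted:
            return match.group(2)
    return None


# Sample text standing in for the contents of requirements.txt (illustrative only).
sample = "boto3==1.23.10\nPyYAML==6.0.1\n"
print(pinned_version(sample, "pyyaml"))
```

In practice the text would come from reading requirements.txt, and the result would be compared against the expected pin ("6.0.1" for this PR).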

Testing: I took the test command from the hdw-validate job in the harrys repo. Before patch:

-->docker run -v $(pwd):/arthur-redshift-etl harrystech/pg-python3:latest  bash -c "apk add --no-cache python3-dev; apk add --no-cache python3 postgresql-libs; apk add --no-cache --virtual .build-deps gcc musl-dev postgresql-dev; cd arthur-redshift-etl; pip3 install -r requirements.txt;"
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
...
https://files.pythonhosted.org/packages/8c/45/77147700f5088efaf9235a3a62b611b594d477a5c5613b5316d0ebd18be0/psycopg2-binary-2.9.5.tar.gz (384kB)
Collecting PyYAML==6.0 (from -r requirements.txt (line 11))
  Downloading https://files.pythonhosted.org/packages/36/2b/61d51a2c4f25ef062ae3f74576b01638bebad5e045f747ff12643df63844/PyYAML-6.0.tar.gz (124kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
    Complete output from command python setup.py egg_info:
    running egg_info
    creating pip-egg-info/PyYAML.egg-info
    writing pip-egg-info/PyYAML.egg-info/PKG-INFO
    writing dependency_links to pip-egg-info/PyYAML.egg-info/dependency_links.txt
    writing top-level names to pip-egg-info/PyYAML.egg-info/top_level.txt
    writing manifest file 'pip-egg-info/PyYAML.egg-info/SOURCES.txt'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-wvykadqb/PyYAML/setup.py", line 312, in <module>
        python_requires='>=3.6',
      File "/tmp/pip-build-env-0_kq_htp/lib/python3.6/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3.6/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/tmp/pip-build-env-0_kq_htp/lib/python3.6/site-packages/setuptools/command/egg_info.py", line 299, in run
        self.find_sources()
      File "/tmp/pip-build-env-0_kq_htp/lib/python3.6/site-packages/setuptools/command/egg_info.py", line 306, in find_sources
        mm.run()
      File "/tmp/pip-build-env-0_kq_htp/lib/python3.6/site-packages/setuptools/command/egg_info.py", line 541, in run
        self.add_defaults()
      File "/tmp/pip-build-env-0_kq_htp/lib/python3.6/site-packages/setuptools/command/egg_info.py", line 578, in add_defaults
        sdist.add_defaults(self)
      File "/tmp/pip-build-env-0_kq_htp/lib/python3.6/site-packages/setuptools/command/py36compat.py", line 34, in add_defaults
        self._add_defaults_ext()
      File "/tmp/pip-build-env-0_kq_htp/lib/python3.6/site-packages/setuptools/command/py36compat.py", line 118, in _add_defaults_ext
        self.filelist.extend(build_ext.get_source_files())
      File "/tmp/pip-install-wvykadqb/PyYAML/setup.py", line 204, in get_source_files
        self.cython_sources(ext.sources, ext)
      File "/usr/lib/python3.6/distutils/cmd.py", line 103, in __getattr__
        raise AttributeError(attr)
    AttributeError: cython_sources

After patch:

-->docker run -v $(pwd):/arthur-redshift-etl harrystech/pg-python3:latest  bash -c "apk add --no-cache python3-dev; apk add --no-cache python3 postgresql-libs; apk add --no-cache --virtual .build-deps gcc musl-dev postgresql-dev; cd arthur-redshift-etl; pip3 install -r requirements.txt;"
fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
...
https://files.pythonhosted.org/packages/06/b3/24afc8868eba069a7f03650ac750a778862dc34941a4bebeb58706715726/charset_normalizer-2.0.12-py3-none-any.whl
Collecting idna<4,>=2.5; python_version >= "3" (from requests!=2.18.0,>=2.14.2->docker==5.0.3->-r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/fc/34/3030de6f1370931b9dbb4dad48f6ab1015ab1d32447850b9fc94e60097be/idna-3.4-py3-none-any.whl (61kB)
Collecting zipp>=0.5 (from importlib-metadata; python_version < "3.8"->jsonschema==3.2.0->-r requirements.txt (line 8))
  Downloading https://files.pythonhosted.org/packages/bd/df/d4a4974a3e3957fd1c1fa3082366d7fff6e428ddb55f074bf64876f8e8ad/zipp-3.6.0-py3-none-any.whl
Installing collected packages: typing-extensions, six, python-dateutil, arrow, jmespath, urllib3, botocore, s3transfer, boto3, certifi, charset-normalizer, idna, requests, websocket-client, docker, funcy, zipp, importlib-metadata, attrs, pyrsistent, jsonschema, pgpasslib, psycopg2-binary, PyYAML, simplejson, tabulate, termcolor, importlib-resources, tqdm, watchtower
  Running setup.py install for pyrsistent: started
    Running setup.py install for pyrsistent: finished with status 'done'
  Running setup.py install for psycopg2-binary: started
    Running setup.py install for psycopg2-binary: finished with status 'done'
  Running setup.py install for PyYAML: started
    Running setup.py install for PyYAML: finished with status 'done'
  Running setup.py install for termcolor: started
    Running setup.py install for termcolor: finished with status 'done'
Successfully installed PyYAML-6.0.1 arrow-1.2.3 attrs-22.2.0 boto3-1.23.10 botocore-1.26.10 certifi-2023.5.7 charset-normalizer-2.0.12 docker-5.0.3 funcy-1.18 idna-3.4 importlib-metadata-4.8.3 importlib-resources-5.4.0 jmespath-0.10.0 jsonschema-3.2.0 pgpasslib-1.1.0 psycopg2-binary-2.9.5 pyrsistent-0.18.0 python-dateutil-2.8.2 requests-2.27.1 s3transfer-0.5.2 simplejson-3.18.3 six-1.16.0 tabulate-0.8.10 termcolor-1.1.0 tqdm-4.64.1 typing-extensions-4.1.1 urllib3-1.26.16 watchtower-3.0.0 websocket-client-1.3.1 zipp-3.6.0
You are using pip version 18.1, however version 21.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

ynaim94-harrys commented 11 months ago

@jwisdom-harrys could we also try running some commands? That would be a good safety check since test coverage is low. We could try something like Arthur bootstrap (which reads and writes YAML) and an Arthur extract/load on an S3 source; those don't need Sqoop, so you can run them locally from the container.

jwisdom-harrys commented 11 months ago

> could we also try running some commands?

Sure thing. It looks like Arthur has no trouble loading or writing YAML:

(aws:de-dev, prefix:jwisdom) $ arthur.py ping
2023-07-18 15:09:27 - INFO - Starting log for redshift_etl v1.65.0 with ETL ID 73617C6251A24DB0
2023-07-18 15:09:27 - INFO - Command line: "/opt/local/redshift_etl/venv/bin/arthur.py ping"
2023-07-18 15:09:27 - INFO - Release information: toplevel=/opt/src/arthur-redshift-etl, commit=3fa60fd065852463e109018449ddffe5e453169b (v1.64.0), date=2023-02-13 13:39:13 -0500
2023-07-18 15:09:27 - INFO - Loading settings from '/opt/src/arthur-redshift-etl/python/etl/config/default_settings.yaml'
2023-07-18 15:09:27 - INFO - Loading settings from '/opt/data-warehouse/config_data_development/aws.yaml'
2023-07-18 15:09:27 - INFO - Loading environment variables from '/opt/data-warehouse/config_data_development/credentials.sh'
2023-07-18 15:09:27 - INFO - Loading settings from '/opt/data-warehouse/config_data_development/harrys.yaml'
2023-07-18 15:09:27 - INFO - Loading settings from '/opt/data-warehouse/config_data_development/harrys_dev.yaml'
2023-07-18 15:09:28 - INFO - Connecting to: host=polaris.dev.harrys.systems port=5439 dbname=development user=etl password=***

and

(aws:de-dev, prefix:jwisdom) $ rm schemas/harryswww/public-wholesalers.yaml
(aws:de-dev, prefix:jwisdom) $ arthur.py bootstrap_sources harryswww
2023-07-18 15:21:34 - INFO - Starting log for redshift_etl v1.65.0 with ETL ID 66493C612CDD4DEE
2023-07-18 15:21:34 - INFO - Command line: "/opt/local/redshift_etl/venv/bin/arthur.py bootstrap_sources harryswww"
2023-07-18 15:21:34 - INFO - Release information: toplevel=/opt/src/arthur-redshift-etl, commit=3fa60fd065852463e109018449ddffe5e453169b (v1.64.0), date=2023-02-13 13:39:13 -0500
2023-07-18 15:21:34 - INFO - Loading settings from '/opt/src/arthur-redshift-etl/python/etl/config/default_settings.yaml'
2023-07-18 15:21:34 - INFO - Loading settings from '/opt/data-warehouse/config_data_development/aws.yaml'
2023-07-18 15:21:34 - INFO - Loading environment variables from '/opt/data-warehouse/config_data_development/credentials.sh'
2023-07-18 15:21:34 - INFO - Loading settings from '/opt/data-warehouse/config_data_development/harrys.yaml'
2023-07-18 15:21:34 - INFO - Loading settings from '/opt/data-warehouse/config_data_development/harrys_dev.yaml'
2023-07-18 15:21:34 - INFO - Looking for files locally in 'schemas'
2023-07-18 15:21:34 - INFO - Found 44 matching file(s) for 44 table(s)
2023-07-18 15:21:35 - INFO - Finished loading 44 table design file(s) using 8 threads (0.26s)
2023-07-18 15:21:35 - INFO - Connecting to database source 'harryswww' to look for tables
2023-07-18 15:21:35 - INFO - Connecting to: host=ec2-54-146-214-46.compute-1.amazonaws.com port=5432 dbname=d6cfrafoorjgg5 user=etl_ro password=***
2023-07-18 15:21:35 - INFO - Found 45 table(s) matching patterns; allowlist=['public.authentications', 'public.billing_profiles', 'public.cancellation_survey_options', 'public.cancellation_survey_responses', 'public.checkout_invoices', 'public.checkout_invoices_shave_plans', 'public.credits', 'public.discount_code_batches', 'public.discount_codes', 'public.discount_group_items', 'public.discount_groups', 'public.discount_orders', 'public.discount_product_entitlements', 'public.discounts', 'public.gift_notecards', 'public.incentives', 'public.membership_cancellation_reasons', 'public.membership_cancellations', 'public.membership_events', 'public.membership_programs', 'public.membership_retry_lifecycle_enrollments', 'public.membership_retry_lifecycle_events', 'public.membership_tax_addresses', 'public.memberships', 'public.one_time_shave_plan_additions', 'public.payment_provider_profiles', 'public.redeemed_credits', 'public.shave_plan_events', 'public.shave_plan_retry_lifecycle_enrollments', 'public.shave_plan_retry_lifecycle_events', 'public.shave_plans', 'public.shipping_addresses', 'public.shipping_tiers', 'public.subscriptions', 'public.survey_question_choices', 'public.survey_question_responses', 'public.survey_questions', 'public.surveys', 'public.tos_opt_outs', 'public.user_experiment_participations', 'public.user_experiment_variants', 'public.user_experiments', 'public.users', 'public.viewable_products', 'public.wholesalers'], denylist=['public.api_tokens', 'public.api_applications', 'public.data_migrations', 'public.nav_links', 'public.oauth_*', 'public.schema_migrations', 'public.custom_page_component_container_page_components', 'public.custom_page_component_containers', 'public.custom_page_page_components', 'public.custom_pages', 'public.cx_tasks', 'public.direct_one_time_additions_product_pages', 'public.discount_product_conditions', 'public.fosdick_jobs', 'public.geo_shipping_constraints', 'public.holiday_nav_links', 'public.holiday_shipping_cutoffs', 
'public.images', 'public.ios_releases', 'public.page_components', 'public.page_module_image_attributes', 'public.page_module_sections', 'public.page_module_text_attributes', 'public.page_modules', 'public.page_sub_components', 'public.product_pages', 'public.screen_components', 'public.settings', 'public.shipping_class_shipping_types', 'public.shipping_constraints', 'public.tax_rates', 'public.url_redirects', 'public.v_operational_product_properties', 'public.versions', 'public.waitlists', 'public.pg_*'], subset='['harryswww.*']'
2023-07-18 15:21:35 - INFO - Skipping 'public.authentications' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-authentications.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.billing_profiles' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-billing_profiles.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.cancellation_survey_options' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-cancellation_survey_options.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.cancellation_survey_responses' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-cancellation_survey_responses.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.checkout_invoices' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-checkout_invoices.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.checkout_invoices_shave_plans' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-checkout_invoices_shave_plans.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.credits' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-credits.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.discount_code_batches' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-discount_code_batches.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.discount_codes' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-discount_codes.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.discount_group_items' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-discount_group_items.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.discount_groups' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-discount_groups.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.discount_orders' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-discount_orders.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.discount_product_entitlements' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-discount_product_entitlements.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.discounts' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-discounts.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.gift_notecards' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-gift_notecards.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.incentives' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-incentives.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.membership_cancellation_reasons' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-membership_cancellation_reasons.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.membership_cancellations' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-membership_cancellations.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.membership_events' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-membership_events.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.membership_programs' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-membership_programs.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.membership_retry_lifecycle_enrollments' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-membership_retry_lifecycle_enrollments.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.membership_retry_lifecycle_events' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-membership_retry_lifecycle_events.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.membership_tax_addresses' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-membership_tax_addresses.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.memberships' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-memberships.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.one_time_shave_plan_additions' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-one_time_shave_plan_additions.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.payment_provider_profiles' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-payment_provider_profiles.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.redeemed_credits' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-redeemed_credits.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.shave_plan_events' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-shave_plan_events.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.shave_plan_retry_lifecycle_enrollments' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-shave_plan_retry_lifecycle_enrollments.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.shave_plan_retry_lifecycle_events' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-shave_plan_retry_lifecycle_events.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.shave_plans' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-shave_plans.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.shipping_addresses' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-shipping_addresses.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.shipping_tiers' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-shipping_tiers.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.subscriptions' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-subscriptions.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.survey_question_choices' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-survey_question_choices.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.survey_question_responses' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-survey_question_responses.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.survey_questions' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-survey_questions.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.surveys' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-surveys.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.tos_opt_outs' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-tos_opt_outs.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.user_experiment_participations' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-user_experiment_participations.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.user_experiment_variants' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-user_experiment_variants.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.user_experiments' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-user_experiments.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.users' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-users.yaml'
2023-07-18 15:21:35 - INFO - Skipping 'public.viewable_products' from source 'harryswww' because table design already exists: 'schemas/harryswww/public-viewable_products.yaml'
2023-07-18 15:21:36 - INFO - Index 'wholesalers_pkey' of 'public.wholesalers' adds constraint {"primary_key": ["id"]}
2023-07-18 15:21:36 - INFO - Index 'index_wholesalers_on_code_prefix' of 'public.wholesalers' adds constraint {"unique": ["code_prefix"]}
2023-07-18 15:21:36 - INFO - Index 'index_wholesalers_on_name' of 'public.wholesalers' adds constraint {"unique": ["name"]}
2023-07-18 15:21:36 - INFO - Writing new table design file for 'harryswww.wholesalers' to './schemas/harryswww/public-wholesalers.yaml'
2023-07-18 15:21:36 - INFO - Done with 45 table(s) from source 'harryswww'
2023-07-18 15:21:36 - WARNING - New table(s) in source 'harryswww' without local design: 'public.wholesalers'
2023-07-18 15:21:36 - INFO - Ran 'bootstrap_sources' for 1.61s and finished successfully!
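
Since test coverage is low, the YAML read/write path could also be exercised without Arthur or a database connection. The snippet below is a rough sketch, not Arthur's own code: `SAMPLE_DESIGN` is a made-up stand-in for a table design (the real files have more fields), with only the constraints borrowed from the bootstrap log above, and `roundtrip` is a hypothetical helper. It relies only on `yaml.safe_dump` and `yaml.safe_load` from PyYAML.

```python
import yaml  # PyYAML; assumed to be the 6.0.1 version pinned by this PR

# Hypothetical, simplified table design used only for this smoke test.
SAMPLE_DESIGN = {
    "name": "harryswww.wholesalers",
    "columns": [
        {"name": "id", "type": "integer"},
        {"name": "code_prefix", "type": "string"},
        {"name": "name", "type": "string"},
    ],
    "constraints": [
        {"primary_key": ["id"]},
        {"unique": ["code_prefix"]},
        {"unique": ["name"]},
    ],
}


def roundtrip(document):
    """Serialize a document to YAML text and parse it back."""
    text = yaml.safe_dump(document, default_flow_style=False, sort_keys=False)
    return yaml.safe_load(text)


# The parsed result should be equal to what was written out.
assert roundtrip(SAMPLE_DESIGN) == SAMPLE_DESIGN
print("YAML round trip OK with PyYAML", yaml.__version__)
```

This only checks that dumping and loading are consistent with each other; it does not replace running `bootstrap_sources` against a real source as shown above.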