googlegenomics / gcp-variant-transforms

GCP Variant Transforms
Apache License 2.0

PyYAML 6.0 incompatibility #704

Closed james-lawlor closed 2 years ago

james-lawlor commented 2 years ago

When Variant Transforms is installed as a package from pip, its current requirements allow PyYAML 6.0 to be installed, which causes the following error:

vcf_to_bq.py --setup_file gcp-variant-transforms/setup.py --project variant-database-297614 --allow_malformed_records --region us-west1 --temp_location gs://vcfs/tmp --input_pattern gs://vcfs/HALB3003250.vcf.gz --output_table variant-database-297614:v3.variants --job_name vcf1635366194 --runner DataflowRunner --variant_merge_strategy MOVE_TO_CALLS --copy_quality_to_calls --copy_filter_to_calls --include_call_name --max_num_workers 25 --append --sharding_config_path /homo_sapiens_default.yaml --update_schema_on_append
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
WARNING:root:Since tables were appended, added rows cannot be reverted. You can utilize BigQuery snapshot decorators to recover your table up to 7 days ago. For more information please refer to: https://cloud.google.com/bigquery/table-decorators Here is the list of tables that you need to manually rollback:
Traceback (most recent call last):
  File "/python-3.7.6/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/cluster/software/python-3.7.6/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/bigquery_virtualenv/lib/python3.7/site-packages/gcp_variant_transforms/vcf_to_bq.py", line 651, in <module>
    raise e
  File "//bigquery_virtualenv/lib/python3.7/site-packages/gcp_variant_transforms/vcf_to_bq.py", line 639, in <module>
    run()
  File "//bigquery_virtualenv/lib/python3.7/site-packages/gcp_variant_transforms/vcf_to_bq.py", line 443, in run
    _COMMAND_LINE_OPTIONS)
  File "/bigquery_virtualenv/lib/python3.7/site-packages/gcp_variant_transforms/pipeline_common.py", line 76, in parse_args
    transform_options.validate(known_args)
  File "/bigquery_virtualenv/lib/python3.7/site-packages/gcp_variant_transforms/options/variant_transform_options.py", line 248, in validate
    parsed_args.sharding_config_path, parsed_args.append, True)
  File "/bigquery_virtualenv/lib/python3.7/site-packages/gcp_variant_transforms/options/variant_transform_options.py", line 276, in _validate_output_tables
    sharding = variant_sharding.VariantSharding(sharding_config_path)
  File "bigquery_virtualenv/lib/python3.7/site-packages/gcp_variant_transforms/libs/variant_sharding.py", line 102, in __init__
    config_file_path)
  File "gcp_variant_transforms/libs/variant_sharding.py", line 140, in _validate_config_and_check_intervals
    shards = yaml.load(f)
TypeError: load() missing 1 required positional argument: 'Loader'

Downgrading to PyYAML 5.4.1 (pip install --force-reinstall pyyaml==5.4.1) appears to fix the problem.
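
The TypeError comes from PyYAML 6.0 making the Loader argument of yaml.load() mandatory. The yaml.load(f) call in variant_sharding.py (line 140 in the traceback) does not pass one, so it fails on 6.0 but still works on 5.x. Besides pinning PyYAML, a code-level alternative is to pass a loader explicitly or call yaml.safe_load(). The sketch below illustrates that alternative; it is not a change the project is confirmed to have made. The helper name load_sharding_config and the sample YAML are hypothetical, and the sample is not the real sharding-config schema.

```python
import io

import yaml

# Hypothetical, trimmed-down config used only for illustration;
# it is NOT the real homo_sapiens_default.yaml schema.
SAMPLE_CONFIG = """\
- output_table:
    table_name_suffix: "chr1"
    regions:
      - "chr1"
      - "1"
"""


def load_sharding_config(stream):
    """Parse a YAML sharding config on both PyYAML 5.x and 6.x.

    In PyYAML 6.0 the Loader argument of yaml.load() became required, so a
    bare yaml.load(f) raises TypeError. yaml.safe_load(stream) is shorthand
    for yaml.load(stream, Loader=yaml.SafeLoader) and exists in both versions.
    """
    return yaml.safe_load(stream)


if __name__ == "__main__":
    shards = load_sharding_config(io.StringIO(SAMPLE_CONFIG))
    print(shards[0]["output_table"]["table_name_suffix"])  # prints: chr1
```

safe_load is the conservative choice here: it only builds plain Python types (lists, dicts, strings, numbers), which is all a config file like this should need. Passing Loader=yaml.FullLoader would most closely match what a bare yaml.load() did by default in PyYAML 5.x.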

lawrenae commented 2 years ago

@james-lawlor I think #707 will coincidentally fix this. It explicitly pins pyyaml==5.4.1 in a new requirements.txt file, which you can use to install dependencies in your local setup (pip install -r requirements.txt).
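
Because this fix relies on pinning PyYAML rather than changing the yaml.load() call, it can help to confirm which PyYAML version the virtualenv actually ended up with after running pip install -r requirements.txt. A minimal sketch, relying only on PyYAML exposing its version string as yaml.__version__. The function names and the PINNED_PYYAML constant are hypothetical; 5.4.1 is the pin mentioned above.

```python
import yaml

# Version pinned in requirements.txt according to the comment above.
PINNED_PYYAML = "5.4.1"


def pyyaml_matches_pin(installed=yaml.__version__, pinned=PINNED_PYYAML):
    """Return True if the installed PyYAML version equals the pinned one."""
    return installed == pinned


def pyyaml_is_pre_6(installed=yaml.__version__):
    """Return True for PyYAML releases older than 6.0, where a bare
    yaml.load(f) is still accepted (5.x emits a YAMLLoadWarning)."""
    major = int(installed.split(".")[0])
    return major < 6


if __name__ == "__main__":
    installed = yaml.__version__
    if pyyaml_matches_pin(installed):
        print(f"PyYAML {installed} matches the pin.")
    elif pyyaml_is_pre_6(installed):
        print(f"PyYAML {installed} is not the pinned version but predates 6.0.")
    else:
        print(f"PyYAML {installed} is 6.0 or newer; a bare yaml.load(f) will fail.")
```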

I'm closing this issue in anticipation of success, but please reopen it if this doesn't fix your problem.