mohaseeb / beam-nuggets

Collection of transforms for the Apache Beam Python SDK.
http://mohaseeb.com/beam-nuggets/
MIT License

Issues with Dataflow #35

Open JoshBello opened 3 years ago

JoshBello commented 3 years ago

The pipeline works with the interactive runner but not with Dataflow.

I am using the sample code.

Any ideas?

luisarboleda17 commented 3 years ago

Hi @JoshBello,

I had the same issue. What worked for me was turning my project into a package by adding a setup.py file and passing that file to the pipeline through the setup-file argument.

setup.py

import setuptools

PACKAGE_NAME = '<Package>'
PACKAGE_VERSION = '0.0.1'
REQUIRED_PACKAGES = [
    'beam_nuggets',
]

setuptools.setup(
    name=PACKAGE_NAME,
    version=PACKAGE_VERSION,
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
    py_modules=['<Package>']
)

Then pass the file to the pipeline with Beam's --setup_file option:

python main.py \
...
  --setup_file=./setup.py \
...
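
For anyone who builds the options in code rather than on the command line, here is a minimal sketch of the same idea. The helper name build_dataflow_args and the project, region, and bucket values are hypothetical. The flags themselves (--runner, --project, --region, --temp_location, --setup_file) are standard Beam pipeline options.

```python
import os


def build_dataflow_args(project, region, temp_location, setup_file="./setup.py"):
    """Return Beam command-line flags for a Dataflow run that ships a local package."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # --setup_file tells Beam to package the local project so the workers
        # install it together with the dependencies in install_requires.
        f"--setup_file={os.path.abspath(setup_file)}",
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    args = build_dataflow_args("my-project", "us-central1", "gs://my-bucket/tmp")
    for flag in args:
        print(flag)
```

With apache_beam installed, the resulting list can be handed to PipelineOptions when the pipeline is constructed.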

mohaseeb commented 3 years ago

Hi @JoshBello and @luisarboleda17, What's the error you are seeing?

roodnejm commented 3 years ago

@mohaseeb Hi, I have a similar issue with Dataflow. When I create a setup.py that lists beam-nuggets as a dependency, I get the following error:

File "/usr/local/lib/python3.8/site-packages/sqlalchemy/dialects/postgresql/pg8000.py", line 395, in __init__
    raise NotImplementedError("pg8000 1.16.6 or greater is required")
NotImplementedError: pg8000 1.16.6 or greater is required [while running 'write to postgres/ParDo(_WriteToRelationalDBFn)']

From what I can see, the setup.py of beam-nuggets requires pg8000 1.16.5 or lower and SQLAlchemy 1.4.0 or higher. But I think SQLAlchemy 1.4.0 and above requires pg8000 1.16.6 or greater: https://github.com/sqlalchemy/sqlalchemy/blob/94169108cdd4dace09b752a6af4f4404819b49a3/lib/sqlalchemy/dialects/postgresql/pg8000.py#L395

So I am not sure any single version of pg8000 can satisfy both beam-nuggets and SQLAlchemy.

Thank you
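
To make the conflict concrete, here is a small sketch that checks whether an upper bound and a lower bound on the same package leave any version in common. The bounds are the ones reported in the comment above and have not been verified against either project. The function names are hypothetical, and the parser only handles plain dotted numeric versions with the same number of parts.

```python
def parse_version(text):
    """Turn a plain dotted version such as '1.16.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def ranges_overlap(max_allowed, min_required):
    """True if some version v satisfies min_required <= v <= max_allowed."""
    return parse_version(min_required) <= parse_version(max_allowed)


if __name__ == "__main__":
    # Bounds as reported in this thread, not checked against the projects.
    beam_nuggets_max_pg8000 = "1.16.5"
    sqlalchemy_min_pg8000 = "1.16.6"
    if ranges_overlap(beam_nuggets_max_pg8000, sqlalchemy_min_pg8000):
        print("a common pg8000 version exists")
    else:
        print("no pg8000 version satisfies both constraints")
```

If the reported bounds are right, the two ranges do not overlap, so one of the two pins would have to change.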