open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

Metadata Memory Exhaustion Persists Even After Fix (Re-opening #13660) #14043

Closed Jason-Clark-FG closed 9 months ago

Jason-Clark-FG commented 10 months ago

Affected module: Metadata Backup

This is to re-open this issue: #13660

@pmbrull Thanks for the fix. I cleared out the Python modules installed under the user, installed the latest versions, and ran the backup again. It still runs out of memory and gets killed by the OOM killer. It takes a little longer to consume the memory, but it still does.

pip freeze --user > uninstall_list.txt;pip uninstall -y -r uninstall_list.txt
pip --disable-pip-version-check install --user --no-warn-script-location --upgrade openmetadata-ingestion[backup,mysql]~=1.2.1

Python Modules:

$ pip freeze --user
antlr4-python3-runtime==4.9.2
appdirs==1.4.4
avro==1.11.3
azure-core==1.29.1
azure-identity==1.15.0
azure-storage-blob==12.19.0
beautifulsoup4==4.12.2
boto3==1.29.1
botocore==1.32.1
cached-property==1.5.2
cachetools==5.3.2
collate-sqllineage==1.1.5
croniter==1.3.15
diff_cover==8.0.1
ecdsa==0.18.0
email-validator==2.1.0.post1
exceptiongroup==1.1.3
google==3.0.0
google-auth==2.23.4
greenlet==3.0.1
grpcio==1.59.2
grpcio-tools==1.59.2
idna==2.10
importlib-metadata==6.8.0
iniconfig==2.0.0
isodate==0.6.1
jmespath==1.0.1
memory-profiler==0.61.0
msal==1.25.0
msal-extensions==1.0.0
mypy-extensions==1.0.0
networkx==3.2.1
openmetadata-ingestion==1.2.1.1
packaging==23.2
pathspec==0.11.2
pluggy==1.3.0
portalocker==2.8.2
protobuf==4.25.1
psutil==5.9.6
pydantic==1.10.13
PyMySQL==1.1.0
pytest==7.4.3
python-dateutil==2.8.2
python-jose==3.3.0
PyYAML==6.0.1
regex==2023.10.3
requests-aws4auth==1.2.3
rsa==4.9
s3transfer==0.7.0
soupsieve==2.5
SQLAlchemy==1.4.50
sqlfluff==2.1.4
sqlparse==0.4.3
tabulate==0.9.0
tblib==3.0.0
toml==0.10.2
tomli==2.0.1
tqdm==4.66.1
typing-compat==0.1.0
typing-inspect==0.9.0
typing_extensions==4.5.0

I see that the number of rows read is being limited. Do we need to clear them out of memory before the next read, or something along those lines?

TIA
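To make the question concrete, here is a minimal sketch of what "limit the rows per read and let each batch go before the next one" can look like. It is not the OpenMetadata backup code: the `entity` table, the `iter_rows` and `dump_table` helpers, and the use of an in-memory SQLite database are all assumptions chosen so the example runs on its own.

```python
import io
import sqlite3
from typing import Iterator, TextIO, Tuple


def iter_rows(conn: sqlite3.Connection, query: str, batch_size: int = 1000) -> Iterator[Tuple]:
    """Yield rows one at a time while holding at most one batch in memory."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            # The previous batch becomes garbage as soon as `batch` is rebound.
            yield from batch
    finally:
        cursor.close()


def dump_table(conn: sqlite3.Connection, table: str, out: TextIO, batch_size: int = 1000) -> int:
    """Write every row of `table` to `out` immediately; return the row count."""
    written = 0
    # `table` is a hypothetical, trusted name here; do not interpolate user input.
    for row in iter_rows(conn, f"SELECT * FROM {table}", batch_size):
        out.write(repr(row) + "\n")
        written += 1
    return written


# Demo data: a stand-in for a metadata table (hypothetical schema).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, json TEXT)")
demo_conn.executemany(
    "INSERT INTO entity (json) VALUES (?)",
    (("{}",) for _ in range(2500)),
)
demo_conn.commit()

demo_out = io.StringIO()
demo_count = dump_table(demo_conn, "entity", demo_out, batch_size=1000)
```

The key point is that rows are written out as they are read and never appended to a list that outlives the loop. One caveat for MySQL: PyMySQL's default cursor buffers the whole result set on the client, so `fetchmany` alone does not cap memory there; an unbuffered cursor such as `pymysql.cursors.SSCursor` would be needed for this pattern to help.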

pmbrull commented 9 months ago

Hi @Jason-Clark-FG, thanks for reaching back out. I misread the behavior of some functions, which we'll address. I validated the memory profile of the new methods, and it should be much lower now. Thanks!
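For anyone who wants to run a similar check locally, the following is a rough sketch of comparing peak memory between reading everything at once and reading in batches. It uses the standard library's `tracemalloc` rather than whatever tooling was used for the actual fix, and the `make_db`, `read_all`, `read_batched`, and `peak_bytes` helpers plus the SQLite table are hypothetical stand-ins, not OpenMetadata code.

```python
import sqlite3
import tracemalloc
from typing import Callable, Tuple


def make_db(n_rows: int) -> sqlite3.Connection:
    """Build an in-memory table with `n_rows` rows of ~200-character payloads."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, json TEXT)")
    conn.executemany(
        "INSERT INTO entity (json) VALUES (?)",
        (("x" * 200,) for _ in range(n_rows)),
    )
    conn.commit()
    return conn


def read_all(conn: sqlite3.Connection) -> int:
    """Materialize the full result set as one list before counting it."""
    rows = conn.execute("SELECT id, json FROM entity").fetchall()
    return len(rows)


def read_batched(conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Count rows while keeping only one batch alive at a time."""
    cursor = conn.execute("SELECT id, json FROM entity")
    total = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        total += len(batch)
    return total


def peak_bytes(func: Callable[[sqlite3.Connection], int], conn: sqlite3.Connection) -> Tuple[int, int]:
    """Return (result, peak traced bytes) for a single call of `func(conn)`."""
    tracemalloc.start()
    try:
        result = func(conn)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


profile_conn = make_db(20_000)
count_all, peak_all = peak_bytes(read_all, profile_conn)
count_batched, peak_batched = peak_bytes(read_batched, profile_conn)
print(f"fetchall peak: {peak_all} bytes, batched peak: {peak_batched} bytes")
```

Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held inside a database driver's C layer will not show up. For a process being killed by the OOM killer, watching the resident set size of the whole process is a useful complement.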

Jason-Clark-FG commented 9 months ago

Thanks @pmbrull, looking forward to being able to try this fix! I'll have to modify our backup automation, but that will be a welcome change if it all works! 😄