Open michalselma opened 7 months ago
To better visualize the memory growth, you can use this code:
import gc

import whoisdomain
from memory_profiler import profile

domains = ['google.com', 'microsoft.com', 'apple.com', 'dell.com', 'hp.com', 'ab.com', 'xy.com', 'tld.com',
           'samsung.com', 'ibm.com', 'lg.com', 'python.com', 'git.com', 'netflix.com', 'cisco.com', 'kfc.com',
           'nasa.com', 'esa.com', 'amazon.com', 'meta.com', 'godaddy.com', 'ovh.com', 'uber.com', 'siemens.com']


def check():
    for item in domains:
        print(f'Checking domain: {item}')
        whoisdomain_call(item)
        # Force a collection so only memory that is still referenced remains
        gc.collect()


@profile
def whoisdomain_call(domain):
    # Errors are deliberately ignored: only the memory profile matters here
    try:
        whoisdomain.query(domain)
    except (whoisdomain.WhoisPrivateRegistry,
            whoisdomain.WhoisCommandFailed,
            whoisdomain.WhoisQuotaExceeded,
            whoisdomain.FailedParsingWhoisOutput,
            whoisdomain.UnknownTld,
            whoisdomain.UnknownDateFormat,
            whoisdomain.WhoisCommandTimeout):
        return


check()
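If memory_profiler is not available, a similar picture can probably be obtained with the standard-library tracemalloc module by comparing two snapshots. This is only a minimal sketch: `leaky_call` and `retained` are hypothetical stand-ins for `whoisdomain.query()` and whatever state it keeps, so the script runs without network access.

```python
import tracemalloc

retained = []  # hypothetical stand-in for state a library keeps between calls


def leaky_call(i):
    # Stand-in for whoisdomain.query(domain): keeps about 10 kB alive per call.
    retained.append(bytearray(10_000))


tracemalloc.start()
baseline = tracemalloc.take_snapshot()

for i in range(100):
    leaky_call(i)

current = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences are grouped by source line, biggest change first.
top = current.compare_to(baseline, "lineno")
total_growth = sum(stat.size_diff for stat in top)

for stat in top[:3]:
    print(stat)
print(f"total growth: {total_growth / 1024:.0f} KiB")
```

Replacing the body of `leaky_call` with the real query should point at the source lines whose allocations are still alive after the loop.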
Thanks, I will investigate. This is very helpful.
Preliminary investigation shows a steady increase in objects of type <class 're.Pattern'>.
This is most likely a side effect of the new function-based regex patterns used in the TLD regex dict.
I will need to investigate further whether I can either cache them or drop them after use.
See ./tests/memtest.py and ./tests/typescript
Hi there, any updates on this issue? I recently noticed that Home Assistant is using your library (https://github.com/home-assistant/core/blob/dev/homeassistant/components/whois/manifest.json#L10) but in version 0.9.27. I'd like to refactor that integration to use the newest version because the old one isn't returning information about some of the domains I own. Sadly with this memory leak issue, I'm sure my PR to Home Assistant won't get approved.
Currently this is not a priority for me, as I'm low on time.
Preliminary investigation reveals no real memory leak, only memory growth as previously unused TLDs are queried for the first time, which is expected.
I see the whois integration uses cloud polling, so it looks like memory would not be an issue in that case. (If the whois component is not permanently loaded, memory is released when the program ends.)
@mboot-github thank you for the reply. cloud_polling means the integration requires internet access (Home Assistant can work 100% offline).
The library is loaded into memory (https://github.com/home-assistant/core/blob/dev/homeassistant/components/whois/__init__.py#L26) and constantly used, the query is done once every 24 hours (https://github.com/home-assistant/core/blob/dev/homeassistant/components/whois/__init__.py#L38, https://github.com/home-assistant/core/blob/dev/homeassistant/components/whois/const.py#L15)
I'll try to update the integration to the newest version and we'll see if my domains return the correct info. Sadly, right now (with the old version of the library) I get no info about .pl domains.
An experimental fix is available in https://github.com/mboot-github/WhoisDomain/blob/master/testProc.py
It runs whoisdomain.q2(domain=domain, pc=pc) in a separate process and restarts that process after a specified N calls.
So far, all I can see is that the "memory leak" is caused by the default caching of data for each newly queried TLD.
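The general idea of recycling a worker process after N calls can be sketched with the standard library alone. This is not the code in testProc.py, just an illustration under assumptions: `CHILD_SOURCE`, `chunks` and `query_in_batches` are hypothetical names, and the child body is a stand-in that only reports its PID. A real worker would call the whois query there and copy the fields it needs into JSON-serialisable values.

```python
import json
import subprocess
import sys

# Child program: reads a JSON list of domains on stdin, writes a JSON dict on stdout.
# Stand-in body only; a real worker would run the whois query for each domain here.
CHILD_SOURCE = """
import json, os, sys
domains = json.load(sys.stdin)
json.dump({d: {"worker_pid": os.getpid()} for d in domains}, sys.stdout)
"""


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_in_batches(domains, calls_per_process=2):
    """Handle each batch in a fresh interpreter; its memory is freed when it exits."""
    results = {}
    for batch in chunks(domains, calls_per_process):
        proc = subprocess.run(
            [sys.executable, "-c", CHILD_SOURCE],
            input=json.dumps(batch),
            capture_output=True,
            text=True,
            check=True,  # raise if the child exits with an error
        )
        results.update(json.loads(proc.stdout))
    return results


results = query_in_batches(["google.com", "ibm.com", "nasa.com", "esa.com", "lg.com"])
pids = {info["worker_pid"] for info in results.values()}
print(len(results), "results from", len(pids), "worker processes")
```

The trade-off is that every new process starts with a cold cache, so each TLD's patterns would have to be built again after a restart. A larger `calls_per_process` reduces that overhead at the cost of a higher peak memory per worker.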
Describe the bug
I am running thousands of whoisdomain.query() calls across multiple processes or threads. After a few hours I noticed OS memory consumption increase from 4-5 GB to 30-35 GB. After digging into my code and setting up stricter garbage collection, I came to the conclusion that whoisdomain might be the source of the leak. Running on Windows with the default Sysinternals whois.exe.
To Reproduce
Python code:
Outputs