epinzur closed this issue 2 months ago
Hi; this is a tough one to debug. If you see it happen next time, can you collect information for us about the open handles in the process trulens is running in? That is, the output of:

```
lsof -p 5998
```

assuming trulens is running as pid 5998. You can find the process id with:

```
ps ax | grep python
```

There may be many other Python processes running depending on what you are doing, so you will have to pick out the one you think is causing the problem.
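If it is easier than running `lsof` externally, the descriptor count can also be checked from inside the Python process. A minimal sketch, not part of the trulens API (the helper name `open_fd_report` is made up here):

```python
import os
import resource

def open_fd_report():
    """Return (open_fd_count, soft_limit) for the current process.

    RLIMIT_NOFILE is the per-process limit whose exhaustion produces
    the "Too many open files" error.
    """
    soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # On Linux, /proc/self/fd has one entry per open descriptor.
        count = len(os.listdir("/proc/self/fd"))
    except FileNotFoundError:
        # Non-Linux fallback: probe a bounded range of descriptors.
        def is_open(fd):
            try:
                os.fstat(fd)
                return True
            except OSError:
                return False
        count = sum(1 for fd in range(min(soft_limit, 4096)) if is_open(fd))
    return count, soft_limit
```

Logging this every few hundred records would show whether descriptors climb steadily toward the limit as records are saved, which would confirm a leak rather than a one-off spike.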
Hi, @epinzur
I'm helping the trulens team manage their backlog and am marking this issue as stale. From what I understand, the issue involves a Python process crashing with a "Too many open files" error after saving 2426 records to the database while using TruChain for recording. Piotrm0 has suggested collecting information about open handles in the process trulens is running in to aid in debugging. However, the current status of the issue is unresolved.
Could you please confirm if this issue is still relevant to the latest version of the trulens repository? If it is, please let the trulens team know by commenting on the issue. Otherwise, feel free to close the issue yourself or the issue will be automatically closed in 7 days.
Thank you for your understanding and cooperation.
Closing this for now; please let us know if this happens again so we can re-open it.
When recording using TruChain, my python process crashes consistently after 2426 records have been saved to the database. It crashes with the error:

```
Too many open files
```

I've also seen via `top` that the process has a virtual memory size in the 200+ GB range before the crash. I've encountered this crash 5+ times in the past few days.

I get a HUGE stack trace after the crash occurs. See: pdf_splits.log
I'm using TruLens-eval `0.20.3`. Currently I'm on this branch from Piotr: https://github.com/truera/trulens/tree/piotrm/deferred_mem, but I've also seen the crash on a default install of `0.20.3`. I haven't tried with the latest release yet.

I'm recording in `deferred` mode.

This is my script:
This script makes heavy use of my helper file: `tru_shared`

I get through almost 2 of the 5 `collection_names` before the crash. I can un-block myself by running a single collection at a time.
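Besides running one collection at a time, another stop-gap while the leak is investigated is to raise the soft descriptor limit at the top of the script. A sketch, assuming the hard limit allows it (the helper name `raise_nofile_limit` is hypothetical); note this only delays the crash, it does not fix the leak:

```python
import resource

def raise_nofile_limit(fallback=65536):
    """Raise the soft RLIMIT_NOFILE toward the hard limit.

    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # If the hard limit is unlimited, pick a concrete target instead,
    # since some kernels reject RLIM_INFINITY for RLIMIT_NOFILE.
    target = hard if hard != resource.RLIM_INFINITY else fallback
    if soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Calling this once at startup lets the run get further before hitting the limit, which also makes the growth of open handles easier to observe with `lsof`.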