relyt0925 closed this issue.
Hmm, a hang is interesting. I suspect you're on a machine that doesn't have a working Tesseract install (or at least the environment ilab is running in doesn't have a working one), so resolve_ocr_options is falling back to EasyOCR. At that point, EasyOCR will attempt to download model files from its GitHub releases. Do these machines have limited network connectivity? Perhaps a firewall or something else is causing EasyOCR's attempts to download its model weights to hang?
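A minimal sketch of the fallback described above, assuming the decision hinges only on whether a tesseract binary is on PATH. pick_ocr_engine is a hypothetical name; this is not the actual resolve_ocr_options code, which may perform additional checks.

```python
import shutil
from typing import Optional


def pick_ocr_engine(tesseract_available: Optional[bool] = None) -> str:
    """Hypothetical sketch of an OCR fallback; not the real resolve_ocr_options.

    Assumption: Tesseract counts as available if a `tesseract` binary is on PATH.
    The real code may also check language data (e.g. TESSDATA_PREFIX).
    """
    if tesseract_available is None:
        tesseract_available = shutil.which("tesseract") is not None
    # Without a usable Tesseract, fall back to EasyOCR, which may then try to
    # download its model weights from GitHub on first use.
    return "tesseract" if tesseract_available else "easyocr"


print(pick_ocr_engine())
```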
The machines do have limited outbound network connectivity, although access to GitHub over port 443 (HTTPS) is allowed; SSH-based connections are not.
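Since only HTTPS to GitHub is allowed, one way to check the download path from the same environment ilab runs in is a TCP connect to github.com on port 443 with an explicit timeout. can_reach is a hypothetical helper; a successful connect only shows that a TCP connection opens, not that EasyOCR's specific download URLs are reachable.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # The explicit timeout keeps a firewall that silently drops packets
        # from blocking this check for a long time.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and socket timeouts.
        return False
```

For example, calling can_reach("github.com") inside the ilab container returning False would point at the network path rather than at EasyOCR itself.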
I would expect a network failure to eventually error out rather than hang, but I am not sure exactly which line of code within the function the program was hanging on.
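To find the exact line a hung Python process is stuck on, the standard library's faulthandler module can dump every thread's traceback once a timeout elapses. (A network connect or read with no timeout configured can block for a long time instead of failing, so a hang does not by itself rule out a network problem.) A minimal sketch, assuming the suspect call can be wrapped in the same process; run_with_watchdog is a hypothetical helper name.

```python
import faulthandler
import sys
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_watchdog(fn: Callable[[], T], timeout_s: float, dump_file=None) -> T:
    """Call fn(); if it is still running after timeout_s seconds, dump all thread tracebacks.

    The dump does not stop fn: it only records where each thread is blocked.
    """
    # faulthandler writes via the file descriptor, so it needs a real file
    # (sys.__stderr__ by default), not an in-memory stream.
    out = dump_file if dump_file is not None else sys.__stderr__
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=out, exit=False)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the suspected OCR setup call in a lambda with a generous timeout would print its stack to stderr if it stalls, while letting it keep running.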
And you're certain the process is hung, right? When we tried to reproduce this on some test machines here, the process actually died with an OSError about a missing system library instead of hanging. In your setup, would you be able to tell a crash apart from a hang?
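One way to tell a crash (such as the OSError about a missing system library) apart from a hang is to run the command as a child process with a timeout and look at how it ends. A sketch, assuming the command can be run this way; classify_run is a hypothetical helper, and the timeout must be longer than a normal run would take.

```python
import subprocess
from typing import Sequence


def classify_run(cmd: Sequence[str], timeout_s: float) -> str:
    """Run cmd and report 'ok', 'crash' (non-zero exit), or 'hang' (timed out)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return "hang"
    return "ok" if result.returncode == 0 else "crash"
```

For instance, classify_run(["ilab", "data", "generate"], timeout_s=3600) would report "crash" for the OSError case and "hang" if generation stalls past the hour.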
Yes. The environment where I reproduced the hang and validated this fix was actually a custom patch of RHEL AI 1.2, because I also saw that library issue on RHEL AI 1.3. The patch installed InstructLab 0.21 with pip, and the successful test added my SDG patch on top of that RHEL AI 1.2 base.
Ahh, OK. If your system environment was based on RHEL AI 1.2, it makes sense that it's falling back to EasyOCR, because Tesseract wouldn't be installed and set up to work properly on that system (unless you installed those packages and set up things like TESSDATA_PREFIX yourself). The actual hang is still interesting, but it is unlikely to be hit in a RHEL AI 1.3 environment.
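On a RHEL AI 1.2-based environment, a quick way to confirm whether Tesseract is actually usable is to ask the tesseract CLI which language data it can find. A sketch assuming Tesseract 4 or later (which prints the list to stdout) and treating "runs and lists at least one language" as a working install; parse_list_langs and tesseract_languages are hypothetical helpers, and TESSDATA_PREFIX is only reported, not set.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def parse_list_langs(stdout: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    # Skip the header line ("List of available languages in ...").
    return [line for line in lines if not line.startswith("List of")]


def tesseract_languages(timeout_s: float = 30.0) -> Optional[List[str]]:
    """Return the languages tesseract reports, or None if it is missing or fails to run."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    try:
        proc = subprocess.run(
            [exe, "--list-langs"], capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if proc.returncode != 0:
        return None
    return parse_list_langs(proc.stdout)


if __name__ == "__main__":
    # TESSDATA_PREFIX, if set, tells Tesseract where to look for its language data.
    print("TESSDATA_PREFIX =", os.environ.get("TESSDATA_PREFIX"))
    print("tesseract languages:", tesseract_languages())
```

None or an empty list on the affected machine would be consistent with the EasyOCR fallback.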
Aha! Sounds great, thank you!
It was not valid of me to patch RHEL AI 1.2 to try to bring in InstructLab 0.21, so this hang is expected.
I have not been able to get a lower-level debug log, but when I try to run SDG on a sample PDF document on RHEL AI, this function hangs indefinitely:
Steps to reproduce:
1) On RHEL AI, run ilab data generate on a PDF taxonomy. The example I used is here: https://github.com/relyt0925/taxonomy-doclingpoc/tree/main
2) Look at the logs: when resolve_ocr_options runs, the process hangs indefinitely at
I built a custom image with an SDG patch that comments out that section (https://github.com/relyt0925/sdg/commit/08343204e6fda0ae5473f9e99a8b77271ca77bde), then reran it, and we were able to get to the point of processing documents.
My custom test image is quay.io/relyt09250/testinstructlabbuilds:121withsdgpatch