Some images, for example `US20230354702A1-20231102-C00260.TIF` from the USPTO grant Red Book (attached), make MolScribe hang for hours and use an unreasonable amount of RAM.

Inspecting the values at https://github.com/thomas0809/MolScribe/blob/97acee57d10bd719f4dc1cfd30d09f142b7dc65f/molscribe/chemistry.py#L200 shows:
[('L', 202)] 2020202020201
L 20202020202020201
L 2020202020201
L 20202020202020201
for this image. That means at least two trillion iterations (appending items to a list), which in some cases makes mass processing of images hang and also uses an unreasonable amount of memory.
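To put the memory use in perspective, here is a rough back-of-the-envelope estimate. It assumes 64-bit CPython, where each list slot is an 8-byte pointer and the appended string objects are shared, and it ignores list over-allocation, so it is only a lower bound. `list_pointer_bytes` is a hypothetical helper, not part of MolScribe.

```python
# Lower-bound memory estimate for a CPython list built by repeated append().
# Assumption: 64-bit CPython, so each list slot is an 8-byte pointer. The
# appended objects themselves are shared, and over-allocation is ignored.
POINTER_SIZE_BYTES = 8


def list_pointer_bytes(n_items: int) -> int:
    """Bytes needed just for the pointer array of a list with n_items entries."""
    return n_items * POINTER_SIZE_BYTES


# Counts taken from the output reported above for this image.
for count in (2020202020201, 20202020202020201):
    terabytes = list_pointer_bytes(count) / 1e12
    print(f"{count:,} appends -> at least {terabytes:,.1f} TB of pointers")
```

Even the smaller count needs roughly 16 TB for the list's pointer array alone, so the process cannot finish on any ordinary machine.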
The fix is extremely simple: skip processing elements with more than 100000 atoms.
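A minimal sketch of the proposed guard, assuming the parsed formula is a flat list of `(symbol, count)` pairs like the ones printed above. `expand_elements` and `MAX_ATOMS_PER_ELEMENT` are hypothetical names, not MolScribe's actual code, and the real expansion logic in `chemistry.py` (nested groups, carbon sequences) is not reproduced here.

```python
from typing import List, Tuple

# Hypothetical threshold mirroring the 100000-atom cutoff proposed above.
MAX_ATOMS_PER_ELEMENT = 100_000


def expand_elements(
    elements: List[Tuple[str, int]],
    max_count: int = MAX_ATOMS_PER_ELEMENT,
) -> List[str]:
    """Expand (symbol, count) pairs into a flat list of symbols,
    skipping any pair whose count exceeds max_count."""
    expanded: List[str] = []
    for symbol, count in elements:
        if count > max_count:
            # Implausible multiplicity (likely a recognition artifact):
            # skip it instead of appending the symbol `count` times.
            continue
        expanded.extend([symbol] * count)
    return expanded


# The huge 'L' entry is skipped, so this returns immediately.
print(expand_elements([("C", 2), ("L", 20202020202020201), ("Br", 3)]))
```

The check runs before any list growth, so a bogus count costs a single comparison rather than trillions of appends.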
US20230354702A1-20231102-C00260.TIF.zip