kmadathil / sanskrit_parser

Parsers for Sanskrit / संस्कृतम्
MIT License

Reducing memory usage #151

Closed. avinashvarna closed this issue 3 years ago.

avinashvarna commented 3 years ago

I've been trying to set up the API server on App Engine at https://sanskrit-parser.appspot.com. The basic install worked fine, but even a simple query such as /sanskrit_parser/v1/tags/hares exceeds the memory limit and gets killed. I've tried bumping the instance up to the largest supported size (2 GB of memory), with the same result. @kmadathil which EC2 instance did you use to deploy the API?

I haven't been able to reproduce the >2G memory usage locally, but it appears that just starting up the server consumes about 800 MB. Some quick memory profiling shows that loading the INRIA database consumes roughly 300-400 MB, and all the sandhi rules dicts take up another ~300 MB. I wonder whether this level of memory usage could become an issue in, say, a mobile app that is intended to work offline, but it is definitely impeding deployment on App Engine.
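
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch using the standard library's `tracemalloc`; the loader passed in below is a stand-in, not an actual sanskrit_parser call:

```python
import tracemalloc

def peak_mib(load_fn):
    """Run a zero-argument loader and return (result, peak MiB allocated)."""
    tracemalloc.start()
    result = load_fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak / 2**20

# Stand-in loader so the snippet runs anywhere; in practice, pass the call
# that builds the INRIA forms dict or the sandhi rules dicts instead.
data, peak = peak_mib(lambda: [{"form": i, "tag": str(i)} for i in range(200_000)])
print(f"peak allocation: {peak:.1f} MiB")
```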

I am going to look into ways to reduce the memory usage, e.g. by converting the INRIA forms/sandhi rules storage into an actual database, and examine whether that impacts performance.
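
A rough way to sanity-check the performance impact would be something like the following, with stand-in data rather than the real INRIA forms:

```python
import sqlite3
import timeit

# Stand-ins for the two back ends; substitute the real INRIA forms dict
# and the on-disk database when comparing for real.
forms_dict = {f"form{i}": f"tag{i}" for i in range(100_000)}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forms (form TEXT PRIMARY KEY, tag TEXT)")
conn.executemany("INSERT INTO forms VALUES (?, ?)", forms_dict.items())

dict_time = timeit.timeit(lambda: forms_dict.get("form50000"), number=10_000)
db_time = timeit.timeit(
    lambda: conn.execute(
        "SELECT tag FROM forms WHERE form = ?", ("form50000",)
    ).fetchone(),
    number=10_000,
)
print(f"dict lookup:   {dict_time * 1e6 / 10_000:.2f} us each")
print(f"sqlite lookup: {db_time * 1e6 / 10_000:.2f} us each")
```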

Any other ideas? Not sure how much time we should devote to this if no one has reported an issue so far.

avinashvarna commented 3 years ago

~~@kmadathil which EC2 instance did you use to deploy the API?~~

Just noticed that README_ec2 says 1G + 2G swap per worker.

kmadathil commented 3 years ago

I had increased swap (by creating a swapfile) on the EC2 instance I used until it worked. I'll have to check exactly which instance type it was.

avinashvarna commented 3 years ago

For the sandhi rules, loading only the rules needed (forward or backward) seems to work reasonably well for now (it at least halves the memory usage). Further optimizations could be implemented in the future, if necessary.
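
A sketch of the idea, purely illustrative since the real rule storage in sanskrit_parser is organized differently: the splitter only ever needs one direction, so only that set gets deserialized.

```python
import pickle

class SandhiRules:
    """Load only the sandhi rule direction actually needed.

    The pickle file names are hypothetical; the point is just that a
    forward splitter never pays for the backward rules and vice versa.
    """

    _FILES = {"forward": "sandhi_forward.pkl", "backward": "sandhi_backward.pkl"}

    def __init__(self, direction):
        if direction not in self._FILES:
            raise ValueError(f"unknown direction: {direction}")
        with open(self._FILES[direction], "rb") as f:
            self.rules = pickle.load(f)
```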

To reduce the memory used by the INRIA db, I've tried implementing an SQLite database (so that all the forms don't need to be loaded into memory) in the fix_148 branch, and documented it.
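
Roughly, the idea is to query per form instead of holding every form in a dict; the schema and file name below are illustrative, not the actual layout in fix_148.

```python
import sqlite3

# Illustrative schema: one row per (form, tag) pair, looked up on demand
# rather than keeping the whole INRIA forms dict resident in memory.
conn = sqlite3.connect("inria_forms.db")

def lookup_tags(form):
    """Return the tags stored for a single surface form."""
    cur = conn.execute("SELECT tag FROM forms WHERE form = ?", (form,))
    return [row[0] for row in cur]

print(lookup_tags("hares"))
```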

I will close this issue for now; we can track the overall branch status in #148.