materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org

Memory leak with `Chgcar.from_dict` #4146

Closed: aryannsaha closed this issue 3 days ago

aryannsaha commented 3 days ago

Python version

Python 3.12.4

Pymatgen version

2024.10.27

Operating system version

Linux

Current behavior

I'm working with Professor Andrew Rosen (@Andrew-S-Rosen). We found what appears to be a memory leak associated with `Chgcar.from_dict`. Even after deleting the object, we still observe a net increase in memory usage.

Expected Behavior

We expected no net increase in memory once the object is deleted.

Minimal example

import gzip
import json
import tracemalloc

import psutil
from pymatgen.io.vasp.outputs import Chgcar

# Memory-tracking helpers
def get_memory_usage():
    """Get current process memory usage in MB"""
    process = psutil.Process()
    return process.memory_info().rss / (1024 * 1024)

def log_memory(message, log_file="memory_usage_testRead.log"):
    """Log memory usage to file and print to console"""
    mem_usage = get_memory_usage()
    log_message = f"{message}: {mem_usage:.2f} MB\n"

    # Write to file
    with open(log_file, "a") as f:
        f.write(log_message)

    # Also print to console
    print(log_message, end="")

# Start memory tracking (no tracemalloc snapshots are taken below;
# the log relies on RSS reported by psutil)
tracemalloc.start()
print(f"Initial memory usage: {get_memory_usage():.2f} MB")

# Read in a CHGCAR several times
for i in range(10):
    log_memory(f"Before gzip CHGCAR {i}")
    with gzip.open("mp-1000005.json.gz", "rt", encoding="utf-8") as g:
        chgcar_json = json.load(g)
    log_memory(f"After gzip CHGCAR {i}")

    log_memory(f"Before loading CHGCAR {i}")
    chgcar = Chgcar.from_dict(chgcar_json["data"])
    log_memory(f"After loading CHGCAR {i}")

    del chgcar  # chgcar_json still references the parsed JSON until it is rebound
    log_memory(f"After deleting CHGCAR {i}")

print(f"Final memory usage: {get_memory_usage():.2f} MB")
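
A note on reading the log above: most of each jump happens at the gzip/`json.load` step (e.g. 161.57 → 243.25 MB in iteration 0) rather than at `Chgcar.from_dict` (243.25 → 250.32 MB), and `chgcar_json` is still referenced after `del chgcar`, so RSS at that point still includes the parsed JSON. One way to separate a genuine leak from allocator behavior is to track Python-level allocations per iteration with `tracemalloc` while releasing every reference. The sketch below is a minimal harness built on hypothetical helpers (`traced_growth`, `parse_payload`, `leaky_parse`); for the real case, `fn` would open the file, call `json.load`, and pass the `"data"` entry to `Chgcar.from_dict` without keeping any result. It only sees memory allocated through Python's allocators or reported to `tracemalloc` (recent NumPy versions report their array buffers).

```python
import gc
import json
import tracemalloc


def traced_growth(fn, iterations=10):
    """Call fn() repeatedly and return, after each call, how many bytes of
    Python-traced memory are still allocated relative to the starting point."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        gc.collect()
        baseline, _ = tracemalloc.get_traced_memory()
        growth = []
        for _ in range(iterations):
            fn()  # the return value is discarded, so nothing here keeps it alive
            gc.collect()  # also free objects that are only kept alive by cycles
            current, _ = tracemalloc.get_traced_memory()
            growth.append(current - baseline)
        return growth
    finally:
        if not was_tracing:
            tracemalloc.stop()


def parse_payload():
    # Hypothetical stand-in for "parse JSON, then build an object from it"
    text = json.dumps({"data": list(range(100_000))})
    return json.loads(text)


_leaked = []


def leaky_parse():
    # Deliberately keeps every result alive to show what a real leak looks like
    _leaked.append(parse_payload())


if __name__ == "__main__":
    print(traced_growth(parse_payload, 5))  # stays small and roughly flat
    print(traced_growth(leaky_parse, 5))    # grows on every iteration
```

A leak shows up as values that keep growing with the iteration count; roughly flat values combined with a high RSS point instead toward freed memory that has not been returned to the operating system.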

Relevant files to reproduce this bug

mp-1000005.json.gz

Output

Initial memory usage: 161.57 MB
Before gzip CHGCAR 0: 161.57 MB
After gzip CHGCAR 0: 243.25 MB
Before loading CHGCAR 0: 243.25 MB
After loading CHGCAR 0: 250.32 MB
After deleting CHGCAR 0: 250.32 MB
Before gzip CHGCAR 1: 250.32 MB
After gzip CHGCAR 1: 359.54 MB
Before loading CHGCAR 1: 359.54 MB
After loading CHGCAR 1: 359.64 MB
After deleting CHGCAR 1: 359.64 MB
Before gzip CHGCAR 2: 359.64 MB
After gzip CHGCAR 2: 367.79 MB
Before loading CHGCAR 2: 367.79 MB
After loading CHGCAR 2: 367.89 MB
After deleting CHGCAR 2: 367.89 MB
Before gzip CHGCAR 3: 367.89 MB
After gzip CHGCAR 3: 320.54 MB
Before loading CHGCAR 3: 320.54 MB
After loading CHGCAR 3: 320.54 MB
After deleting CHGCAR 3: 320.54 MB
Before gzip CHGCAR 4: 320.54 MB
After gzip CHGCAR 4: 320.54 MB
Before loading CHGCAR 4: 320.54 MB
After loading CHGCAR 4: 320.54 MB
After deleting CHGCAR 4: 320.54 MB
Before gzip CHGCAR 5: 320.54 MB
After gzip CHGCAR 5: 324.48 MB
Before loading CHGCAR 5: 324.48 MB
After loading CHGCAR 5: 324.48 MB
After deleting CHGCAR 5: 324.48 MB
Before gzip CHGCAR 6: 324.48 MB
After gzip CHGCAR 6: 373.71 MB
Before loading CHGCAR 6: 373.71 MB
After loading CHGCAR 6: 373.71 MB
After deleting CHGCAR 6: 373.71 MB
Before gzip CHGCAR 7: 373.71 MB
After gzip CHGCAR 7: 324.38 MB
Before loading CHGCAR 7: 324.38 MB
After loading CHGCAR 7: 324.38 MB
After deleting CHGCAR 7: 324.38 MB
Before gzip CHGCAR 8: 324.38 MB
After gzip CHGCAR 8: 322.42 MB
Before loading CHGCAR 8: 322.42 MB
After loading CHGCAR 8: 322.42 MB
After deleting CHGCAR 8: 322.42 MB
Before gzip CHGCAR 9: 322.42 MB
After gzip CHGCAR 9: 324.38 MB
Before loading CHGCAR 9: 324.38 MB
After loading CHGCAR 9: 324.38 MB
After deleting CHGCAR 9: 324.38 MB
Final memory usage: 324.38 MB

The output above was generated with a different CHGCAR file than the one attached, but we observe the same behavior for many CHGCARs we tried. In production calculations, memory usage increased continually, whereas here it fluctuates after a notable initial increase.
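
Memory that stays high after `del` can mean two different things: something still holds a reference to the object (a real leak), or the object was freed but CPython's allocator or the C library kept the memory instead of returning it to the operating system. RSS readings from `psutil` cannot tell these apart, but a weak reference can rule out the first case. The sketch below is a minimal, hypothetical check (`Grid`, `is_collected_after_del`, and `cached_grid` are illustrative names); for the real case, the factory would build the `Chgcar`, assuming the class supports weak references as ordinary Python classes do.

```python
import gc
import weakref


class Grid:
    """Hypothetical stand-in for a large object such as a loaded CHGCAR."""

    def __init__(self, n):
        self.values = [0.0] * n


def is_collected_after_del(factory):
    """Create an object, drop our only strong reference, and report whether
    the object itself was destroyed."""
    obj = factory()
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj
    gc.collect()  # collect the object even if it sits in a reference cycle
    return ref() is None


_cache = []


def cached_grid():
    grid = Grid(1_000)
    _cache.append(grid)  # simulates a hidden reference, such as a module-level cache
    return grid


print(is_collected_after_del(lambda: Grid(1_000)))  # True
print(is_collected_after_del(cached_grid))          # False
```

If the check returns `True` for the real object while RSS stays elevated, the growth is more likely allocator retention than a reference leak in `from_dict`.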

Andrew-S-Rosen commented 3 days ago

Here is a test with memory_profiler:

from memory_profiler import profile
from monty.serialization import loadfn

@profile
def test():
    for _ in range(100):
        loadfn("mp-1000005.json.gz")

if __name__ == "__main__":
    test()

With one iteration (`range(1)` in place of `range(100)`):

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     5    201.2 MiB    201.2 MiB           1   @profile
     6                                         def test():
     7    333.6 MiB      0.0 MiB           2       for _ in range(1):
     8    333.6 MiB    132.4 MiB           1           loadfn("mp-1000005.json.gz")

With 100 iterations:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     5    201.2 MiB    201.2 MiB           1   @profile
     6                                         def test():
     7    450.3 MiB   -608.8 MiB         101       for _ in range(100):
     8    450.3 MiB   -359.7 MiB         100           loadfn("mp-1000005.json.gz")
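
`memory_profiler` reports process-level memory per line, and in a loop it combines the increments from every occurrence of a line, which is why the 100-iteration run can show negative increments that are hard to interpret. A `tracemalloc` snapshot diff instead attributes the allocations that are still alive to the source lines that made them. The sketch below uses hypothetical names (`top_retained`, `load_and_keep`); for the real case, `fn` could call `loadfn("mp-1000005.json.gz")` as in the test above.

```python
import tracemalloc


def top_retained(fn, limit=5):
    """Run fn() and return the source lines whose allocations are still alive
    afterwards, largest first, as tracemalloc StatisticDiff objects."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        fn()
        after = tracemalloc.take_snapshot()
    finally:
        if not was_tracing:
            tracemalloc.stop()
    # Ignore allocations made inside the tracemalloc module itself
    ignore_self = [tracemalloc.Filter(False, tracemalloc.__file__)]
    before = before.filter_traces(ignore_self)
    after = after.filter_traces(ignore_self)
    # Group by (file, line); compare_to sorts by the absolute size difference
    diffs = after.compare_to(before, "lineno")
    return [d for d in diffs if d.size_diff > 0][:limit]


_kept = []


def load_and_keep():
    # Hypothetical workload that keeps its result alive in a module-level list
    _kept.append([float(i) for i in range(100_000)])


for diff in top_retained(load_and_keep):
    print(diff)
```

Lines that keep appearing at the top across repeated calls are the ones holding on to memory; if nothing large survives, the remaining RSS is not being held by live Python objects.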

shyuep commented 3 days ago

`Chgcar` inherits from `VolumetricData`, which gets `from_dict` from `MSONable`. If there is a leak, it is in `MSONable`.
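
To reproduce the behavior outside pymatgen, one can start from a minimal class that follows the same dict-to-constructor pattern. The sketch below is a simplified, hypothetical stand-in (`Record` is an illustrative name) written with the standard library only; it is not monty's code, which additionally decodes nested values with `MontyDecoder` and records extra metadata such as a version.

```python
class Record:
    """Hypothetical stand-in for an MSONable subclass such as Chgcar."""

    def __init__(self, structure, data):
        self.structure = structure
        self.data = data

    def as_dict(self):
        # "@module"/"@class" mirror the metadata keys MSONable adds when serializing
        return {
            "@module": type(self).__module__,
            "@class": type(self).__name__,
            "structure": self.structure,
            "data": self.data,
        }

    @classmethod
    def from_dict(cls, d):
        # Simplified pattern: drop "@"-prefixed metadata and pass the
        # remaining keys straight to the constructor as keyword arguments
        kwargs = {k: v for k, v in d.items() if not k.startswith("@")}
        return cls(**kwargs)


original = Record("NaCl", {"total": [1.0, 2.0]})
restored = Record.from_dict(original.as_dict())
print(restored.data is original.data)  # True: no copy is made
```

In this stand-in, the reconstructed object shares the input dict's values rather than copying them, so those values stay alive as long as the object does; whether the same holds for `Chgcar` depends on what its constructor does with the data it receives.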

Andrew-S-Rosen commented 3 days ago

Yup, I am looking into it. I am also not yet convinced whether this is a "bug" or just a quirk of Python. I'm going to close this issue and see if I can reproduce the behavior with the MSONable class.