interactivereport / cellxgene_VIP

Enables cellxgene to generate violin, stacked violin, stacked bar, heatmap, volcano, embedding, dot, track, density, 2D density, sankey and dual-gene plot in high-resolution SVG/PNG format. It also performs differential gene expression analysis and provides a Command Line Interface (CLI) for advanced users to perform analysis using python and R.
https://cellxgenevip-ms.bxgenomics.com
MIT License
129 stars 44 forks source link

Memory leak issue #66

Closed michaeleekk closed 2 years ago

michaeleekk commented 2 years ago

Issue

It might not be exactly be the memory leak but I thought this would be easiest to be understood. The issue is that when I ran DEG a few times against a dataset with a significant size, around 3GB to 8GB, it will consume a lot of memory and the program crashed because of out of memory issue. At the createData() and DEG(), it consumed memory significantly, which should be obvious to the writer I believe, when I profiled with memory-profiler.

I wonder if this issue would be fixed or not ?

The workaround I did right now is to modify the route() to fork a sub process to run the DEG() function and return only the result CSV back to the parent process. This way, the memory usage will be constant on the parent process when I tested with the following test case.

The test case's code

I uploaded the modified VIPInterface.py to pastebin, https://pastebin.pl/view/941d8988. The main changes are adding @profile on createData() and DEG() and to remove the dependency on the Flask app so it could be a self-contained testable code.

The test case body is as followed,

from server.common.config.app_config import AppConfig
from VIPInterface2 import route

@profile
def main():
    app_config = AppConfig()
    app_config.update_server_config(
        single_dataset__datapath='/datasets/datafile.h5ad',
    )
    # import pdb; pdb.set_trace()
    data = open('/datasets/data.json', 'rb').read()
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    res = route(data, app_config)
    # print(res)
    del res

if __name__ == "__main__":
    main()

The data.json is dataset dependent so I didn't include it in this issue, but it should be able to be captured from the browser.

Workaround code

This is the workaround that I did on route(),


from multiprocessing import Process, Queue

def runJob(q, data, appConfig):
  data = initialization(data,appConfig)
  try:
      taskRes = distributeTask(data["method"])(data)
      q.put([False, taskRes])
  except Exception as e:
    q.put([True, traceback.format_exc()])

def route(data,appConfig):
  q = Queue()
  p = Process(target=runJob, args=(q, data, appConfig,))
  try:
    getLock(jobLock)
    p.start()
    has_error, data = q.get()
    p.join()
    freeLock(jobLock)
    if has_error:
        return 'ERROR @server: '+data
    else:
        return data
  except Exception as e:
    freeLock(jobLock)
    return 'ERROR @server: '+traceback.format_exc() # 'ERROR @server: {}, {}'.format(type(e),str(e))
  #return distributeTask(data["method"])(data)
z5ouyang commented 2 years ago

if you use "Welch's t-test (cellxgene)", will this memory issue happens?

michaeleekk commented 2 years ago

@z5ouyang I forgot to mention which one I used. I used "Welch's t-test (diffxpy)" when I created this issue. The memory usage is like this,

image

I just tried with "Welch's t-test (cellxgene)", the memory usage is like this,

image

So "Welch's t-test (cellxgene)" didn't have the issue but only "Welch's t-test (diffxpy)".

z5ouyang commented 2 years ago

@michaeleekk sorry for the delay, many projects. Do you mind to set up a pull request? thanks!

michaeleekk commented 2 years ago

@z5ouyang I created #72 for this issue. Hope it helps !