iMetOsaka / UNAGI


MemoryError, any suggestions? #8

Open a-diamant opened 2 years ago

a-diamant commented 2 years ago

Hello! Do you have any idea where this error may come from? I tried to run UNAGI several times, but it crashes at the "Generating the genome coverage for each position" step. What is the usual job time for UNAGI? I'm running it on six fastq files (from 10 to 16 Gb each). Thank you for your suggestions!

P.S. I'm using a university cluster; the technical information is here: https://calculs.univ-cotedazur.fr/?page_id=450&lang=en

[2022/02/22 - 05:01:37] Generating the genome coverage for each position
Traceback (most recent call last):
  File "/home/adiamant/unagi/app/unagi.py", line 812, in <module>
    main(sys.argv[1:])
  File "/home/adiamant/unagi/app/unagi.py", line 150, in main
    combineCoverage(os.path.join(transitionnalOutputPath,config["positive_coverage_file"]),os.path.join(transitionnalOutputPath,config["negative_coverage_file"]),os.path.join(transitionnalOutputPath,config["total_coverage_file"]))
  File "/home/adiamant/unagi/app/unagi.py", line 704, in combineCoverage
    totalCoverage[posparts[0]][posparts[1]] = int(posparts[2])
MemoryError

JungNicolas commented 2 years ago

Hello Anna. As you may have guessed, the error indicates that your server ran out of usable memory while running UNAGI. The problem seems to come from the size of the coverage files when the genome coverage of positively and negatively stranded reads is combined.

A quick fix you can try to reduce that size is to change the genomecov options in the conf.ini file. Look for the following line:

genomecov_options=genomecov -d

And change it to:

genomecov_options=genomecov -dz

This will ignore the zero-coverage zones and reduce memory consumption, especially if you're using a more sparsely covered genome like human (UNAGI was initially created for yeast cDNA reads). Let me know if you are still experiencing problems after that change.
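To illustrate why dropping zero-coverage positions helps, here is a minimal Python sketch. It is not UNAGI's code: `build_coverage` and `count_entries` are hypothetical helpers, the sample data is made up, and the tab-separated chromosome/position/depth layout is assumed from the traceback above. It only mimics the filtering effect of `-dz`; per the bedtools documentation, `-dz` also reports 0-based positions, so coordinates would shift by one compared with `-d`.

```python
def build_coverage(lines, skip_zero=False):
    """Build a chrom -> position -> depth mapping (hypothetical helper).

    With skip_zero=True, zero-depth positions are dropped, which mimics
    the filtering effect of `genomecov -dz` on the input.
    """
    coverage = {}
    for line in lines:
        chrom, pos, depth = line.rstrip("\n").split("\t")
        depth = int(depth)
        if skip_zero and depth == 0:
            continue
        coverage.setdefault(chrom, {})[pos] = depth
    return coverage


def count_entries(coverage):
    """Count the stored (chromosome, position) pairs."""
    return sum(len(positions) for positions in coverage.values())


# Made-up sample in the assumed three-column layout.
sample = [
    "chrI\t1\t0\n",
    "chrI\t2\t0\n",
    "chrI\t3\t5\n",
    "chrI\t4\t0\n",
    "chrII\t1\t2\n",
]

full = build_coverage(sample)
sparse = build_coverage(sample, skip_zero=True)
print(count_entries(full), count_entries(sparse))
```

Every stored position costs a dictionary slot plus a key and a value object, so on a large genome where most positions have zero depth, skipping them can shrink the in-memory structure substantially.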

We internally ran UNAGI on smaller fastq files, so our run times might differ from yours, but with files averaging 1Gb in size, the run time was about 5 minutes.

Please let us know how it went. (Also: Hello from the other side of the world!)

a-diamant commented 2 years ago

Hello! Thank you for your answer. Yes, the problem does indeed come from the custom Python function that combines the two coverage files. I found a solution here: https://github.com/mglubber/UNAGI/blob/refactor/app/unagi.py and replaced your original combineCoverage function with the version from the link above. P.S. Hello! Thank you for your help! Actually, for me it's more like "Glory to Ukraine!".
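For readers who hit the same MemoryError, one general way to avoid holding both coverage files in memory is to sum them line by line. The sketch below is only an illustration of that idea and is not the function from the linked refactor branch: `combine_coverage_streaming` is a hypothetical name, the three-column layout is assumed from the traceback, and the approach assumes both files list exactly the same positions in the same order (as with `genomecov -d` on the same genome). It would not work as-is on `-dz` output, where each file only contains its own non-zero positions.

```python
import os
import tempfile


def combine_coverage_streaming(positive_path, negative_path, total_path):
    """Sum two per-position coverage files line by line (hypothetical sketch).

    Assumes both files contain the same positions in the same order,
    so only one line from each file is held in memory at a time.
    """
    with open(positive_path) as pos_f, open(negative_path) as neg_f, \
            open(total_path, "w") as out_f:
        for pos_line, neg_line in zip(pos_f, neg_f):
            chrom_p, idx_p, depth_p = pos_line.split()
            chrom_n, idx_n, depth_n = neg_line.split()
            if (chrom_p, idx_p) != (chrom_n, idx_n):
                raise ValueError(f"Files out of sync at {chrom_p}:{idx_p}")
            total = int(depth_p) + int(depth_n)
            out_f.write(f"{chrom_p}\t{idx_p}\t{total}\n")


# Tiny demonstration with made-up data.
with tempfile.TemporaryDirectory() as tmp:
    pos_path = os.path.join(tmp, "positive.txt")
    neg_path = os.path.join(tmp, "negative.txt")
    total_path = os.path.join(tmp, "total.txt")
    with open(pos_path, "w") as f:
        f.write("chrI\t1\t3\nchrI\t2\t0\n")
    with open(neg_path, "w") as f:
        f.write("chrI\t1\t1\nchrI\t2\t4\n")
    combine_coverage_streaming(pos_path, neg_path, total_path)
    with open(total_path) as f:
        result = f.read()

print(result)
```

Memory use stays roughly constant regardless of genome size, at the cost of requiring the two inputs to be aligned; downstream steps that expect an in-memory dictionary would need to be adapted separately.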