statisticalbiotechnology / quandenser

QUANtification by Distillation for ENhanced Signals with Error Regulation
Apache License 2.0

Insufficient memory #16

Closed andrewjmc closed 4 years ago

andrewjmc commented 4 years ago

Hello,

For the second time while running my large dataset, I have had an out-of-memory error from Dinosaur in targeted mode during the later stages of the linking (>80% of the way through the minimum spanning tree).

I have 64 GB of RAM, with 5 GB used at rest. On this rerun I reduced the max memory from 48 GB to 40 GB, thinking this would protect against the crash.

java -Xmx40G -jar "C:\Program Files\quandenser-v0-02\share\java/Dinosaur-1.1.3.free.jar" --force  --profiling=true --nReport=0  --concurrency=11 --seed=1 --outDir=.\quandenser_output_only_samples_noOX1\percolator/search_and_link_211_to_222.psms.pout_dinosaur --advParams="C:\Program Files\quandenser-v0-02\share\java/advParams_dinosaur_targeted.txt" --mode=target --targets=.\quandenser_output_only_samples_noOX1\percolator/search_and_link_211_to_222.psms.pout.dinosaur_targets.tsv E:\RAW\Lab2\Pilot_2\FL948_MSQ1388_20180605_SM_81.mzML
Dinosaur 1.1.3    built:${maven.build.timestamp}
  mzML file: E:\RAW\Lab_2\Pilot_2\FL948_MSQ1388_20180605_SM_81.mzML
    out dir: .\quandenser_output_only_samples_noOX1\percolator/search_and_link_211_to_222.psms.pout_dinosaur
   out name: FL948_MSQ1388_20180605_SM_81

.                              .
[==============================]all hills, n=655841
hill checkSum = 168119716302633
peaky hills, n=655841
peaky hill checkSum = 168119716302633
  nScans    nHills
         2        0
         3   153226
         4    89346
      5-10   207507
     10-20   141032
     20-50    57968
    50-100     5622
   100-200     1140
   200-500        0
  500-1000        0
 1000-2000        0
 2000-5000        0
5000-10000        0
    >10000        0
writing hill reports...
hill reports written
=== IN TARGETED MODE ===
deisotoping complete
isotopes, n=421682
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 742896 bytes for Chunk::new
# An error report file with more information is saved as:
# E:\RAW\hs_err_pid15820.log
[thread 21152 also had an error]
#
# Compiler replay data is saved as:
# E:\RAW\replay_pid15820.log

The attached log files may help, though I realise this seems to be a Dinosaur error. Is it significant that this is happening with the later (and more time-consuming) pairs of files? These were taking up to an hour each.

I will resume the run and remove the truncated outputs from this matching. However, I don't think this should be happening given the 40 GB limit imposed on Dinosaur (unless Quandenser itself is building a large in-memory store during the linking process).

Do you have any suggestions?

Thanks for your help,

Andrew

replay_pid15820.log hs_err_pid15820.log

MatthewThe commented 4 years ago

Quandenser does store some data in memory, though as far as I've seen it is usually in the range of 2-3 GB towards the end of the alignment tree. However, I have never run it with 200+ files, so it might be significantly more in your case. There is already code in place in the quandenser-pipeline that saves this information, currently kept in memory, to disk instead, so it would not be too hard to apply the same trick in "vanilla" quandenser.

You could try reducing the memory allocated to Dinosaur even further. It should run even with something like 16 GB, though it will take a bit more time because of garbage collection.
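As a small convenience when rerunning by hand, a one-line shell helper (hypothetical, not part of Quandenser or Dinosaur) can rewrite the `-Xmx` flag in a command copied from the log:

```shell
# Hypothetical helper: take a Dinosaur command line copied from the
# Quandenser log and lower its -Xmx heap before rerunning it by hand.
lower_heap() {
  # $1 = original command, $2 = new heap size, e.g. 16G
  printf '%s\n' "$1" | sed -E "s/-Xmx[0-9]+[GgMm]/-Xmx$2/"
}

# Illustrative command, heavily shortened from the one in the log above.
cmd='java -Xmx40G -jar Dinosaur-1.1.3.free.jar --mode=target run.mzML'
lower_heap "$cmd" 16G
# prints: java -Xmx16G -jar Dinosaur-1.1.3.free.jar --mode=target run.mzML
```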

andrewjmc commented 4 years ago

I guess you could either save it to disk, or treat the -M parameter to Quandenser as a maximum total memory usage. That way, the memory allocated to Dinosaur could be whatever is left of the total. Whichever would be more efficient.
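The budgeting idea above could be sketched as follows. All names and numbers are illustrative, taken roughly from this thread; Quandenser does not currently do this:

```shell
# Illustrative sketch only: if the -M flag were treated as a cap on *total*
# memory, the Dinosaur heap could be whatever is left after the OS baseline
# and Quandenser's own in-memory store. Numbers are rough figures from this
# thread, not measured defaults.
TOTAL_GB=64        # physical RAM
BASELINE_GB=5      # used at rest
QUANDENSER_GB=15   # rough peak of the in-memory alignment data
DINOSAUR_XMX_GB=$(( TOTAL_GB - BASELINE_GB - QUANDENSER_GB ))
echo "would launch Dinosaur with -Xmx${DINOSAUR_XMX_GB}G"
# prints: would launch Dinosaur with -Xmx44G
```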

Yesterday's rerun seems to have frozen the machine and I can't log in at present! Assuming I need to rerun again, I will try 16 GB.

andrewjmc commented 4 years ago

I'm afraid the rerun failed even with 16 GB for Dinosaur.

java -Xmx16G -jar "C:\Program Files\quandenser-v0-02\share\java/Dinosaur-1.1.3.free.jar" --force  --profiling=true --nReport=0  --concurrency=11 --seed=1 --outDir=.\quandenser_output_only_samples_noOX1\percolator/search_and_link_123_to_131.psms.pout_dinosaur --advParams="C:\Program Files\quandenser-v0-02\share\java/advParams_dinosaur_targeted.txt" --mode=target --targets=.\quandenser_output_only_samples_noOX1\percolator/search_and_link_123_to_131.psms.pout.dinosaur_targets.tsv E:\RAW\Lab2\Pilot_2\FL948_MSQ1388_20180605_SM_171.mzML
Dinosaur 1.1.3    built:${maven.build.timestamp}
  mzML file: E:\RAW\Lab2\Pilot_2\FL948_MSQ1388_20180605_SM_171.mzML
    out dir: .\quandenser_output_only_samples_noOX1\percolator/search_and_link_123_to_131.psms.pout_dinosaur
   out name: FL948_MSQ1388_20180605_SM_171

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 4088 bytes for AllocateHeap
# An error report file with more information is saved as:
# E:\RAW\hs_err_pid6500.log

I will rerun with 8 GB. I think there are only another ~10 alignments to go.

However, this supports the need for a version which can keep the alignment tree on disk. I presume that my desire to see all features (no maximum number of missing values) may be partly responsible for the excessive memory consumption?

Thanks,

Andrew

andrewjmc commented 4 years ago

And just watching the memory use during the rerun: reading in all the Dinosaur features and deserialising the spectrum-to-precursor map already uses 6.9 GB, before computing the alignments and the MST.

MatthewThe commented 4 years ago

Thanks for the information; 6.9 GB is more than I anticipated. I will add an option to write the intermediate feature lists to file to save memory.

However, I'm not entirely sure whether you could still use all the results you have obtained so far, as I remember that I had to do some re-indexing of features to make it work. In the worst case, you could run the targeted Dinosaur runs separately from Quandenser, by simply running the Dinosaur command from the log file and letting Quandenser read in the output file and (hopefully) continue on its way.

andrewjmc commented 4 years ago

OK, I'm not sure exactly how I would do this; if necessary, I will ask you to clarify.

In the meantime, I will hope that the extra 8 GB saved solves the problem!

Updated to add that halfway through loading the links (242/482) we are at 14.8 GB, which is not too bad. But given the order in which the links are processed, I suspect the RAM usage will increase non-linearly.

andrewjmc commented 4 years ago

We are now at 59 GB after 329 of 482 links (loading from disk). I suspect this will not work regardless of how little RAM I give Dinosaur!

While awaiting another solution, I'll see if anyone has a better-specced machine I can shift to.

MatthewThe commented 4 years ago

Ah, that's not going to work indeed. Unfortunately, this probably also means that we'll run out of memory even if we write the intermediate feature files to disk, as Quandenser loads all of these back in after the alignments are done to do the actual feature clustering. I need to think a bit about how we could work around this.

To clarify my earlier suggestion, you could run the Dinosaur command outside of Quandenser, i.e. run the following on the command line:

java -Xmx16G -jar "C:\Program Files\quandenser-v0-02\share\java/Dinosaur-1.1.3.free.jar" --force --profiling=true --nReport=0 --concurrency=11 --seed=1 --outDir=.\quandenser_output_only_samples_noOX1\percolator/search_and_link_123_to_131.psms.pout_dinosaur --advParams="C:\Program Files\quandenser-v0-02\share\java/advParams_dinosaur_targeted.txt" --mode=target --targets=.\quandenser_output_only_samples_noOX1\percolator/search_and_link_123_to_131.psms.pout.dinosaur_targets.tsv E:\RAW\Lab2\Pilot_2\FL948_MSQ1388_20180605_SM_171.mzML

This should produce a Dinosaur output file which, when you run Quandenser again, will (hopefully) allow you to move on to the next alignment. That will probably get you as far as the next failing Dinosaur run, where you will have to do the same. This is obviously not ideal and might still crash because some other part runs out of memory.

andrewjmc commented 4 years ago

I presume this will only work for the single failed Dinosaur run, and not for the subsequent ones (which I expect will depend on the output of the corresponding Percolator run, no?).

MatthewThe commented 4 years ago

Quandenser should be able to read the Dinosaur output file (provided it's in the right location) and subsequently run the corresponding Percolator run, which will make the results available for any further runs that depend on them.

andrewjmc commented 4 years ago

Apologies for posting essentially a running commentary here, but interestingly, after reading in all the alignments and reaching 99% of RAM (60 GB used by Quandenser), we're now in the targeted Dinosaur-Percolator steps (hopefully only 6 to go) with only 12 GB of RAM used. Unexpected, but good.

andrewjmc commented 4 years ago

The run has now completed. However, it took a large number of reruns to get there, so there's definitely value in making Quandenser store more on disk.

There are some issues with the output, which I will raise in a separate issue.

Thanks again,

Andrew