statisticalbiotechnology / quandenser

QUANtification by Distillation for ENhanced Signals with Error Regulation
Apache License 2.0

Quandenser crashing during MaRaCluster #21

Open tokeru opened 2 years ago

tokeru commented 2 years ago

[Screenshots: crash2, crash1]

Hi there, I really liked your paper and was curious to test it on one of my datasets. Quandenser apparently crashes while running MaRaCluster. As there is no log file or error report, I have no clue where to start troubleshooting. As you can see in the screenshots, I ran it more than once to check whether it always stops at the same file, which would point to a corrupted file, but that does not seem to be the case. Do you have any guess as to what is going wrong here? I executed Quandenser with the command C:\Program Files\quandenser-v0-03\bin\quandenser --batch quandenser_list_file.txt --max-missing 1 --dinosaur-memory 64G -N 32

Thanks and best regards Alex

MatthewThe commented 2 years ago

Hi Alex,

Thanks for reporting this issue!

I haven't seen problems at that stage of MaRaCluster before. Could you try it with fewer cores (4, or if that doesn't work, even just 1)? It could be that the operating system doesn't like having so many large files open at the same time.

Best, Matthew

tokeru commented 2 years ago

I have now tried it with 4 cores and with 1 core. I also reduced and increased the amount of memory, and removed these parameters completely so that the default values are used. I changed the number of missing values as well. None of this prevented the crash.

MatthewThe commented 2 years ago

Okay, too bad. Thanks for trying :)

Would it perhaps be possible for you to share 2 of your mzML files? And how did you convert your files to mzML format? We usually use msconvert.

Also, are you able to run Quandenser with the example files provided on the GitHub page? They are available here: https://app.box.com/s/kp4219dc22l3gq27014nms8oco594c2i

tokeru commented 2 years ago

I converted them as described in the Quandenser paper, using MSConvert with peak picking on MS1 and MS2. Here are 2 of my mzML files: https://cloudstorage.tu-braunschweig.de/getlink/fi6gDSs5N5WnuMgck4218D9i/ (my whole dataset includes 192 of them). Maybe the large number alone is responsible for the crash? I can try the example files. I also just downloaded the Linux version of Quandenser to test it on a different system.

MatthewThe commented 2 years ago

I can indeed reproduce the error (on a Linux machine), and there I get a more informative error message:

terminate called after throwing an instance of 'std::out_of_range'
  what():  Invalid cvParam accession "1003096"

Apparently, this line is causing the problem:

<cvParam cvRef="MS" accession="MS:1003096" name="LTQ Orbitrap Velos Pro" value=""/>

The accession MS:1003096 seems to have been added to ProteoWizard after Quandenser was last built. I will try to rebuild Quandenser with a newer version of ProteoWizard and see if that fixes the issue.
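To see which controlled-vocabulary accessions an mzML file actually uses, a minimal sketch along the following lines could help. It relies only on Python's standard library; the function name `count_cv_accessions` is hypothetical and not part of Quandenser or ProteoWizard, and the sample document contains just the cvParam line quoted above.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_cv_accessions(source):
    """Count how often each cvParam accession occurs in an mzML file.

    `source` may be a file path or a binary file-like object.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        # mzML elements are namespaced, so compare only the local tag name.
        if elem.tag.rsplit("}", 1)[-1] == "cvParam":
            counts[elem.get("accession")] += 1
        # Drop the element's contents once it has been processed.
        elem.clear()
    return counts


# Tiny in-memory stand-in for a real mzML file (illustration only).
sample_xml = b"""<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <cvParam cvRef="MS" accession="MS:1003096" name="LTQ Orbitrap Velos Pro" value=""/>
</mzML>"""

counts = count_cv_accessions(io.BytesIO(sample_xml))
for accession, n in sorted(counts.items()):
    print(accession, n)
```

Passing the path of a real mzML file instead of the in-memory sample lists every accession in that file, which makes it easier to spot terms that an older ProteoWizard build may not know.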

MatthewThe commented 2 years ago

Hi Alex,

I managed to run your files through by building Quandenser with a newer version of ProteoWizard on Ubuntu. Unfortunately, there are still some build issues on Windows with the new ProteoWizard version, which I'm trying to fix now. I'll keep you posted.

MatthewThe commented 2 years ago

Hi Alex,

Apologies for the delay; there were some dependency issues I had to resolve to build Quandenser with the new ProteoWizard version. You can download the installers at the bottom of this page: https://github.com/statisticalbiotechnology/quandenser/actions/runs/1601072464

tokeru commented 2 years ago

Hi Matthew,

thanks for all your efforts. I just tried the newer version, first with the old Dinosaur output kept. This crashed again. My initial thought was that the Dinosaur output might have changed, so I deleted it and started Quandenser again. It still crashes 😕. As I still get no error code or anything similar, I can't really tell what the problem is, but it looks like it crashes at the same point: after reading the Dinosaur output and starting MaRaCluster. I also tried varying the amount of assigned resources again, but with no success.

Best Alex


MatthewThe commented 2 years ago

Hi Alex, Alright, it might be a different problem then. I will try to reproduce the error on Windows tomorrow.

Best, Matthew

MatthewThe commented 2 years ago

Hi Alex,

I couldn't reproduce the error on Windows; everything went through as it should. Can you check that the Quandenser version you're using is Quandenser version 0.03.1, Build Date Dec 20 2021 16:03:51 (e.g. by running quandenser -h)?
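If it helps, the version check can be scripted. The sketch below assumes the banner has exactly the form quoted above ("Quandenser version ..., Build Date ..."); the real output of quandenser -h may be laid out differently, and `parse_banner` is a hypothetical helper, not part of Quandenser.

```python
import re

# Assumed banner layout, taken from the line quoted in this comment.
BANNER_RE = re.compile(r"Quandenser version (\S+), Build Date (.+)")


def parse_banner(text):
    """Return (version, build_date) from a banner string, or None if absent."""
    match = BANNER_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


banner = "Quandenser version 0.03.1, Build Date Dec 20 2021 16:03:51"
print(parse_banner(banner))
```

The text printed by quandenser -h could be captured and passed to the same function to compare against the expected version.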

You can also try running it with only the two files you shared with me, to rule out that the crash is related to one of the other input files.

Best, Matthew

tokeru commented 2 years ago

Hi Matthew, first of all, sorry for the late reply. Christmas came, and then some urgent things kept me away from testing cool stuff for the future. Indeed, I still had the old version. The reason was that I didn't find the installers on the page you mentioned, because the link there isn't available when you're not signed in. I therefore took the installer from the releases page again, which hasn't been updated yet. So my apologies! I just started Quandenser again and it passed the critical moment. Hopefully it runs through now. Thanks and best regards Alex

tokeru commented 2 years ago

A short bit of feedback in case you're interested: at first I had problems running Quandenser on Windows because it used more memory than I had assigned. This led to a Java crash when the PC ran out of memory (128 GB RAM in the system). Apparently it also did not use the virtual memory on Windows. Hoping that Linux would handle this a bit smarter, I retried it on Linux. Indeed, Quandenser is able to run with virtual memory there. Right now it is using nearly 400 GB of memory. While it seems to work, it is probably not fast and not the way it is supposed to behave.

Speaking of speed, the algorithm has now been running for more than two weeks. It took around 7 days at the match-between-runs (MBR) step. As far as I understood from the paper, it goes through the tree twice, down and up again. The first pass took around 5 min per link, so about 16 h for 191 links. The second pass, though, needed around 50 min per link, leading to 7 days of calculating MBR. Right now, according to the terminal, it is mapping features to spectra, and I can see that it has been writing Quandenser.feature_groups.tsv for roughly another 7 days.

MaxQuant also needed around 5 days for this dataset, as it is quite big, but taking more than 2.5 weeks is a lot. As you compared your run times with MaxQuant in the paper, I wonder if something is going wrong here. One part of the problem may be the memory management; another is that it runs many of the steps on only one core, although I enabled 40 of them (see picture below).

[Screenshot: quandenser arbeitsspeicher3 (memory usage)]
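As a rough sanity check, the per-link times reported above can be multiplied out. The sketch uses only the figures from this comment (191 links, about 5 and 50 minutes per link); the helper name `total_hours` is made up for illustration.

```python
LINKS = 191  # number of links reported above


def total_hours(minutes_per_link, links=LINKS):
    """Total wall-clock hours for one pass over all links."""
    return minutes_per_link * links / 60


first_pass_h = total_hours(5)    # first pass, about 5 min per link
second_pass_h = total_hours(50)  # second pass, about 50 min per link
second_pass_days = second_pass_h / 24

print(f"first pass:  {first_pass_h:.1f} h")
print(f"second pass: {second_pass_h:.1f} h ({second_pass_days:.1f} days)")
```

The results match the roughly 16 hours and 7 days quoted above, so the tenfold slowdown per link in the second pass accounts for nearly all of the MBR time.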

MatthewThe commented 2 years ago

Thank you very much for your detailed feedback!

Those numbers are indeed not very encouraging. Do I understand correctly that you're mostly running this on virtual memory? This could explain the very slow runtimes, as virtual memory will be orders of magnitude slower than physical memory.

Nevertheless, for such big searches we've been working on an alternative that should handle multiprocessing better. You can find it here: https://github.com/statisticalbiotechnology/quandenser-pipeline. We've tested it on a handful of medium-large datasets with good results but also ran into some performance problems on very big datasets (1000+ runs) that we're still trying to resolve.