Hello Corina,
For CliqueMS: I did submit a few fixes some time ago, which could be related to this. Unfortunately, they haven't been integrated by the author so far. The handbook suggests installing the version with my patches; maybe you could (re-)install this version just to be safe?
remotes::install_github("rickhelmus/cliqueMS")
For the rest, it sounds like the componentization algorithms have a lot of data to chew on. Maybe they simply haven't finished yet, even if things appear to be stuck? I am curious how many feature groups you are dealing with. Perhaps you could try stricter filters to reduce the number. At the very least this would confirm whether componentization is working at all, for instance, by increasing the intensity threshold during the feature group filter step.
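Something along these lines could work (a minimal sketch; the 1E5 threshold is only illustrative and should be tuned to your data):

# Remove features below the given absolute intensity; feature groups that
# become empty are dropped, shrinking the data before componentization.
fGroups <- filter(fGroups, absMinIntensity = 1E5)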
Thanks, Rick
Dear Rick,
Thanks for your quick answer. For CliqueMS I still get the same error. I started RAMClustR again, but set a very high intensity filter beforehand. Now I have 54’685 features in 2’970 groups. Is this a lot? From my experience with non-targeted data analysis of untreated wastewater, this is not too much. But I guess it is a lot in comparison to groundwater or something else with significantly less matrix. It seems to me that it is the caching which keeps R busy: the cache.sqlite file is constantly increasing in size. Do you have any experience with how big this cache file gets with my roughly 3’000 groups? Just to get an estimate of the duration.
Thanks again, Corina
Hi Corina,
About 3000 feature groups after the filter() step is a decent amount, but nothing too extreme I would say. How many analyses are you processing? I just tried to run the componentization with the unfiltered demo data (~1500 feature groups, ~8000 features) and it took my system about 2 minutes. So I guess it's not strange that with your feature numbers you would need to wait a bit :-)
Since you mentioned you applied a very high intensity threshold now, does that mean you previously had way more feature groups? I would say having much more than a couple of thousand feature groups after filtering may lead to very long processing times in some steps.
For the cache: yes, it's normal that this can grow quickly, but of course within reasonable limits. What kind of file sizes are we talking about with your workflow? With the demo data test run my cache file was ~300 MB, although I have seen it grow to a few GB with more realistic projects.
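If it gets out of hand you can always wipe the cache; a minimal sketch (the results are simply recalculated, and re-cached, on the next run):

# Delete all cached results stored in cache.sqlite; subsequent workflow
# steps will recompute their data from scratch.
clearCache("all")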
Thanks, Rick
Hi Rick,
I was processing 51 files and had around 50'000 features. The componentization with RAMClustR took several hours, and the cache file grew to more than 200 GB; I stopped it before it finished because I do not have that much space. Now I am running with 18 files and the high intensity filter, and the cache is about 5 GB. Currently I am running the generateFormulas function, but with more than the default “CHNOP”, namely “CHNOPSFClBrI”. However, this has now been running for more than 24 h and has only progressed to 66%. What also confused me is that it went to 41% within a few minutes, but then took many hours to get from 41% to 66%. So the progress bar is not proportional to time, which also makes it very hard for me to estimate how long it will take.
I am wondering what I am doing wrong that it takes so much time and storage space with my dataset. Any idea would be appreciated.
Best, Corina
Hi Corina,
Thank you for the additional details.
You are right that so many hours of processing and a cache file of hundreds of GB is very unreasonable! The 50,000 features you mentioned, were they grouped features (feature groups) or raw features (e.g. what findFeatures() returns)? And if they were feature groups, was this after the filter() step? If the answer is 'yes' to both, I understand what is happening ;-) These amounts would simply be way too much to work with (I guess it would've been nice if patRoon had warned you about this). Are you by any chance working with Orbitrap data? I noticed that with Orbitraps the intensity scale is quite different and some more careful optimization is needed. You can have a look at the suspect screening workflow in the patRoon paper for some inspiration.
The situation with GenForm is a bit tricky. You are right that the progress bar is not linear. The reason for this is that the calculations generally go from low to high m/z, and unfortunately GenForm tends to produce a lot of candidates for higher feature masses, especially with additional elements specified. There are a few things you can try to remedy this (see the sketch after this list):
- Use the timeout option to generateFormulas() to stop a calculation after the given number of seconds.
- Use the maxCandidates option: this is similar to the timeout option, but here calculations are stopped when GenForm reaches a maximum number of candidates. The difference with the timeout option is that it will still proceed with the candidates that were generated before the threshold was reached.
- Set calculateFeatures to FALSE. This way calculations are done on a feature group level, instead of per feature. This might be a bit less accurate, but in my experience it still works very well, and I'm considering making this the default as it often makes more sense.
- Limit the m/z range of your feature groups, for instance: filter(fGroups, mzRange = c(0, 600))
Hope this helps!
Closed due to inactivity, feel free to re-open!
Dear Rick,
I just started to use patRoon. Up to the componentization it worked very well. However, for the componentization I get, depending on the algorithm, either errors, or the console says it is finished but the processing does not stop.
For CliqueMS I get the following error:
Annotating all features with CliqueMS for 51 analyses ... Exporting to XCMS features... Done!
  |   | 0%
Error in Matrix::tril(adjmatrix, -1) : 'k2' must be an integer from -Dim[1] to Dim[2]
For RAMClustR it counts through all components and ends with "... 1620 of 1637 1630 of 1637 finished", but although it says finished, R still seems busy.
It is similar for CAMERA, where I end up with "Calculating possible adducts in 8020 Groups... % finished: 10 20 30 40 50 60 70 80 90 100", after which nothing more happens, not even after several minutes of waiting.
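For reference, my componentization calls look roughly like this (a sketch only; the exact arguments are in the attached workflow, and positive ionization is assumed here):

componCliq <- generateComponents(fGroups, "cliquems", ionization = "positive")
componRC   <- generateComponents(fGroups, "ramclustr", ionization = "positive")
componCAM  <- generateComponents(fGroups, "camera", ionization = "positive")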
I have attached my workflow: workflow_patRoon.txt
Best, Corina