nirshahaf closed 2 months ago
Hi,
I think compounds above 1000 Da (especially with this many high-mass fragments) are too much for the ILP. I suggest we compute high-mass compounds with the heuristic instead. Yes, computing the exact solution is nice, but it is ridiculous that 99% of the running time of one analysis is spent on a few high-mass compounds :/
Hi Kai,
I'm not sure about the heuristic you mention, but I found a surprising outcome when testing in NI mode (full spectra below). The NI spectrum has fewer fragments in this case, so I expected Sirius to finish much faster. It did not: after roughly an hour of calculation I stopped it and looked at the generated spectra and trees, where I did not find the correct formula (N=10). I then truncated the original .ms file into two versions: one with just the two highest-mass fragments (plus isotopes) and another with the remaining two lower-mass fragments (one without detected isotopes). I then ran Sirius with the same parameters on each truncated input and was glad to see that in both cases it completed normally within a couple of minutes AND identified the correct formula, at rank 1 (the lower-mass fragments) or rank 2 (the higher-mass fragments). I therefore think something funky is going on, and you might want to add a heuristic that, for compounds above 1000 or 900 Da, truncates the MS2 peaks in a more rational way than I did and leaves just enough data for the algorithm to converge efficiently.
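To make the truncation idea concrete, here is a minimal Python sketch of one possible pre-filter: drop peaks below a relative intensity threshold, then keep only the N most intense ones. This is not an existing Sirius option; the function name `truncate_peaks` and the default values for `max_peaks` and `min_rel_intensity` are assumptions chosen for illustration only.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def truncate_peaks(peaks: List[Peak],
                   max_peaks: int = 10,
                   min_rel_intensity: float = 0.05) -> List[Peak]:
    """Keep at most `max_peaks` peaks that reach `min_rel_intensity` of the base peak.

    Hypothetical pre-processing step, not part of Sirius itself.
    """
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    # Drop peaks that are weak relative to the most intense peak
    strong = [p for p in peaks if p[1] >= min_rel_intensity * base]
    # Keep the most intense survivors
    strongest = sorted(strong, key=lambda p: p[1], reverse=True)[:max_peaks]
    # Return them in ascending m/z order, as in an .ms peak list
    return sorted(strongest)


# A few MS2 peaks copied from the NI spectrum in this thread
ms2 = [
    (723.178848771134, 44.1402587890625),
    (911.499147031881, 588.01171875),
    (912.502626093085, 265.8583984375),
    (1140.4976410677, 11.6455688476562),
    (1372.56727934847, 850.6328125),
]
for mz, intensity in truncate_peaks(ms2, max_peaks=3):
    print(mz, intensity)
```

A pure intensity filter like this would also drop weak isotope peaks, so a real heuristic would probably need to treat isotope patterns as a unit.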
Here is the full input .ms file (same compound as in the first thread):
compound NP-008069
formula C61H94O34
parentmass 1415.56745241152
ionization [M+FA-H]-

ms1
1369.5620664683 2609.126953125
1370.10416027068 18.6966857910156
1370.56446981945 1801.8505859375
1371.56618519547 844.20751953125
1372.08129882812 21.1581573486328
1372.57047210081 294.842041015625
1373.27502441406 19.445068359375
1373.5701012032 100.971496582031
1373.60245745325 42.7364807128906
1374.27172851562 22.0960540771484
1374.55733493081 31.4545288085938
1374.89514160156 22.106689453125
1375.24062502631 16.9260559082031
1375.55367487289 14.6835479736328
1405.54140239552 70.5264282226562
1406.53388919922 42.8983764648438
1407.52470046546 41.3442077636719
1408.56120447965 28.3544311523438
1415.56745241152 2769.6484375
1416.56816971123 1913.974609375
1417.57561788284 903.10400390625
1418.23605588669 21.3117828369141
1418.57918878837 339.12109375
1418.92863923066 24.0499725341797
1419.56505776432 50.0323791503906
1419.58984662733 113.657287597656
1419.95555610225 24.2431182861328

ms2
723.178848771134 44.1402587890625
911.499147031881 588.01171875
912.502626093085 265.8583984375
913.513163759151 99.551025390625
914.502358631551 29.5118255615234
1140.4976410677 11.6455688476562
1311.52404785156 84.174560546875
1313.58103594598 16.0462646484375
1328.54187990961 113.484313964844
1329.55023359777 59.5953063964844
1330.58057369567 21.9123382568359
1371.18811035156 81.0126342773438
1372.12414550781 40.5063171386719
1372.56727934847 850.6328125
1372.87884447927 81.0126342773438
1373.29724121094 27.4158782958984
1373.57458496094 171.71630859375
1373.89599609375 31.0443572998047
1374.11962890625 27.3267669677734
1374.58142089844 49.1774597167969
1375.13537597656 18.7820587158203
1375.31176757812 17.7598571777344
1376.44311523438 18.0995178222656
1377.12356556204 14.66064453125
1391.51037597656 23.0181121826172
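For anyone who wants to repeat the truncation experiment, the following Python sketch reads a peak list as posted above (whitespace-separated m/z and intensity values after an `ms1` or `ms2` keyword) and splits it at a mass cutoff. The helper names `parse_peak_list` and `split_by_mass` and the 1000 Da cutoff are hypothetical, and the example line contains only three of the MS2 peaks.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def parse_peak_list(text: str) -> List[Peak]:
    """Parse 'ms2 mz int mz int ...' (one line or many) into (m/z, intensity) pairs."""
    tokens = text.split()
    # Skip a leading section keyword, with or without a '>' prefix
    if tokens and tokens[0].lstrip(">").lower() in ("ms1", "ms2"):
        tokens = tokens[1:]
    if len(tokens) % 2 != 0:
        raise ValueError("peak list must contain m/z and intensity pairs")
    values = [float(t) for t in tokens]
    return list(zip(values[0::2], values[1::2]))


def split_by_mass(peaks: List[Peak], cutoff_mz: float) -> Tuple[List[Peak], List[Peak]]:
    """Split peaks into (below cutoff, at or above cutoff)."""
    low = [p for p in peaks if p[0] < cutoff_mz]
    high = [p for p in peaks if p[0] >= cutoff_mz]
    return low, high


line = ("ms2 723.178848771134 44.1402587890625 "
        "911.499147031881 588.01171875 "
        "1328.54187990961 113.484313964844")
peaks = parse_peak_list(line)
low, high = split_by_mass(peaks, cutoff_mz=1000.0)
print(len(low), "peaks below the cutoff,", len(high), "at or above it")
```

Each half can then be written back into its own .ms file together with the unchanged compound, formula, parentmass, and ionization lines.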
BTW,
If you need to retrain some model on higher-mass compounds, I can generate 275 spectra of chemical standards with a mass range between 1001 and 1964 Da. Following Sebastian's comment from a few years ago, I have increased the sensitivity of the peak extraction method, i.e. there are more low-mass putative fragments now.
Sirius team,
Apart from the blocking issues, which seem (independently) to be due to connection problems with the CSI server, I see very slow convergence of the algorithm when running on an .ms file of a larger compound with m/z=1388, corresponding to [M+NH4]+. The run parameters were optimized for this situation by reducing the PPM threshold and reserving sufficient resources; nevertheless, the Sirius command line took roughly a day(!) to finish, see:
The input spectra are below. Do you have any idea what causes this? Could it be related to the Java runtime parameters, or to the large number of putative fragments (originating from the DIA method)?
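On the PPM threshold mentioned above: the absolute mass window that a ppm tolerance allows grows linearly with m/z, which is one reason large compounds admit more candidate formulas. A small Python sketch of the arithmetic follows; the ppm values are illustrative and are not the ones used in the run described here.

```python
def ppm_window_da(mz: float, ppm: float) -> float:
    """Half-width in Da of the mass tolerance window for a given m/z and ppm."""
    return mz * ppm * 1e-6


# Illustrative tolerances only, evaluated at the m/z from this post
for ppm in (10.0, 5.0, 2.0):
    print(f"{ppm:>4} ppm at m/z 1388 -> +/- {ppm_window_da(1388.0, ppm):.5f} Da")
```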