statisticalbiotechnology / quandenser

QUANtification by Distillation for ENhanced Signals with Error Regulation
Apache License 2.0

NANs reported when aligning runs #2

Open asalt opened 5 years ago

asalt commented 5 years ago

Running the example files through quandenser works fine, but I've run into a problem when trying to run it on a new dataset.

Specifically, when aligning runs, some of the messages written to stdout look like this:

Aligned runs: 0 6: rmseComb = nan rmse1 = nan rmse2 = nan

followed by lines such as:

Inserting link 0 to 6 with rmse nan

The search_and_link_x_y_dinosaur_targets.tsv file is also filled with -nan for seemingly all rtStart and rtEnd entries.

This then causes Dinosaur to hang or crash during the targeted search with the error: java.lang.NumberFormatException: For input string: "-nan"
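One way to catch this before Dinosaur is launched is to scan the targets file for non-finite retention times. The following is only a rough sketch: it assumes the TSV has a header row containing the rtStart and rtEnd column names mentioned above, and the `mz` column and sample rows are made up for illustration. The helper `find_non_finite_rows` is hypothetical and not part of quandenser or Dinosaur.

```python
import csv
import io
import math


def find_non_finite_rows(tsv_file, columns=("rtStart", "rtEnd")):
    """Return (row_number, column, raw_value) for every entry that is
    missing, unparsable, or not a finite number (nan, -nan, inf)."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    bad = []
    for row_number, row in enumerate(reader, start=1):
        for column in columns:
            raw = row.get(column)
            try:
                value = float(raw)  # float() parses "nan" and "-nan" as NaN
            except (TypeError, ValueError):
                bad.append((row_number, column, raw))
                continue
            if not math.isfinite(value):
                bad.append((row_number, column, raw))
    return bad


# Made-up sample standing in for a targets file (assumed layout).
sample = "mz\trtStart\trtEnd\n500.25\t10.1\t10.9\n612.80\t-nan\t-nan\n"
problems = find_non_finite_rows(io.StringIO(sample))
for row_number, column, raw in problems:
    print(f"row {row_number}: {column} = {raw}")
```

For a real file, the same function could be called on `open(path, newline="")` in place of the `io.StringIO` object.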

Beyond this, here is my experimental setup:

I have two groups of 3 samples, each of which has 5 fractions. My file list looks like this (except with 5 fractions for each sample and a total of 6 samples):

file name
ctrl1_f1.mzML c1_f1
ctrl1_f2.mzML c1_f2
ctrl2_f1.mzML c2_f1
ctrl2_f2.mzML c2_f2
treat1_f1.mzML t1_f1
treat1_f2.mzML t1_f2
treat2_f1.mzML t2_f1
treat2_f2.mzML t2_f2

So more generally, I feel that I really want to cluster only samples within the same sample group (ctrl or treatment) and only within each fraction (so don't cluster ctrl1_f1 with ctrl1_f2). I believe this is what MaxQuant does with MBR. But here everything is clustered with everything; a total of 30 file pairs are being clustered and aligned. This suggests that quandenser should be run on each fraction set separately (e.g. all fraction 1s, then all fraction 2s). Maybe I missed a suggestion of this nature in the preprint or README? Any guidance or suggestions would be greatly appreciated.

Thanks!

MatthewThe commented 5 years ago

Currently, quandenser does not provide direct support for fractionated samples. The NaNs are simply a result of quandenser failing to find enough overlap in MS2 spectra between certain runs. I'll create a clearer error message for this.

However, as you mention, I think the way to do this would be to process one fraction at a time and combine the results just before doing the protein quantification. I haven't fully thought through the consequences of this approach, but it might be worth a try.
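Splitting the file list by fraction could be scripted along these lines. This is a minimal sketch that assumes file names end in `_f<N>.mzML`, as in the example list above; the function `split_by_fraction` is a hypothetical helper, not something quandenser provides, and how the per-fraction results are combined afterwards is left open.

```python
import re
from collections import defaultdict


def split_by_fraction(entries):
    """Group (file_name, label) pairs by the fraction number taken from
    a '_f<N>.mzML' suffix in the file name (assumed naming scheme)."""
    pattern = re.compile(r"_f(\d+)\.mzML$")
    groups = defaultdict(list)
    for file_name, label in entries:
        match = pattern.search(file_name)
        if match is None:
            raise ValueError(f"no fraction suffix in {file_name!r}")
        groups[int(match.group(1))].append((file_name, label))
    # Return a plain dict ordered by fraction number.
    return dict(sorted(groups.items()))


# Entries following the naming pattern used in this thread.
entries = [
    ("ctrl1_f1.mzML", "c1_f1"),
    ("ctrl1_f2.mzML", "c1_f2"),
    ("ctrl2_f1.mzML", "c2_f1"),
    ("ctrl2_f2.mzML", "c2_f2"),
    ("treat1_f1.mzML", "t1_f1"),
    ("treat1_f2.mzML", "t1_f2"),
    ("treat2_f1.mzML", "t2_f1"),
    ("treat2_f2.mzML", "t2_f2"),
]

per_fraction = split_by_fraction(entries)
for fraction, members in per_fraction.items():
    print(f"fraction {fraction}:")
    for file_name, label in members:
        print(f"{file_name}\t{label}")
```

Each group could then be written to its own file list and processed in a separate run, so that runs from different fractions are never aligned against each other.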

asalt commented 5 years ago

Thanks for the response and clarification. I'll try running it on a per-fraction basis and combining the results.