statisticalbiotechnology / quandenser

QUANtification by Distillation for ENhanced Signals with Error Regulation
Apache License 2.0
9 stars 1 forks source link

Meaning of max missing #7

Closed andrewjmc closed 4 years ago

andrewjmc commented 4 years ago

Hello Matthew,

"Max missing" with a default of 1/4 number of files. Does this mean only features which are present in 75% or more of files will be output?

For my purposes, I am looking for likely rare peptides present in only one group of samples. I assume therefore that I will want max missing to be very low (if not zero). Will this cause any problems for the algorithm?

Thanks,

Andrew

MatthewThe commented 4 years ago

Yes, by default we only retain features present in at least 75% of the runs, e.g. if you have 12 runs max-missing will be 3.

If you expect the feature to be present in only one of the groups, you should actually set max-missing to a high value, e.g. total_num_runs - 2. The algorithm itself has no problems with this, though you should be careful as these groups with many missing values may have a higher proportion of false positives.

andrewjmc commented 4 years ago

Yes, a high number, of course!

Best wishes,

Andrew