matsengrp / phip-flow

A Nextflow pipeline to align, merge, and organize large PhIP-Seq datasets
MIT License
9 stars 6 forks source link

added neg binom fit #47

Closed jgallowa07 closed 1 year ago

jgallowa07 commented 2 years ago

Added optional workflow for fitting a negative binomial as in https://matsengrp.github.io/phippery/bkg-model.html

jgallowa07 commented 2 years ago

Isolating the empirical samples seems unnecessary to me, because they aren't used anywhere. I suggest to remove these lines

good catch, should we be predicting on everything (even the sample that were fit on?)

modeling.neg_binom_model() returns 3 objects when inplace is False: two lists containing the fit parameters per peptide ID and a copy of the xarray dataset. In the example, this is all returned into one variable fit_ds. Is this what you really want?

All I need to do is returned the xarray object, I guess. We could add more options to the pipeline that would allow us to return the parameter values, but let's put that off for the time being.

I don't see where you issue a warning if the number of beads samples are low. Did you not put that in after all? added.

yup, added. Not sure there's a great way to do this other than a warning if there is less than say, 50 beads_only samples for the fit, and a pointer to the documentation.

ksung25 commented 2 years ago

Isolating the empirical samples seems unnecessary to me, because they aren't used anywhere. I suggest to remove these lines

good catch, should we be predicting on everything (even the sample that were fit on?)

Computing p-values for input library samples is unnecessary. Having p-values for peptides counts in beads samples would allow us to perform some FDR-based method for establishing a p-value threshold (although I've seen that the beads samples sometimes have more extreme outliers than seen in empirical samples, thereby making the validity of using the beads samples in this way somewhat questionable...)

I don't see where you issue a warning if the number of beads samples are low. Did you not put that in after all? added.

yup, added. Not sure there's a great way to do this other than a warning if there is less than say, 50 beads_only samples for the fit, and a pointer to the documentation.

Spotted some typos in the warning message. Here's the correction: "With less than 50 beads_only samples, it's probable that many peptide distribution fits will not converge, especially if coverage is low. See https://matsengrp.github.io/phippery/bkg-model.html for more." And to be consistent with the warning message you want strictly less than 50 in this line here, right?