mshinn23 / nardini

GNU General Public License v3.0

FitError: Optimization converged to parameters that are outside the range allowed by the distribution. #2

Open martinobertoni opened 8 months ago

martinobertoni commented 8 months ago

Hi there, one of Nardini's requirements is scipy<1.7. I nonetheless tried it in my environment (with scipy 1.11.1). The fundamental error is in the function that generates the scrambled sequences (get_scramble_seqs_vals): once the sequences are generated, it fails because of this error raised by the gamma distribution fit, stats.gamma.fit: FitError: Optimization converged to parameters that are outside the range allowed by the distribution.

After version 1.7, scipy validates the fitted parameters and raises an error if they fall outside the range the distribution allows. If the version limitation exists because of this problem, can we be confident that the learned parameters are valid? Can you provide an alternative solution to limiting the package version? Thanks

jaredl7 commented 8 months ago

Hi @Martino, thanks for your message - my apologies for the delay in getting back to you! @mshinn23 and I chatted about this and thought to share the following:

One possible solution for newer scipy versions (1.7+) is to decrease the number of scrambles. In our experience, 5 x 10^4 or even 1 x 10^4 seemed to work with a difference of less than 0.2 in z-score value.

The distribution of the scrambles will depend on: 1) the fraction of each residue group or residue in the sequence and 2) the length of the sequence. For example, if your sequence is composed of high fractions of a few particular residues (i.e., low complexity), you can imagine that the background distribution of the scramble sequence will not be robust. Furthermore, if your sequence length is shorter, this will further limit the number of possible combinations of scrambled sequences.
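To make the point about sequence length and composition concrete, here is a minimal sketch that counts how many distinct scrambles a sequence can have at all, using the multinomial coefficient n! / (c1! * c2! * ...), where the c values are the per-residue counts. The function name `distinct_scrambles` is hypothetical and is not part of nardini; it also counts at the level of individual residues, whereas grouping residues into classes would reduce the count further.

```python
from collections import Counter
from math import factorial


def distinct_scrambles(sequence: str) -> int:
    """Number of distinct orderings of the residues in `sequence`.

    Hypothetical helper for illustration only (not a nardini function).
    Computes the multinomial coefficient n! / (c1! * c2! * ...).
    """
    total = factorial(len(sequence))
    for count in Counter(sequence).values():
        # Identical residues are interchangeable, so divide out their orderings.
        total //= factorial(count)
    return total


# A short, low-complexity sequence has very few distinct scrambles ...
print(distinct_scrambles("GSGSGSGSGS"))  # 252
# ... while a sequence of the same length with no repeats has many more.
print(distinct_scrambles("ACDEFGHIKL"))  # 3628800
```

If the requested number of scrambles is far larger than this count, the background distribution is built from many repeated sequences, which is one plausible reason the fit becomes unstable.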

We use scipy<1.7 for compatibility since updates to the internals of scipy since 1.7 require retooling of the underlying z-score algorithm. This is being worked on, and will be made available in a subsequent release.

Another workaround is to create a custom conda environment or virtualenv with an older Python version (e.g. 3.8) for any nardini-related analysis. For example, with conda: conda create -n nardini-env python=3.8. Then one can install nardini as usual via pip or conda.
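For readers who need to stay on a newer scipy in the meantime, one option is to guard the gamma fit in their own analysis code and fall back to the empirical mean and standard deviation when the fit is rejected. The sketch below is not nardini's implementation: the function name `zscore_with_fallback` is hypothetical, and it assumes the z-score is derived from the mean and standard deviation of the fitted gamma distribution. It catches RuntimeError because recent scipy versions define FitError as a RuntimeError subclass.

```python
import numpy as np
from scipy import stats


def zscore_with_fallback(observed, scramble_values):
    """Z-score of `observed` against a background of scramble values.

    Hypothetical helper, not part of nardini. Assumes the z-score uses the
    mean and standard deviation of a gamma distribution fitted to the
    scrambles; falls back to empirical moments if the fit is rejected.
    """
    values = np.asarray(scramble_values, dtype=float)
    try:
        shape, loc, scale = stats.gamma.fit(values)
        mean, var = stats.gamma.stats(shape, loc=loc, scale=scale, moments="mv")
        mean, std = float(mean), float(np.sqrt(var))
        source = "gamma"
    except RuntimeError:
        # Recent scipy raises FitError (a RuntimeError subclass) when the
        # optimizer lands on parameters outside the distribution's range.
        mean, std = float(values.mean()), float(values.std(ddof=1))
        source = "empirical"
    return (observed - mean) / std, source


# Usage with synthetic background values (illustrative data only).
rng = np.random.default_rng(0)
background = rng.gamma(shape=5.0, scale=2.0, size=2000)
z, source = zscore_with_fallback(12.0, background)
print(f"z = {z:.3f} (moments from {source})")
```

Returning the source of the moments alongside the z-score makes it easy to flag which sequences needed the fallback, so those results can be checked against the scipy<1.7 environment.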