paula-tataru / polyDFE

predicting DFE and alpha from polymorphism data
GNU General Public License v3.0
28 stars 0 forks source link

Local optimization: reached maximum number of iterations allowed, but not really maximum number of allowed iterations #2

Closed peterdfields closed 6 years ago

peterdfields commented 6 years ago

Hi Paula,

I'm trying to run polyDFE with model C and near default parameters. I am providing only a single uSFS for neutral and selected sites which summarizes > 10k genes. I am not at this point including divergence , I'm just trying to get the program running and then I can optimize parameters to improve the estimator. I have tried a number of different options with the -p flag but I'm consistently ending up with the program stalling at ~42 iterations, followed after a while by the printing of three iterations with the status -8 code, a -inf value for the lnlikelihood, NAN for the gradient, and the suggestion that the maximum number of iterations has been reached. This doesn't seem correct given I'm allowing for a much larger number iterations in the params.optim file. Also the running time is well under the allow 3h value (it takes ~ 20min). Is there anything I can do to determine what might be wrong or allow the program to run a larger number of iterations before stalling? Thank you for your time and advice.

paula-tataru commented 6 years ago

Hi Peter,

When the optimization ends with status -8, it's because the likelihood has been computed, for just one iteration of the optimization procedure, more than 5000 times. The message on the screen should be "-- Reached maximum number of likelihood evaluations allowed. Results are not reliable". If, instead, you get "-- Local optimization: reached maximum number of iterations allowed", then that's a bug that I have to fix (so please let me know which of the two you get).

The reason I added this condition is because I noticed that sometimes the optimization would stall for a very long time and then I had to manually stop it. It seems to me that it's the same behavior you noticed, it takes a really long time before it does anything, and that's because GSL keeps calculating the likelihood using different parameters.

I am currently preparing a new release and if you think it would be useful, I can add one more entry in the file from -p to also customize this number. But I don't think that allowing GSL to make more calculations will end up anywhere.

My suggestions to get around this are to try different initial values for the parameters:

Please let me know how it goes.

peterdfields commented 6 years ago

Hi Paula,

Thank you for your response. I now see what parameters I'm using are generating the two different outcomes. I had added a line to the params.optim file that would allow a relatively smaller number of iterations to see how it would change the output (and to make sure the results I was seeing weren't a formatting issue). When the value provided is around 42 I get "-- Local optimization: reached maximum number of iterations allowed" but if I allow for > 42 or say a number like 1000, I get "-- Reached maximum number of likelihood evaluations allowed. Results are not reliable", and if I allow, say, 10 iterations I get normal completion of the code (I think) though the gradient is very large. I don't know if the former message would imply a bug though.

I have now tried using both the -e flag and default -b parameters, the two sets of -b parameters described in the params.basinhop file, and a set of my own that doubled most of the params.basinhop parameters. An example output is the following:

-- Reached maximum number of likelihood evaluations allowed. Results are not reliable

---- Basin hopping performed 498 iterations
---- Basin hopping is stuck in the same solution

---- Running time: 26 minutes and 13 seconds

Are there parameter ranges you would suggest to try?

On a side note, I would mention that in analyzing these data using dfe-alpha we have found some relatively complicated results that seem to suggest a great deal of constraint on positive selection (close to zero or slightly negative omega_a). Is it possible that this sort of scenario would cause problems in the optimization algorithm?

paula-tataru commented 6 years ago

Hi Peter,

Can you send the output from the run you did using -b? If only some runs end with status -8 but the one where the best parameters have been found ends ok, then I think the result it's ok.

To test for evidence in the data of positive selection, you can fix pb and Sb to 0 in the initialization file, and use a LET to compare the two models. In the simulations we did, we inferred a full DFE (i.e. where we estimated both pb and Sb) on data that only contain ed negative selection, and we didn't have any issues we that. We also didn't find any evidence for positive selection in the SFS in chimp data and the optimization went fine.

If I understand you correctly, I don't think there's a bug with the status message.

peterdfields commented 6 years ago

Hi Paula,

You can see the output here: https://gist.github.com/peterdfields/ac563bc9df12e5f545a9c85c64408a1b The command executed was as follows:

polyDFE -d input/Control_uSFS.txt -v 1 -b -t -e -w

The type of analysis I'm attempting is to estimate the full DFE and alpha on both the a genome wide control set of genes and then these same values for different classes of genes. We can then compare if the DFE and alpha is significantly different for particular classes of genes from the genome wide control group. We would also like to be able to compare the full DFE and alpha to that estimated from other species. Given your results described above perhaps it's not specific aspects of our alpha and DFE, but rather the data itself that might be problematic? I could also send you the input file if that could be useful for identifying what might be going wrong?

paula-tataru commented 6 years ago

Hi Peter,

I would really like to have the dataset, I think there might be a bug, I find it weird that all the subsequent bssin hopping runs have status -8 from the start.

peterdfields commented 6 years ago

Okay. Is there an email address I should use to send it to you?

paula-tataru commented 6 years ago

paula@birc.au.dk

paula-tataru commented 6 years ago

I have committed an update to polyDFE which addresses the issue: when the maximum allowed evaluations of likelihood is reached, the optimization moves on further, and the basin hopping also works as expected in that case.