miriammiyagi / QuIBL

Quantifying Introgression via Branch Lengths
49 stars 7 forks source link

ValueError: math domain error #9

Closed fredericlabbe closed 3 years ago

fredericlabbe commented 4 years ago

Hi @michaelmiyagi,

I have run QuIBL on a set of trees. Unfortunately, my QuIBL analysis stopped with an error when using all the trees (see below), while it is running perfectly with 100 of my trees... I was wondering if you already had this error before, and if you know how to fix it?

[Input] treefile: ./coding_chrX_bestTree.dnd numdistributions: 2 likelihoodthresh: 0.01 numsteps: 10 gradascentscalar: 0.5 totaloutgroup: AgamS1 multiproc: True maxcores: 12 [Output] OutputPath: ./coding_chrX_bestTree_Py2_out.csv

Loading python/2.7.15 Loading requirement: tcl/8.6.8 gcc/8.3.0 Analysis Started QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) ['AmerM2', 'AmelC2', 'AaraD1'] is complete. ['AquaS1', 'AmelC2', 'AaraD1'] is complete. Traceback (most recent call last): File "QuIBL.py", line 340, in inputReader(sys.argv[1]) File "QuIBL.py", line 336, in inputReader outputFormatter(outputDict,inputDict) File "QuIBL.py", line 307, in outputFormatter tripletSet=Parallel(n_jobs=num_cores)(delayed(PLexMax)(triple,K,lthresh,numsteps,gAScalar) for triple in trees) File "/afs/crc.nd.edu/user/f/flabbe/.local/lib/python2.7/site-packages/joblib/parallel.py", line 789, in call self.retrieve() File "/afs/crc.nd.edu/user/f/flabbe/.local/lib/python2.7/site-packages/joblib/parallel.py", line 740, in retrieve raise exception joblib.my_exceptions.JoblibValueError: JoblibValueError


Multiprocessing exception: ........................................................................... /scratch365/flabbe/2_Analysis/2_QuIBL/QuIBL.py in () 335 outputDict[flag]=config.get('Output',flag) 336 outputFormatter(outputDict,inputDict) 337
338 339 #print exMax(getTripBranches(readin_Newick(sys.argv[1])), 2, 0.01, 5, 0.5)[0].models --> 340 inputReader(sys.argv[1]) 341 342 343 344

........................................................................... /scratch365/flabbe/2_Analysis/2_QuIBL/QuIBL.py in inputReader(filepath='./sampleInputFile_X_Py2.txt') 331 print('Error: No input trees specified.') 332 return 333 outputDict={} 334 for flag in config.options('Output'): 335 outputDict[flag]=config.get('Output',flag) --> 336 outputFormatter(outputDict,inputDict) outputDict = {'outputpath': './coding_chrX_bestTree_Py2_out.csv'} inputDict = {'gradascentscalar': '0.5', 'likelihoodthresh': '0.01', 'maxcores': '12', 'multiproc': 'True', 'numdistributions': '2', 'numsteps': '10', 'totaloutgroup': 'AgamS1', 'treefile': './coding_chrX_bestTree.dnd'} 337
338 339 #print exMax(getTripBranches(readin_Newick(sys.argv[1])), 2, 0.01, 5, 0.5)[0].models 340 inputReader(sys.argv[1])

........................................................................... /scratch365/flabbe/2_Analysis/2_QuIBL/QuIBL.py in outputFormatter(outputDict={'outputpath': './coding_chrX_bestTree_Py2_out.csv'}, inputDict={'gradascentscalar': '0.5', 'likelihoodthresh': '0.01', 'maxcores': '12', 'multiproc': 'True', 'numdistributions': '2', 'numsteps': '10', 'totaloutgroup': 'AgamS1', 'treefile': './coding_chrX_bestTree.dnd'}) 302 multi=bool(inputDict['multiproc']) 303 corecap=int(inputDict['maxcores']) 304 num_cores=min(multiprocessing.cpu_count(),corecap) 305 print('Analysis Started') 306 if multi: --> 307 tripletSet=Parallel(n_jobs=num_cores)(delayed(PLexMax)(triple,K,lthresh,numsteps,gAScalar) for triple in trees) tripletSet = undefined num_cores = 12 triple = undefined trees = [<__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>] 308 else: 309 tripletSet=exMax(getTripBranches(readin_Newick(inputDict['treefile']),canonOut), int(inputDict['numdistributions']), float(inputDict['likelihoodthresh']), int(inputDict['numsteps']), float(inputDict['gradascentscalar'])) 310 with open(outputDict['outputpath'],'w') as csv_out: 311 fieldnames=[]

........................................................................... /afs/crc.nd.edu/user/f/flabbe/.local/lib/python2.7/site-packages/joblib/parallel.py in call(self=Parallel(n_jobs=12), iterable=<generator object >) 784 if pre_dispatch == "all" or n_jobs == 1: 785 # The iterable was consumed all at once by the above for loop. 786 # No need to wait for async callbacks to trigger to 787 # consumption. 788 self._iterating = False --> 789 self.retrieve() self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=12)> 790 # Make sure that we get a last message telling us we are done 791 elapsed_time = time.time() - self._start_time 792 self._print('Done %3i out of %3i | elapsed: %s finished', 793 (len(self._output), len(self._output),


Sub-process traceback:

ValueError Mon Mar 23 16:30:50 2020 PID: 23515 Python 2.7.15: /opt/crc/p/python/2.7.15/gcc/bin/python ........................................................................... /afs/crc.nd.edu/user/f/flabbe/.local/lib/python2.7/site-packages/joblib/parallel.py in call(self=) 126 def init(self, iterator_slice): 127 self.items = list(iterator_slice) 128 self._size = len(self.items) 129 130 def call(self): --> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items] func = args = (<__main__.tripletT instance>, 2, 0.01, 10, 0.5) kwargs = {} self.items = [(, (<__main__.tripletT instance>, 2, 0.01, 10, 0.5), {})] 132 133 def len(self): 134 return self._size 135

........................................................................... /scratch365/flabbe/2_Analysis/2_QuIBL/QuIBL.py in PLexMax(triple=<__main__.tripletT instance>, K=2, threshold=0.01, numSteps=10, tempScale=0.5) 278 lmbd,stepScale=gradAscent(branchData, XQList, cArray, pArray, lmbd, stepScale, numSteps, threshold) 279 cArray=[x(storedlmbd/lmbd) for x in cArray] 280 steps+=1 281 triple.setModel(outG, list(cArray), list(pArray), lmbd) 282 triple.setBIC(outG, np.log(len(branchData))(2K-1)-2(modelLogLik(branchData, list(cArray), list(pArray), lmbd))) --> 283 triple.setNull(outG,oneDistNull(branchData)) triple.setNull = <bound method tripletT.setNull of <__main__.tripletT instance>> outG = 'AcolM1' branchData = [0.003358, 0.012285, 0.022712, 0.020573, 0.019406, 0.013163, 0.036276, 0.020941, 0.008572, 0.008616, 0.00998, 0.021921, 0.009772, 0.008954, 0.045429, 0.009066, 0.005557, 0.015791, 1e-06, 0.028105, ...] 284 print(str(triple.taxaSet)+' is complete.') 285 return triple 286 287

........................................................................... /scratch365/flabbe/2_Analysis/2_QuIBL/QuIBL.py in oneDistNull(branchData=[0.003358, 0.012285, 0.022712, 0.020573, 0.019406, 0.013163, 0.036276, 0.020941, 0.008572, 0.008616, 0.00998, 0.021921, 0.009772, 0.008954, 0.045429, 0.009066, 0.005557, 0.015791, 1e-06, 0.028105, ...]) 285 return triple 286 287 288 def oneDistNull(branchData): 289 lmbd=np.mean(branchData) --> 290 bic=np.log(len(branchData))-2*(modelLogLik(branchData, [0.0], [1.0], lmbd)) bic = undefined branchData = [0.003358, 0.012285, 0.022712, 0.020573, 0.019406, 0.013163, 0.036276, 0.020941, 0.008572, 0.008616, 0.00998, 0.021921, 0.009772, 0.008954, 0.045429, 0.009066, 0.005557, 0.015791, 1e-06, 0.028105, ...] lmbd = 0.029741748299319738 291 return (lmbd,bic) 292
293 294 def outputFormatter(outputDict,inputDict):

........................................................................... /scratch365/flabbe/2_Analysis/2_QuIBL/QuIBL.py in modelLogLik(XSet=[0.003358, 0.012285, 0.022712, 0.020573, 0.019406, 0.013163, 0.036276, 0.020941, 0.008572, 0.008616, 0.00998, 0.021921, 0.009772, 0.008954, 0.045429, 0.009066, 0.005557, 0.015791, 1e-06, 0.028105, ...], cArray=[0.0], pArray=[1.0], lmbd=0.029741748299319738) 126 tempSum=0 127 for x in XSet: 128 L=0 129 for i in range(0,len(pArray)): 130 L+=pArray[i]*conPDF(x,cArray[i],lmbd) --> 131 tempSum+=log(L) tempSum = 1347.3621079575664 L = 0.0 132 return tempSum 133 134 def weightedLogLik(XQList, cArray,lmbd): 135 #Calculates the conditional likelihood on the Q values for each datapoint

ValueError: math domain error


Thanks in advance for your help, @fredericlabbe

miriammiyagi commented 4 years ago

@fredericlabbe thank you for the log! We've seen a similar issue on very large trees occasionally- how many terminals do you have in your data? Does the large set of trees work for a smaller set of taxa?

fredericlabbe commented 4 years ago

Hi @michaelmiyagi and thanks for your prompt reply. QuIBL worked fine with a subset of 1,000 of my trees. However, I obtained the same error when I tried bigger subsets, i.e. 5,000 and 2,000 trees.

miriammiyagi commented 4 years ago

Good to know, thanks- how many tips do your trees have? That may be interacting with the length of the input.

fredericlabbe commented 4 years ago

My trees only have seven tips. For example: (Sp1:0.001091,(Sp2:0.001843,(Sp3:0.035451,(Sp4:0.037510,(Sp5:0.013369,(Sp6:0.009451,Sp7:0.013352):0.003551):0.002227):0.001135):0.027799):0.001091);

antonysuv commented 4 years ago

Hi @michaelmiyagi , I encountered the same issue working on a very small dataset (4 taxa, including an outgroup and ~ 500 gene trees). I did a little experiment by selecting fewer trees like 490 and in that case QuiBL ran just fine. But when I cranked it up to 493, I got that error (I thought the 493rd tree had a weird character or something but no).

Also I noticed when I changed the gradascentscalar parameter and/or decreased numsteps, QuiBL ran smoothly as well. I wonder if it has to do something with the EM algorithm when it hits very long branch lengths which in turn cause a numerical overflow. But that's just my guess.

Thanks.

miriammiyagi commented 4 years ago

Hi @antonysuv and @fredericlabbe,

Sorry about the delay! Times have been a bit crazy- thank you both very much for the info, I think your intuition is correct. I believe this shouldn't result in a bias in the runs which do complete, but I am looking into how to solve this consistently.

fredericlabbe commented 3 years ago

It looks like this issue was fixed in a more recent version of the program. Indeed, I was recently able to run QuIBL on a set of several thousands of trees without an error. Thank you @miriammiyagi for your hard work!