Closed fredericlabbe closed 3 years ago
@fredericlabbe thank you for the log! We've seen a similar issue on very large trees occasionally- how many terminals do you have in your data? Does the large set of trees work for a smaller set of taxa?
Hi @michaelmiyagi and thanks for your prompt reply. QuIBL worked fine with a subset of 1,000 of my trees. However, I obtained the same error when I tried bigger subsets, i.e. 5,000 and 2,000 trees.
Good to know, thanks- how many tips do your trees have? That may be interacting with the length of the input.
My trees only have seven tips. For example: (Sp1:0.001091,(Sp2:0.001843,(Sp3:0.035451,(Sp4:0.037510,(Sp5:0.013369,(Sp6:0.009451,Sp7:0.013352):0.003551):0.002227):0.001135):0.027799):0.001091);
Hi @michaelmiyagi , I encountered the same issue working on a very small dataset (4 taxa, including an outgroup and ~ 500 gene trees). I did a little experiment by selecting fewer trees like 490 and in that case QuiBL ran just fine. But when I cranked it up to 493, I got that error (I thought the 493rd tree had a weird character or something but no).
Also I noticed when I changed the gradascentscalar parameter and/or decreased numsteps, QuiBL ran smoothly as well. I wonder if it has to do something with the EM algorithm when it hits very long branch lengths which in turn cause a numerical overflow. But that's just my guess.
Thanks.
Hi @antonysuv and @fredericlabbe,
Sorry about the delay! Times have been a bit crazy- thank you both very much for the info, I think your intuition is correct. I believe this shouldn't result in a bias in the runs which do complete, but I am looking into how to solve this consistently.
It looks like this issue was fixed in a more recent version of the program. Indeed, I was recently able to run QuIBL on a set of several thousands of trees without an error. Thank you @miriammiyagi for your hard work!
Hi @michaelmiyagi,
I have run QuIBL on a set of trees. Unfortunately, my QuIBL analysis stopped with an error when using all the trees (see below), while it is running perfectly with 100 of my trees... I was wondering if you already had this error before, and if you know how to fix it?
[Input] treefile: ./coding_chrX_bestTree.dnd numdistributions: 2 likelihoodthresh: 0.01 numsteps: 10 gradascentscalar: 0.5 totaloutgroup: AgamS1 multiproc: True maxcores: 12 [Output] OutputPath: ./coding_chrX_bestTree_Py2_out.csv
Loading python/2.7.15 Loading requirement: tcl/8.6.8 gcc/8.3.0 Analysis Started QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:121: RuntimeWarning: invalid value encountered in double_scalars xq=[l/sum(xq) for l in xq]#xq/sum(xq) QuIBL.py:111: RuntimeWarning: overflow encountered in exp y=np.sinh(x/lmbd)/(lmbd(-1+np.exp(C))) QuIBL.py:109: RuntimeWarning: overflow encountered in exp y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) QuIBL.py:109: RuntimeWarning: invalid value encountered in double_scalars y=0.5/lmbdnp.exp(-x/lmbd)(1+np.exp(C)) ['AmerM2', 'AmelC2', 'AaraD1'] is complete. ['AquaS1', 'AmelC2', 'AaraD1'] is complete. Traceback (most recent call last): File "QuIBL.py", line 340, in
inputReader(sys.argv[1])
File "QuIBL.py", line 336, in inputReader
outputFormatter(outputDict,inputDict)
File "QuIBL.py", line 307, in outputFormatter
tripletSet=Parallel(n_jobs=num_cores)(delayed(PLexMax)(triple,K,lthresh,numsteps,gAScalar) for triple in trees)
File "/afs/crc.nd.edu/user/f/flabbe/.local/lib/python2.7/site-packages/joblib/parallel.py", line 789, in call
self.retrieve()
File "/afs/crc.nd.edu/user/f/flabbe/.local/lib/python2.7/site-packages/joblib/parallel.py", line 740, in retrieve
raise exception
joblib.my_exceptions.JoblibValueError: JoblibValueError
Multiprocessing exception: ........................................................................... /scratch365/flabbe/2_Analysis/2_QuIBL/QuIBL.py in()
335 outputDict[flag]=config.get('Output',flag)
336 outputFormatter(outputDict,inputDict)
337
338 339 #print exMax(getTripBranches(readin_Newick(sys.argv[1])), 2, 0.01, 5, 0.5)[0].models --> 340 inputReader(sys.argv[1]) 341 342 343 344
........................................................................... /scratch365/flabbe/2_Analysis/2_QuIBL/QuIBL.py in inputReader(filepath='./sampleInputFile_X_Py2.txt') 331 print('Error: No input trees specified.') 332 return 333 outputDict={} 334 for flag in config.options('Output'): 335 outputDict[flag]=config.get('Output',flag) --> 336 outputFormatter(outputDict,inputDict) outputDict = {'outputpath': './coding_chrX_bestTree_Py2_out.csv'} inputDict = {'gradascentscalar': '0.5', 'likelihoodthresh': '0.01', 'maxcores': '12', 'multiproc': 'True', 'numdistributions': '2', 'numsteps': '10', 'totaloutgroup': 'AgamS1', 'treefile': './coding_chrX_bestTree.dnd'} 337
338 339 #print exMax(getTripBranches(readin_Newick(sys.argv[1])), 2, 0.01, 5, 0.5)[0].models 340 inputReader(sys.argv[1])
........................................................................... /scratch365/flabbe/2_Analysis/2_QuIBL/QuIBL.py in outputFormatter(outputDict={'outputpath': './coding_chrX_bestTree_Py2_out.csv'}, inputDict={'gradascentscalar': '0.5', 'likelihoodthresh': '0.01', 'maxcores': '12', 'multiproc': 'True', 'numdistributions': '2', 'numsteps': '10', 'totaloutgroup': 'AgamS1', 'treefile': './coding_chrX_bestTree.dnd'}) 302 multi=bool(inputDict['multiproc']) 303 corecap=int(inputDict['maxcores']) 304 num_cores=min(multiprocessing.cpu_count(),corecap) 305 print('Analysis Started') 306 if multi: --> 307 tripletSet=Parallel(n_jobs=num_cores)(delayed(PLexMax)(triple,K,lthresh,numsteps,gAScalar) for triple in trees) tripletSet = undefined num_cores = 12 triple = undefined trees = [<__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>, <__main__.tripletT instance>] 308 else: 309 tripletSet=exMax(getTripBranches(readin_Newick(inputDict['treefile']),canonOut), int(inputDict['numdistributions']), float(inputDict['likelihoodthresh']), int(inputDict['numsteps']), float(inputDict['gradascentscalar'])) 310 with open(outputDict['outputpath'],'w') as csv_out: 311 fieldnames=[]
........................................................................... /afs/crc.nd.edu/user/f/flabbe/.local/lib/python2.7/site-packages/joblib/parallel.py in call(self=Parallel(n_jobs=12), iterable=<generator object>)
784 if pre_dispatch == "all" or n_jobs == 1:
785 # The iterable was consumed all at once by the above for loop.
786 # No need to wait for async callbacks to trigger to
787 # consumption.
788 self._iterating = False
--> 789 self.retrieve()
self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=12)>
790 # Make sure that we get a last message telling us we are done
791 elapsed_time = time.time() - self._start_time
792 self._print('Done %3i out of %3i | elapsed: %s finished',
793 (len(self._output), len(self._output),
Sub-process traceback:
ValueError Mon Mar 23 16:30:50 2020 PID: 23515 Python 2.7.15: /opt/crc/p/python/2.7.15/gcc/bin/python ........................................................................... /afs/crc.nd.edu/user/f/flabbe/.local/lib/python2.7/site-packages/joblib/parallel.py in call(self=)
126 def init(self, iterator_slice):
127 self.items = list(iterator_slice)
128 self._size = len(self.items)
129
130 def call(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
func =
args = (<__main__.tripletT instance>, 2, 0.01, 10, 0.5)
kwargs = {}
self.items = [(, (<__main__.tripletT instance>, 2, 0.01, 10, 0.5), {})]
132
133 def len(self):
134 return self._size
135
........................................................................... /scratch365/flabbe/2_Analysis/2_QuIBL/QuIBL.py in PLexMax(triple=<__main__.tripletT instance>, K=2, threshold=0.01, numSteps=10, tempScale=0.5) 278 lmbd,stepScale=gradAscent(branchData, XQList, cArray, pArray, lmbd, stepScale, numSteps, threshold) 279 cArray=[x(storedlmbd/lmbd) for x in cArray] 280 steps+=1 281 triple.setModel(outG, list(cArray), list(pArray), lmbd) 282 triple.setBIC(outG, np.log(len(branchData))(2K-1)-2(modelLogLik(branchData, list(cArray), list(pArray), lmbd))) --> 283 triple.setNull(outG,oneDistNull(branchData)) triple.setNull = <bound method tripletT.setNull of <__main__.tripletT instance>> outG = 'AcolM1' branchData = [0.003358, 0.012285, 0.022712, 0.020573, 0.019406, 0.013163, 0.036276, 0.020941, 0.008572, 0.008616, 0.00998, 0.021921, 0.009772, 0.008954, 0.045429, 0.009066, 0.005557, 0.015791, 1e-06, 0.028105, ...] 284 print(str(triple.taxaSet)+' is complete.') 285 return triple 286 287
........................................................................... /scratch365/flabbe/2_Analysis/2_QuIBL/QuIBL.py in oneDistNull(branchData=[0.003358, 0.012285, 0.022712, 0.020573, 0.019406, 0.013163, 0.036276, 0.020941, 0.008572, 0.008616, 0.00998, 0.021921, 0.009772, 0.008954, 0.045429, 0.009066, 0.005557, 0.015791, 1e-06, 0.028105, ...]) 285 return triple 286 287 288 def oneDistNull(branchData): 289 lmbd=np.mean(branchData) --> 290 bic=np.log(len(branchData))-2*(modelLogLik(branchData, [0.0], [1.0], lmbd)) bic = undefined branchData = [0.003358, 0.012285, 0.022712, 0.020573, 0.019406, 0.013163, 0.036276, 0.020941, 0.008572, 0.008616, 0.00998, 0.021921, 0.009772, 0.008954, 0.045429, 0.009066, 0.005557, 0.015791, 1e-06, 0.028105, ...] lmbd = 0.029741748299319738 291 return (lmbd,bic) 292
293 294 def outputFormatter(outputDict,inputDict):
........................................................................... /scratch365/flabbe/2_Analysis/2_QuIBL/QuIBL.py in modelLogLik(XSet=[0.003358, 0.012285, 0.022712, 0.020573, 0.019406, 0.013163, 0.036276, 0.020941, 0.008572, 0.008616, 0.00998, 0.021921, 0.009772, 0.008954, 0.045429, 0.009066, 0.005557, 0.015791, 1e-06, 0.028105, ...], cArray=[0.0], pArray=[1.0], lmbd=0.029741748299319738) 126 tempSum=0 127 for x in XSet: 128 L=0 129 for i in range(0,len(pArray)): 130 L+=pArray[i]*conPDF(x,cArray[i],lmbd) --> 131 tempSum+=log(L) tempSum = 1347.3621079575664 L = 0.0 132 return tempSum 133 134 def weightedLogLik(XQList, cArray,lmbd): 135 #Calculates the conditional likelihood on the Q values for each datapoint
ValueError: math domain error
Thanks in advance for your help, @fredericlabbe