microbiome / NMGS

Neutral model

NAs and computation for very very large datasets #3

Open odo1977 opened 8 years ago

odo1977 commented 8 years ago

Hello, I've downloaded the scripts and ran the readme.md 'tutorial' without problems on a large computer cluster.

I've been able to produce results from a mock (greatly reduced) dataset. However, when I run ./Scripts/Sig.pl 1 3 and ./Scripts/Sig.pl 2 3 on the results, I get nan nan 0 1964 0.000000 nan nan 0 1963 0.000000. I checked that no rows sum to 0 (i.e. there are no empty rows).
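For anyone reproducing the empty-row check mentioned above, here is a minimal sketch that also covers empty columns. It assumes the abundance table is a CSV with a header row and a label in the first column; that layout, and the names `read_counts` and `find_empty`, are hypothetical and not taken from the NMGS scripts.

```python
import csv
import io


def read_counts(text):
    """Parse CSV text into a list of integer rows.

    Assumed layout (not confirmed for NMGS input): one header row,
    then one row per taxon with its label in the first column.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [[int(v) for v in row[1:]] for row in reader if row]


def find_empty(counts):
    """Return (empty_row_indices, empty_col_indices), both 0-based."""
    empty_rows = [i for i, row in enumerate(counts) if sum(row) == 0]
    n_cols = len(counts[0]) if counts else 0
    empty_cols = [
        j for j in range(n_cols) if sum(row[j] for row in counts) == 0
    ]
    return empty_rows, empty_cols


# Made-up example table: taxon B is empty, and so is sample S2.
example = "taxon,S1,S2,S3\nA,5,0,2\nB,0,0,0\nC,1,0,4\n"
counts = read_counts(example)
empty_rows, empty_cols = find_empty(counts)
print("empty rows:", empty_rows)
print("empty columns:", empty_cols)
```

If either list is non-empty, the corresponding taxa or samples would need to be dropped before running the fit. Whether empty columns matter for the nan output is not established in this thread.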

I've also tried one of the (very large) datasets with reduced computing effort (b 10 and t 20), just to see how it goes. While each iteration takes a few seconds, the 'sampling fit....' step is taking forever. Are there any limits to dataset size? Or any suggestions on which parameters to use?

Thanks, D

apascualgarcia commented 1 year ago

Same problem here: 1) the local model returns a likelihood, but the likelihoods of the full model and of the observed data are nan; and 2) I do not get nan when I run the whole set of samples, only when I run the method on certain subsets of samples (not all subsets). Communities were rarefied to prevent numerical issues with the Stirling matrix, as indicated in other issues, and I verified that there were no empty rows or columns.

Sample of results, whole set (354 samples):

25000,-257998.583919,-237381.254884,-229819.318754,16.050474,14.294355,10.423894,843,882,848
25010,-267659.461330,-238032.301216,-229768.663678,17.153922,14.349036,10.423894,815,891,848
(...)

Subset 1 (245 samples):

25000,-188104.474149,-169520.218121,-163530.597540,18.903732,15.501356,12.483412,605,669,664
25010,-204395.775564,-170329.017807,-163514.525032,17.485596,15.636775,12.483412,688,668,664
(...)

Subset 2 (90 samples):

25000,-58210.311474,**nan,nan**,5.869408,5.323496,5.519792,554,526,537
25010,-53700.553043,**nan,nan**,6.281922,5.334340,5.519792,486,530,537
(...)
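Rows like those in Subset 2 can be flagged automatically when comparing many subsets. The sketch below is only an illustration: it assumes comma-separated output lines like the ones pasted above, strips the `**` emphasis markers, and reports 0-based column indices without assuming what each column means. The function name `nan_columns` is hypothetical.

```python
import math


def nan_columns(lines):
    """Map each 0-based column index to the number of rows where it is NaN."""
    hits = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("("):
            continue  # skip blank lines and "(...)" elision markers
        # Drop the "**" emphasis markers used in the pasted output.
        fields = line.replace("*", "").split(",")
        for j, field in enumerate(fields):
            if math.isnan(float(field)):
                hits[j] = hits.get(j, 0) + 1
    return hits


# The two Subset 2 rows quoted in this comment.
subset2 = [
    "25000,-58210.311474,**nan,nan**,5.869408,5.323496,5.519792,554,526,537",
    "25010,-53700.553043,**nan,nan**,6.281922,5.334340,5.519792,486,530,537",
    "(...)",
]
print(nan_columns(subset2))
```

An empty dictionary means no nan values were found, which is what the whole-set and Subset 1 rows above would give.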

Thanks,

Alberto