tyjo / luminate

Denoising longitudinal microbiome data.

Estimating relative abundances: AssertionError #3

Closed apascualgarcia closed 4 years ago

apascualgarcia commented 4 years ago

Dear @tyjo,

I am trying to estimate relative abundances for a large dataset (~4500 OTUs), and I get the following error:

```
Running fileIn datasets/count_table.csv
Output directory datasets/luminate
Estimating relative abundances...
Traceback (most recent call last):
  File "main.py", line 321, in <module>
    estimate(Y, U, T, IDs, denom, otu_table, output_dir)
  File "main.py", line 166, in estimate
    model.optimize(verbose=True)
  File "/src/noisy_vmlds.py", line 314, in optimize
    self.update_variance()
  File "/src/noisy_vmlds.py", line 885, in update_variance
    self.lambda_AA[i], self.lambda_BB[i] = self.compute_lambda_blk_tridiag(self.T[i], self.V[i])
  File "/src/noisy_vmlds.py", line 908, in compute_lambda_blk_tridiag
    assert np.all(np.diag(s2_inv) > 0), np.diag(s2_inv)
AssertionError: [-1.92669653e+13 -1.99035018e+13  4.42507458e+11  4.91200427e+11
  8.46773685e+12  1.32601284e+12 -5.31715979e+12  4.27400593e+10
  1.45309553e+11 -2.14181242e+11 -6.78595452e+10 -2.81937895e+11
  8.31248779e+10 -9.37491008e+10 -5.53891714e+12  4.16841584e+12
  (large array of values)
```

Aggregating these OTUs into higher taxa (~400) runs properly. Is there any limitation on the number of OTUs, or on the sparseness of the table? In this example, every OTU has at least 100 reads across all 36 samples.

Cheers

tyjo commented 4 years ago

When you say each OTU has at least 100 reads across all samples, do you mean the ~4500 OTUs or the ~400 OTUs? And is that 100 reads per observation/sample, or 100 reads total?

In general, the ability to reconstruct a trajectory is affected by the level of sparsity. The larger the proportion of zero entries, the less information is available to reconstruct a trajectory. In the worst case, an OTU is never observed (all its counts are zero) and there is no information available at all. I would expect the method to work at intermediate levels of sparsity (e.g. fewer than 30-40% zeros), but to have difficulty at high levels of sparsity (e.g. only 5-10% non-zero entries).
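As a quick sanity check before running the model, the overall and per-OTU fraction of zeros can be computed directly from the count table. This is only a sketch, assuming a CSV table with OTUs as rows and samples as columns; the file path is illustrative:

```python
# Sketch only (not part of LUMINATE): gauge sparsity of a count table
# before running the model. Assumes OTUs as rows, samples as columns.
import pandas as pd

counts = pd.read_csv("datasets/count_table.csv", index_col=0)

# Overall fraction of zero entries in the table.
overall_zeros = (counts.values == 0).mean()
print(f"Fraction of zero entries: {overall_zeros:.3f}")

# Per-OTU fraction of zero samples; OTUs that are mostly zeros carry
# little information for trajectory reconstruction.
zeros_per_otu = (counts == 0).mean(axis=1)
print(zeros_per_otu.sort_values(ascending=False).head())
```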

apascualgarcia commented 4 years ago

Thanks for the fast answer,

Each of the ~4500 OTUs (actually exact sequence variants) has at least 100 reads in total across all samples. The table has 70% zeros, so that may be the problem:

Num samples: 36
Num observations: 4,438
Total count: 9,891,726
Table density (fraction of non-zero values): 0.296

Clustering into OTUs (97% sequence identity) substantially increases the density, and the method runs:

Num samples: 36
Num observations: 452
Total count: 7,933,920
Table density (fraction of non-zero values): 0.586

Perhaps it would be worth documenting this limitation? I am happy to send you both tables if that would be of any help.
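For illustration, a collapse of this kind can be done with a simple groupby once each sequence variant has a taxonomic (or 97%-OTU) assignment. This is only a sketch; the mapping file and the "taxon" column are hypothetical and not part of this thread:

```python
# Sketch only: collapse an ESV-level count table to higher taxa to reduce
# sparsity. The taxonomy file and column name are hypothetical.
import pandas as pd

counts = pd.read_csv("datasets/count_table.csv", index_col=0)   # ESVs x samples
taxonomy = pd.read_csv("datasets/taxonomy.csv", index_col=0)    # ESV -> taxon

# Sum the counts of all variants assigned to the same taxon.
aggregated = counts.groupby(taxonomy["taxon"]).sum()

density = (aggregated.values != 0).mean()
print(f"Aggregated table density: {density:.3f}")
```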

tyjo commented 4 years ago

Thanks for sending the tables.

I added a warning message about data sparsity. While it doesn't resolve the error, it should make it easier to diagnose.
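As a rough illustration only (this is not the code that was committed; the threshold and wording are assumptions), such a check might look like:

```python
import numpy as np

def warn_if_sparse(counts, threshold=0.7):
    """Illustrative sketch: warn when a count table is mostly zeros.

    The threshold and message are assumptions, not LUMINATE's actual check.
    """
    zero_frac = np.mean(np.asarray(counts) == 0)
    if zero_frac > threshold:
        print(f"Warning: {100 * zero_frac:.1f}% of entries are zero; "
              "relative abundance estimates may be unreliable.")
```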

apascualgarcia commented 4 years ago

Thanks for taking the time to identify the problem!