Error in NetConstruct for network comparison : "subscript out of bounds"

sarahkosh commented 4 months ago

Hi,

I'm attempting to compare networks from a variable with 2 groups in my 16s phyloseq object and I am getting a "subscript out of bounds" error message (full error message below).

Here is the code chunk:

# Network construction
phyob_early <- subset_samples(phyob, Embryogenesis == "Early")
phyob_late <- subset_samples(phyob, Embryogenesis == "Late")

n_early <- phyloseq::nsamples(phyob_early) #has 1 fewer sample than late

net_earlyVlate <- netConstruct(data = phyob_early, 
                           data2 = phyob_late,  
                           filtTax = "highestVar",
                           filtTaxPar = list(highestVar = 50), #50 nodes with highest var
                           filtSamp = "highestFreq",
                           filtSampPar = list(highestFreq = n_early), #lowest n samps 
                           measure = "spring",
                           measurePar = list(nlambda = 10, 
                                             rep.num = 10,
                                             Rmethod = "approx"),
                           normMethod = "none", 
                           zeroMethod = "none",
                           sparsMethod = "none", 
                           dissFunc = "signed",
                           verbose = 2,
                           seed = 127

Full error output:

Checking input arguments ... Done.
Data filtering ...
0 samples removed in data set 1.
1 samples removed in data set 2.
765 taxa removed in each data set.
35 taxa and 11 samples remaining in group 1.
35 taxa and 11 samples remaining in group 2.

Calculate 'spring' associations ... Done.

Calculate associations in group 2 ... Done.
Error in assoMat1[edgelist1[i, 1], edgelist1[i, 2]] : 
  subscript out of bounds
In addition: Warning messages:
1: 9 jobs had warning: "There are variables in the data that have only zeros or only the same values." 
2: In pulsar::pulsar(qdat, fun = fun, fargs = list(lambda = lambdaseq,  :
  Optimal lambda may be larger than the supplied values
3: 4 jobs had warning: "There are variables in the data that have only zeros or only the same values." 
4: In pulsar::pulsar(qdat, fun = fun, fargs = list(lambda = lambdaseq,  :
  Optimal lambda may be larger than the supplied values

Here are my phyloseq objects:

> phyob_early
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 800 taxa and 11 samples ]
sample_data() Sample Data:       [ 11 samples by 5 sample variables ]
tax_table()   Taxonomy Table:    [ 800 taxa by 6 taxonomic ranks ]
refseq()      DNAStringSet:      [ 800 reference sequences ]

> phyob_late
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 800 taxa and 12 samples ]
sample_data() Sample Data:       [ 12 samples by 5 sample variables ]
tax_table()   Taxonomy Table:    [ 800 taxa by 6 taxonomic ranks ]
refseq()      DNAStringSet:      [ 800 reference sequences ]

When run individually, networks are constructed (and heatmaps and plots, too) on phyob_early & phyob_late no problem:

net_spring_late <- netConstruct(phyob_late,
                                 filtTax = "highestFreq", 
                                 filtTaxPar = list(highestFreq = 50), #50 most freq ASVs
                                 filtSamp = "totalReads",
                                 filtSampPar = list(totalReads = 100),  # 100 reads
                                 measure = "spring", 
                                 measurePar = list(nlambda=40, 
                                                   rep.num=10,
                                                   Rmethod = "approx"), 
                                 normMethod = "none", #handled by SPRING
                                 zeroMethod = "none", #handled by SPRING
                                 sparsMethod = "none", #handled by SPRING
                                 dissFunc = "signed", 
                                 verbose = 2,
                                 seed = 124)

Output:

Checking input arguments ... Done.
Data filtering ...
750 taxa removed.
50 taxa and 12 samples remaining.

Calculate 'spring' associations ... Done.

I can usually google my way out of error messages, but this one has me stumped (I am previously unfamiliar with Network analysis). Apologies if this is a SPRING issue, but any insights would be much appreciated.

Cheers, Sarah

stefpeschel commented 4 months ago

Hi,

Generally, a sample size of 11 or 12 is really small. So, please consider that you won't get reliable association estimations with such a low sample size.

I assume, the error is caused by SPRING's subsampling process. As you might know, SPRING performs a random subsampling with a ratio of 80% (SPRING argument subsample.ratio). That means for your data, random subsamples with sample size 11*0.8 are randomly drawn and the association estimation process is repeated for each subset. These subsamples are probably so sparse that certain taxa contain only zeros, which causes problems. This is where the "There are variables in the data that have only zeros or only the same values." errors might come from.

I was actually able to reproduce the error with the American Gut data. It occurs if I use sub-samples with sample sizes equal to yours. I noticed that the "subscript out of bounds" error occurs because the estimated networks are completely empty. This case was actually catched by a check if the edge list equals NULL. For some reason, this does not work anymore. Probably due to a change in the igraph function. I will have to fix this so that you get a meaningful error.

Unfortunately, I was not able to produce non-empty networks with these low sample sizes. One difference from your single network to the joint one I noticed is the number of taxa. In the joint one, the filtering reduces the number of taxa to only 35. You could try setting the jointPrepro argument to TRUE so that the filtering is done for the joint data set. If it is FALSE (the default in your case), each set is filtered separately and the intersect of taxa is built at the end. There is a chance that this fixed the error for your data. Or you could set SPRING's threshold argument to a higher value to get a less sparse network. As very last option, you could use another association measure, which is not based on resampling. Proportionality for instance.

Best, Stefanie

sarahkosh commented 4 months ago

Hi Stefanie,

Thank you for your response and insights. It sounds like my sample sizes are too small to get meaningful results anyway, but for my own edification I tried a couple of your suggestions.

When jointPrepro = TRUE, I get the same error message. I've also tried adjusting Spring's threshold value with no luck.

Are the zeros in the data from the 50 (or in the example above, 35) most variable nodes/taxa between the two groups?

I guess what I'm asking is, if filtTaxPar = list(highestVar = 50), but only 35 taxa are left after this filtering, where are the zeros coming from? I was assuming these 35 taxa were non-zero in both groups because of the filtering.

Thanks again! Sarah

stefpeschel commented 4 months ago

You're right, the original data set does not contain any zeros after filtering. The zeros occur in the subsampling process. SPRING uses StARS for edge selection, which is a stability-based approach, where random subsamples of your data are drawn. For example, if ASV1 has non-zero values only in samples 8 and 9, and those samples are removed in the subsampling, then the total count of ASV1 in that particular subsample is zero.

But generally, SPRING would be able to handle subsamples, where certain ASVs are zero. Here, the main problem is, that no edges remain after the stability selection step. Maybe, the authors of SPRING would have a solution.

sarahkosh commented 4 months ago

Thanks for clarifying.

Sarah

stefpeschel / NetCoMi

Error in NetConstruct for network comparison : "subscript out of bounds" #118