meringlab / FlashWeave.jl

Inference of microbial interaction networks from large-scale heterogeneous abundance data

insufficient number of observations error #37

Closed ereyred closed 3 months ago

ereyred commented 4 months ago

Hello, I'm testing FlashWeave with a small subsection of my data: two samples with over 400 OTUs, plus 8 metadata variables. I keep getting this error:

"ERROR: Dataset has an insufficient number of observations, need at least 20 ('n_obs_min') for reliable tests. Try using a smaller 'max_k' parameter (at the cost of higher numbers of indirect associations)."

My run information is:

sensitive - true
heterogeneous - false
max_k - 3
alpha - 0.01
sparse - false
workers - 1
OTUs - 2
MVs - 8

So it says I have 2 OTUs instead of 2 samples and hundreds of OTUs. I tried transposed=true and I tried manually transposing my data. I also adjusted max_k and n_obs_min as the error message suggests, but I keep getting the same error. I generated the relative OTU counts table with mOTUs2. Can someone help me please?

Thanks so much.
E

jtackm commented 4 months ago

FlashWeave automatically applies two filters. The first imposes a minimum number of observations (n_obs_min, see the error message); you can change it by setting n_obs_min in learn_network() (e.g. n_obs_min = 0 to turn this filter off). The second removes all variables with zero variance (e.g. all values are 0, all are 1, ...), since you can't infer interactions for these. The fewer samples you have (two in your case, which is very low), the higher the likelihood of this happening.
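To make the zero-variance point concrete, here is a minimal sketch in plain Julia. It does not call FlashWeave and is not its actual filtering code. The toy table and the variable names are made up for illustration, and it assumes rows are samples and columns are OTUs.

```julia
using Statistics

# Hypothetical toy table: rows are samples, columns are OTUs.
# With only two samples, any OTU with the same value in both rows
# has zero variance.
abundances = [0.0 0.3 0.5 0.0;
              0.0 0.3 0.1 0.2]

# Variance of each column (OTU) across the samples.
otu_variances = vec(var(abundances; dims=1))

# Indices of OTUs that vary across samples; only these could
# survive a zero-variance filter.
informative_otus = findall(>(0), otu_variances)

println("OTUs with non-zero variance: ", informative_otus)
```

Running a check like this on your own table before calling learn_network() shows how many OTUs are left once constant columns are dropped.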

ereyred commented 4 months ago

Ok, thanks for your quick reply! I will try with more samples. I have another question: can FlashWeave also deal with viral abundance data? I have prokaryotic, eukaryotic and viral abundance data from the same environments but calculated via three different methods. Since I don't really care about absolute OTU abundance, just the relative interactions between and within the three groups, I thought using OTU abundances from three methods could work. But it says FlashWeave is made for microbial abundance; do you think it would also work with viral? Thanks so much! E

jtackm commented 3 months ago

Hi again. To your first question: FlashWeave hasn't been benchmarked with viral data, but as long as you're dealing with compositional abundances (which viral sequencing data are) I don't see issues. An important note when using data from multiple sequencing experiments of the same samples: these have to be normalized independently. FlashWeave has a feature for that, which is unfortunately not yet widely documented. You can provide a list of paths to multiple data files as the first argument to learn_network(), i.e. learn_network([proka_otu_table_path, euka_otu_table_path, vir_otu_table_path], meta_path, <other args..>). Just make sure the files are aligned (e.g. row 1 corresponds to the same sample across files, the same for row 2, and so on).
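The alignment requirement can be checked before calling learn_network(). The following is only a sketch under stated assumptions: the tables are tab-separated, have one header row, and carry the sample ID in the first column. The helpers sample_ids and files_aligned are hypothetical and not part of FlashWeave, and the demo files are written to a temporary directory.

```julia
using DelimitedFiles

# Hypothetical helper: read the sample IDs from the first column of a
# tab-separated table with one header row (assumed layout).
function sample_ids(path::AbstractString)
    table = readdlm(path, '\t', String; skipstart=1)
    return table[:, 1]
end

# Hypothetical helper: true if every file lists the same samples
# in the same order as the first file.
function files_aligned(paths::Vector{String})
    reference = sample_ids(first(paths))
    return all(sample_ids(p) == reference for p in paths)
end

# Small demo tables written to a temporary directory.
dir = mktempdir()
proka_path = joinpath(dir, "proka.tsv")
vir_path = joinpath(dir, "vir.tsv")
shuffled_path = joinpath(dir, "shuffled.tsv")
write(proka_path, "sample\totu1\totu2\ns1\t0.1\t0.9\ns2\t0.4\t0.6\n")
write(vir_path, "sample\tvir1\tvir2\ns1\t0.7\t0.3\ns2\t0.2\t0.8\n")
write(shuffled_path, "sample\tvir1\tvir2\ns2\t0.2\t0.8\ns1\t0.7\t0.3\n")

println(files_aligned([proka_path, vir_path]))       # same order in both files
println(files_aligned([proka_path, shuffled_path]))  # rows swapped in second file

# Once the files are aligned, the call from the comment above would look like
# this (left commented out so the sketch runs without FlashWeave installed;
# meta_path is a placeholder):
# using FlashWeave
# network = learn_network([proka_path, vir_path], meta_path)
```

If the check fails, reorder the rows of the offending table to match the others before passing the paths to learn_network().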