meringlab / FlashWeave.jl

Inference of microbial interaction networks from large-scale heterogeneous abundance data

insufficient number of observations #34

Closed cmandreani closed 9 months ago

cmandreani commented 9 months ago

Hi Janko,

Whatever I try, I keep getting "Automatically setting 'n_obs_min' to 20 for enhanced reliability ERROR: Dataset has an insufficient number of observations, need at least 20 ('n_obs_min') for reliable tests" when running:

julia> data_path = "flashweave_abundance.csv"
julia> meta_data_path = "flashweave_metadata.csv"
julia> netw_results = learn_network(data_path, meta_data_path, sensitive=true, heterogeneous=false)

I managed to address the "Try using a smaller 'max_k' parameter (at the cost of higher numbers of indirect associations)" message by setting max_k=0, but I'm not sure what I would be losing if I stick to the univariate mode.

I followed the main recommendations from other issues by running:

]up FlashWeave
]test FlashWeave

and confirmed that I'm working with the latest version of the tool and that tests passed.

My dataset is small, consisting of six samples (rows in both files) with 47 OTUs (columns in data_path) and 17 environmental measurements (columns in meta_data_path; these can be integer, float, or categorical).

I tried toggling the 'heterogeneous', 'sensitive', and 'header' booleans, with no better outcome. I also tried reducing n_obs_min to 6 (the number of rows) and to -1 (for automatic threshold choice), but it returns the same error.

Could you give me a hand with which parameter I could look into to overcome this issue?

Thanks, constanza

jtackm commented 9 months ago

Hi Constanza,

Disentangling direct and indirect associations in such small data sets is unfortunately unreliable. I suggest sticking to the univariate mode (as you did, ideally with sensitive=true and heterogeneous=false) or increasing the number of samples if possible. The trade-off is that you may see confounded associations caused by shared habitat preferences or similar. But if you have multiple environments in this already power-starved dataset, I would generally reconsider whether co-occurrence analysis is the right tool.
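
For reference, a minimal sketch of the univariate call discussed in this thread, assuming the file names from the original post and max_k=0 as the way to switch off conditioning. The output file name is hypothetical and only for illustration; whether a six-sample dataset passes the 'n_obs_min' check is not guaranteed by this call.

```julia
using FlashWeave

# Input paths as given earlier in this thread
data_path = "flashweave_abundance.csv"
meta_data_path = "flashweave_metadata.csv"

# max_k=0 turns off conditioning, so only pairwise (univariate) tests are run.
# sensitive=true and heterogeneous=false follow the suggestion above.
netw_results = learn_network(data_path, meta_data_path,
                             sensitive=true, heterogeneous=false, max_k=0)

# Hypothetical output file name, chosen here for illustration only.
save_network("flashweave_univariate.edgelist", netw_results)
```

Edges found this way may include indirect associations (for example, ones driven by shared habitat preferences), so they are best read as candidate co-occurrences rather than direct interactions.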

cmandreani commented 9 months ago

Hi, thanks so much for your answer. The small number of samples has been an obstacle all along the way. In any case, I'll try to include the tool in future studies. It ran smoothly and was easy to install :) Cheers.

jtackm commented 9 months ago

Thanks for the kind words, all the best for the project!