meringlab / FlashWeave.jl

Inference of microbial interaction networks from large-scale heterogeneous abundance data
Other
70 stars 8 forks source link

How to extract the environmentally driven edges identified by FlashWeave? #32

Closed huizhen-yan closed 10 months ago

huizhen-yan commented 1 year ago

Hi, I'm stuck with the following two issues. Q1: Which edges in the FlashWeave results are environmentally driven? It appears that there is no clear indication in the learn_network() help or results. We can get the directed network by setting max_k=3, or univariate network (including direct and indirect edges) by setting max_k=0. In my opinion, the edges that do not appear in the direct network should include both spurious co-occurrence between microbes and environmentally-driven associations. However, I could not find any information on how to distinguish between these two parts in corresponding article and method documentation. Is it possible for FlashWeave to extract environmentally-driven edges?

Q2: FlashWeave runs normally when 'detailed' is not set, but I get the following error when setting 'detailed=true'. `julia> netw_results = learn_network(data_path, meta_path, sensitive=true, heterogeneous=false, alpha=0.001,max_k=3,detailed=true)

Can you help me? Thanks a lot.

Loading data

Normalizing

Removing variables with 0 variance (or equivalently 1 level) and samples with 0 reads -> no samples or variables discarded

Normalization

Removing meta variables with 0 variance (or equivalently 1 level) -> no samples or variables discarded

Learning interactions

Inferring network with FlashWeave - sensitive (conditional)

Run information:
sensitive - true
heterogeneous - false
max_k - 3
alpha - 0.001
sparse - false
workers - 1
OTUs - 42344
MVs - 18

Automatically setting 'n_obs_min' to 20 for enhanced reliability Computing univariate associations

Univariate degree stats: Summary Stats: Length: 42362 Missing Count: 0 Mean: 738.558425 Minimum: 3.000000 1st Quartile: 382.000000 Median: 586.000000 3rd Quartile: 855.000000 Maximum: 4754.000000

Starting conditioning search

Preparing workers..

Done. Starting inference..

Exception occurred on worker 1: MethodError: no method matching check_candidate!(::Int64, ::Int64, ::Matrix{Float32}, ::Vector{Int64}, ::OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}, ::FlashWeave.FzTestCond{Float32, FlashWeave.NoNz}, ::Int64, ::Float64, ::Int64, ::Int64, ::Int64, ::Int64, ::Dict{Int64, Tuple{Tuple{Int64, Vararg{Int64}}, FlashWeave.TestResult, Tuple{Int64, Float64}}}, ::Bool, ::Vector{Int32}, ::Char, ::Bool, ::OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}; detailed=true) Closest candidates are: check_candidate!(::Int64, ::Int64, ::AbstractMatrix{ElType}, ::Vector{Int64}, ::OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}, ::TestType, ::Integer, ::AbstractFloat, ::Integer, ::Integer, ::Integer, ::Integer, ::Dict{Int64, Tuple{Tuple{Int64, Vararg{Int64}}, FlashWeave.TestResult, Tuple{Int64, Float64}}}, ::Bool, ::Vector{DiscType}, ::Char, ::Bool, ::OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}; bnb, cut_test_branches) where {ElType<:Real, DiscType<:Integer, TestType<:FlashWeave.AbstractTest} at ~/.julia/packages/FlashWeave/UprmH/src/hiton.jl:80 got unsupported keyword argument "detailed" Stacktrace: [1] kwerr(::NamedTuple{(:detailed,), Tuple{Bool}}, ::Function, ::Int64, ::Int64, ::Matrix{Float32}, ::Vector{Int64}, ::OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}, ::FlashWeave.FzTestCond{Float32, FlashWeave.NoNz}, ::Int64, ::Float64, ::Int64, ::Int64, ::Int64, ::Int64, ::Dict{Int64, Tuple{Tuple{Int64, Vararg{Int64}}, FlashWeave.TestResult, Tuple{Int64, Float64}}}, ::Bool, ::Vector{Int32}, ::Char, ::Bool, ::OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}) @ Base ./error.jl:165 [2] hiton_backend(T::Int64, candidates::Vector{Int64}, data::Matrix{Float32}, test_obj::FlashWeave.FzTestCond{Float32, FlashWeave.NoNz}, max_k::Int64, alpha::Float64, hps::Int64, n_obs_min::Int64, max_tests::Int64, prev_accepted_dict::OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}, candidates_unchecked::Vector{Int64}, time_limit::Float64, start_time::Float64, debug::Int64, whitelist::Set{Int64}, blacklist::Set{Int64}, rej_dict::Dict{Int64, Tuple{Tuple{Int64, Vararg{Int64}}, FlashWeave.TestResult, Tuple{Int64, Float64}}}, track_rejections::Bool, z::Vector{Int32}, phase::Char; fast_elim::Bool, no_red_tests::Bool, support_dict::OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}, kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:detailed,), Tuple{Bool}}}) @ FlashWeave ~/.julia/packages/FlashWeave/UprmH/src/hiton.jl:138 [3] interleaving_phase(::Int64, ::Vararg{Any}; add_initial_candidate::Bool, univar_nbrs::OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}, kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:detailed,), Tuple{Bool}}}) @ FlashWeave ~/.julia/packages/FlashWeave/UprmH/src/hiton.jl:155 [4] si_HITON_PC(T::Int64, data::Matrix{Float32}, levels::Vector{Int32}, max_vals::Vector{Int32}, cor_mat::Matrix{Float32}; test_name::String, max_k::Int64, alpha::Float64, hps::Int64, n_obs_min::Int64, max_tests::Int64, fast_elim::Bool, no_red_tests::Bool, FDR::Bool, weight_type::String, whitelist::Set{Int64}, blacklist::Set{Int64}, univar_nbrs::OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}, prev_state::FlashWeave.HitonState{Int64}, debug::Int64, time_limit::Float64, track_rejections::Bool, cache_pcor::Bool, kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:detailed,), Tuple{Bool}}}) @ FlashWeave ~/.julia/packages/FlashWeave/UprmH/src/hiton.jl:338 [5] interleaved_worker(data::Matrix{Float32}, levels::Vector{Int32}, max_vals::Vector{Int32}, cor_mat::Matrix{Float32}, edge_rule::String, nonsparse_cond::Bool, shared_job_q::Distributed.RemoteChannel{Channel{Tuple}}, shared_result_q::Distributed.RemoteChannel{Channel{Tuple}}, GLL_args::Dict{Symbol, Any}) @ FlashWeave ~/.julia/packages/FlashWeave/UprmH/src/interleaved.jl:30 [6] (::FlashWeave.var"#122#126"{String, Bool, Matrix{Float32}, Vector{Int32}, Vector{Int32}, Matrix{Float32}, Dict{Symbol, Any}, Distributed.RemoteChannel{Channel{Tuple}}, Distributed.RemoteChannel{Channel{Tuple}}})() @ FlashWeave /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Distributed/src/macros.jl:83 [7] #invokelatest#2 @ ./essentials.jl:729 [inlined] [8] invokelatest @ ./essentials.jl:726 [inlined] [9] #153 @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Distributed/src/remotecall.jl:425 [inlined] [10] run_work_thunk(thunk::Distributed.var"#153#154"{FlashWeave.var"#122#126"{String, Bool, Matrix{Float32}, Vector{Int32}, Vector{Int32}, Matrix{Float32}, Dict{Symbol, Any}, Distributed.RemoteChannel{Channel{Tuple}}, Distributed.RemoteChannel{Channel{Tuple}}}, Tuple{}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, print_error::Bool) @ Distributed /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:70 [11] run_work_thunk(rv::Distributed.RemoteValue, thunk::Function) @ Distributed /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Distributed/src/process_messages.jl:79 [12] (::Distributed.var"#100#102"{Distributed.RemoteValue, Distributed.var"#153#154"{FlashWeave.var"#122#126"{String, Bool, Matrix{Float32}, Vector{Int32}, Vector{Int32}, Matrix{Float32}, Dict{Symbol, Any}, Distributed.RemoteChannel{Channel{Tuple}}, Distributed.RemoteChannel{Channel{Tuple}}}, Tuple{}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}})() @ Distributed ./task.jl:484

ERROR: "Interleaved error (see stacktrace above)" Stacktrace: [1] interleaved_backend(target_vars::Vector{Int64}, data::Matrix{Float32}, all_univar_nbrs::Dict{Int64, OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}}, levels::Vector{Int32}, max_vals::Vector{Int32}, cor_mat::Matrix{Float32}, GLL_args::Dict{Symbol, Any}; update_interval::Float64, variable_ids::Vector{String}, meta_variable_mask::Nothing, convergence_threshold::Float64, conv_check_start::Float64, conv_time_step::Float64, parallel::String, edge_rule::String, edge_merge_fun::typeof(FlashWeave.maxweight), nonsparse_cond::Bool, verbose::Bool, workers_local::Bool, feed_forward::Bool, kill_remote_workers::Bool) @ FlashWeave ~/.julia/packages/FlashWeave/UprmH/src/interleaved.jl:161 [2] infer_conditional_neighbors(target_vars::Vector{Int64}, data::Matrix{Float32}, all_univar_nbrs::Dict{Int64, OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}}, levels::Vector{Int32}, max_vals::Vector{Int32}, cor_mat::Matrix{Float32}, parallel::String, nonsparse_cond::Bool, recursive_pcor::Bool, cont_type::DataType, verbose::Bool, hiton_kwargs::Dict{Symbol, Any}, interleaved_kwargs::Dict{Symbol, Any}) @ FlashWeave ~/.julia/packages/FlashWeave/UprmH/src/learning.jl:148 [3] learn_graph_structure(target_vars::Vector{Int64}, data::Matrix{Float32}, all_univar_nbrs::Dict{Int64, OrderedCollections.OrderedDict{Int64, Tuple{Float64, Float64}}}, levels::Vector{Int32}, max_vals::Vector{Int32}, cor_mat::Matrix{Float32}, parallel::String, recursive_pcor::Bool, cont_type::DataType, time_limit::Float64, nonsparse_cond::Bool, verbose::Bool, track_rejections::Bool, hiton_kwargs::Dict{Symbol, Any}, interleaved_kwargs::Dict{Symbol, Any}) @ FlashWeave ~/.julia/packages/FlashWeave/UprmH/src/learning.jl:176 [4] LGL(data::Matrix{Float32}; test_name::String, max_k::Int64, alpha::Float64, hps::Int64, n_obs_min::Int64, max_tests::Int64, convergence_threshold::Float64, FDR::Bool, parallel::String, fast_elim::Bool, no_red_tests::Bool, weight_type::String, edge_rule::String, nonsparse_cond::Bool, verbose::Bool, update_interval::Float64, edge_merge_fun::typeof(FlashWeave.maxweight), tmp_folder::String, debug::Int64, time_limit::Float64, header::Vector{String}, meta_variable_mask::Nothing, dense_cor::Bool, recursive_pcor::Bool, cache_pcor::Bool, correct_reliable_only::Bool, feed_forward::Bool, track_rejections::Bool, all_univar_nbrs::Nothing, kill_remote_workers::Bool, workers_local::Bool, kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:detailed,), Tuple{Bool}}}) @ FlashWeave ~/.julia/packages/FlashWeave/UprmH/src/learning.jl:255 [5] macro expansion @ ./timing.jl:463 [inlined] [6] learn_network(data::Matrix{Any}; sensitive::Bool, heterogeneous::Bool, max_k::Int64, alpha::Float64, conv::Float64, header::Vector{String}, meta_mask::BitVector, feed_forward::Bool, fast_elim::Bool, normalize::Bool, track_rejections::Bool, verbose::Bool, transposed::Bool, prec::Int64, make_sparse::Bool, make_onehot::Bool, max_tests::Int64, hps::Int64, FDR::Bool, n_obs_min::Int64, cache_pcor::Bool, time_limit::Float64, update_interval::Float64, parallel_mode::String, extra_data::Nothing, share_data::Bool, experimental_kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:detailed,), Tuple{Bool}}}) @ FlashWeave ~/.julia/packages/FlashWeave/UprmH/src/learning.jl:559 [7] learn_network(data_path::String, meta_data_path::String; otu_data_key::String, otu_header_key::String, meta_data_key::String, meta_header_key::String, verbose::Bool, transposed::Bool, kwargs::Base.Pairs{Symbol, Real, NTuple{5, Symbol}, NamedTuple{(:sensitive, :heterogeneous, :alpha, :max_k, :detailed), Tuple{Bool, Bool, Float64, Int64, Bool}}}) @ FlashWeave ~/.julia/packages/FlashWeave/UprmH/src/learning.jl:347 [8] top-level scope @ REPL[39]:1

julia>`

jtackm commented 1 year ago

Hi Huizhen,

Thanks for the report!

Regarding Q1: You are correct: setting max_k=3 results in the inferred network only including direct associations (checked against up to three confounding variables). But these direct associations can also involve environmental factors: for instance, if a species directly depends on low pH or a particular habitat (independent of other species), this is useful information that FlashWeave tries to infer. There are currently no helper functions for filtering only otu-otu edges directly, but the network object has a meta_mask field that specifies, which variables are OTUs and which are MVs, which then together with the internally stored graph object can be used to filter for what you want. These are the functions I have been using internally:

import Graphs: ne, nv
using SimpleWeightedGraphs, FlashWeave

otu_ids(fw_res::FlashWeave.FWResult) = names(fw_res)[otu_indices(fw_res)]
mv_ids(fw_res::FlashWeave.FWResult) = names(fw_res)[mv_indices(fw_res)]
otu_indices(fw_res::FlashWeave.FWResult) = findall(.!meta_variable_mask(fw_res))
mv_indices(fw_res::FlashWeave.FWResult) = findall(meta_variable_mask(fw_res))

function otu_otu_edges(fw_res::FlashWeave.FWResult)
    oids = Set(otu_indices(fw_res))
    Iterators.filter(x->x.src in oids && x.dst in oids, SimpleWeightedGraphs.edges(graph(fw_res)))
end

function otu_mv_edges(fw_res::FlashWeave.FWResult)
    oids = Set(otu_indices(fw_res))
    Iterators.filter(x->begin
            src_is_oid = x.src in oids
            dst_is_oid = x.dst in oids
            xor(src_is_oid, dst_is_oid)
        end,
        SimpleWeightedGraphs.edges(graph(fw_res)))
end

function mv_mv_edges(fw_res::FlashWeave.FWResult)
    mvids = Set(mv_indices(fw_res))
    Iterators.filter(x->x.src in mvids && x.dst in mvids, SimpleWeightedGraphs.edges(graph(fw_res)))
end

ne_otu_otu(fw_res::FlashWeave.FWResult) = length([true for x in otu_otu_edges(fw_res)])
ne_otu_mv(fw_res::FlashWeave.FWResult) = length([true for x in otu_mv_edges(fw_res)])
ne_mv_mv(fw_res::FlashWeave.FWResult) = length([true for x in mv_mv_edges(fw_res)])
ne(fw_res::FlashWeave.FWResult) = SimpleWeightedGraphs.ne(graph(fw_res))
nv(fw_res::FlashWeave.FWResult) = SimpleWeightedGraphs.nv(graph(fw_res))

Perhaps they are useful to you? I may eventually add them to the package (not sure if they are intuitive enough?), but it would be good to at least highlight the role of the meta_mask and internal graph object in the documentation.

Regarding Q2: the "detailed" flag should be provided to save_network() (see README.md), not learn_network(). But the error message could be more clear, sorry about that.

Hope this helps, Janko

huizhen-yan commented 9 months ago

Thank you for your kind reply. It's a pity that I didn't see your reply in time. The function you shared here makes the result generated by FlashWeave much clearer. I removed edges dependent on environmental factors in R, only retaining otu-otu edges for network topological properties statistics. The degree distribution of my four networks closely resembles the Poisson distribution (see attached figure). This is similar to Erdos Renyi random networks (generated by erdos.renyi.game() function in R package "igraph"), which do not follow the general rules of natural microbial interaction networks. This result really bothered me. According to box 1 (network types) in Layeghifard et al. (2017, Trends in Microbiology), and other studies in similar habitats (seawater), the distribution of nodes’ degrees in natural networks rarely follows a Poisson distribution. Could you please provide some experience or explanation about this phenomenon?

image
jtackm commented 9 months ago

Sorry, but Github is not the right format to discuss scientific questions. Feel free to drop me an email though!