meringlab / FlashWeave.jl

Inference of microbial interaction networks from large-scale heterogeneous abundance data

learn_network defaults #28

Open nick-youngblut opened 3 years ago

nick-youngblut commented 3 years ago

The learn_network doc shows:

help?> learn_network
search: learn_network

  learn_network(data_path::AbstractString, meta_data_path::AbstractString) -> FWResult{<:Integer}

  Works like learn_network(data::AbstractArray{<:Real, 2}), but instead of a data
  matrix takes file paths to an OTU table and optionally a meta data table as an
  input.

    •  data_path - path to a file storing an OTU count matrix (and JLD2 meta
       data)

    •  meta_data_path - optional path to a file with meta data

    •  *_key - HDF5 keys to access data sets with OTU counts, Meta variables and
       variable names in a JLD2 file. If a data item is absent the corresponding
       key should be 'nothing'. See '?load_data' for additional information.

    •  verbose - print progress information

    •  transposed - if true, rows of data are variables and columns are samples

    •  kwargs... - additional keyword arguments passed to
       learn_network(data::AbstractArray{<:Real, 2})

  ────────────────────────────────────────────────────────────────────────────────────

  learn_network(data::AbstractArray{<:Real, 2}) -> FWResult{<:Integer}

  Learn an interaction network from a data matrix (including OTUs and optionally meta
  variables).

    •  data - data matrix with information on OTU counts and (optionally) meta
       variables

    •  header - names of variable columns in data

    •  meta_mask - true/false mask indicating which variables are meta variables

  Algorithmic parameters

    •  heterogeneous - enable heterogeneous mode for multi-habitat or -protocol
       data with at least thousands of samples (FlashWeaveHE)

    •  sensitive - enable fine-grained association prediction (FlashWeave-S,
       FlashWeaveHE-S), sensitive=false results in the fast modes (FlashWeave-F,
       FlashWeaveHE-F)

    •  max_k - maximum size of conditioning sets, high values can lead to the
       removal of more spurious edges, but may also strongly increase runtime
       and reduce statistical power. max_k=0 results in no conditioning
       (univariate mode)

    •  alpha - statistical significance threshold at which individual edges are
       accepted

    •  conv - convergence threshold, e.g. if conv=0.01 assume convergence if the
       number of edges increased by only 1% after 100% more runtime (checked in
       intervals)

    •  feed_forward - enable feed-forward heuristic

    •  fast_elim - enable fast-elimination heuristic

    •  max_tests - maximum number of conditional tests that is performed on a
       variable pair before association is assumed

    •  hps - reliability criterion for statistical tests when sensitive=false

    •  FDR - perform False Discovery Rate correction (Benjamini-Hochberg method)
       on pairwise associations

    •  n_obs_min - don't compute associations between variables having fewer
       reliable samples (non-zero samples if heterogeneous=true) than this
       number. -1: automatically choose a threshold.

    •  time_limit - if feed-forward heuristic is active, determines the interval
       (seconds) at which neighborhood information is updated

  General parameters

    •  normalize - automatically choose and perform data normalization method
       (based on sensitive and heterogeneous)

    •  track_rejections - store, for each discarded edge, which variable set led
       to its exclusion (can be memory-intensive for large networks)

    •  verbose - print progress information

    •  transposed - if true, rows of data are variables and columns are samples

    •  prec - precision in bits to use for calculations (16, 32, 64 or 128)

    •  make_sparse - use a sparse data representation (should be left at true in
       almost all cases)

    •  make_onehot - create one-hot encodings for meta data variables with more
       than two categories (should be left at true in almost all cases)

    •  update_interval - if verbose=true, determines the interval (seconds) at
       which network stat updates are printed

What are the defaults for these parameters (e.g., prec)?

jtackm commented 3 years ago

Hi Nick! Good point, I will look into adding these to the docs. Currently one would have to look directly at the method definitions in learning.jl (e.g. prec defaults to 32).
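
Until the defaults are documented, the method definition mentioned above can be located from the REPL. This is a minimal sketch, assuming FlashWeave is installed. The 10×5 random matrix is only a placeholder used to select the matrix method, and `Base.kwarg_decl` is an internal Julia helper that lists keyword names but not their default values:

```julia
using FlashWeave
using InteractiveUtils  # provides @which; loaded automatically in the REPL

# Placeholder input, used only for dispatch; learn_network is not actually run.
data = rand(10, 5)

# Find the method that would handle a plain data matrix.
m = @which learn_network(data)

# Source file and line of that method (expected to point into learning.jl).
file, line = functionloc(m)
println(file, ":", line)

# Names of the keyword arguments accepted by this method.
# Base.kwarg_decl is internal to Julia and may change between versions.
println(Base.kwarg_decl(m))
```

From there, `@less learn_network(data)` or `@edit learn_network(data)` opens the definition, where the default values appear in the signature. Assuming `prec` is a keyword argument, as the help text suggests, passing it explicitly (for example `learn_network(data; prec=32)`) avoids relying on an undocumented default.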