stefpeschel / NetCoMi

Network construction, analysis, and comparison for microbial compositional data
GNU General Public License v3.0

Problem with the targets IDs in export to gephi #121

Open JEALabmio opened 3 months ago

JEALabmio commented 3 months ago

Hi! Thank you so much for the package :). I'm encountering an issue when exporting networks to Gephi after analyzing them with the Spearman measure. Everything else seems to work properly, but the IDs in the exported node file do not correspond to the Target IDs in the edge file. For example, the first correlation in the edge file appears as follows:

1,4,Undirected,0.22

However, upon inspecting the node file, ID number 1 is associated with the label Phylo001, and ID number 4 is labeled as Phylo004. Yet, when examining the Target table, ID number 4 corresponds to Phylo014. Consequently, when visualizing the graphs in Gephi using this node file, the correlations are inaccurately represented because the true correlation should be between Phylo001 and Phylo014, not Phylo001 and Phylo004.

How can I rectify this issue during the data export process to ensure that the node file contains correct IDs for both the Source and Target?
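A quick way to check exported files for this kind of mismatch is to resolve every ID in the edge file against the node file. The sketch below uses small hypothetical data frames in place of the real `nodes.csv` and `edges.csv` (the values are made up for illustration; with real files they would be loaded via `read.csv()`):

```r
# Toy stand-ins for the exported nodes.csv and edges.csv (hypothetical values).
nodes <- data.frame(Id = 1:3,
                    Label = c("Phylo001", "Phylo004", "Phylo014"),
                    stringsAsFactors = FALSE)
edges <- data.frame(Source = c(1, 1),
                    Target = c(4, 2),
                    Type   = "Undirected",
                    Weight = c(0.22, 0.35),
                    stringsAsFactors = FALSE)

# IDs used in the edge file that have no row in the node file.
missing_ids <- setdiff(unique(c(edges$Source, edges$Target)), nodes$Id)

# Translate IDs back to labels; NA marks an ID the node file cannot resolve.
edges$SourceLabel <- nodes$Label[match(edges$Source, nodes$Id)]
edges$TargetLabel <- nodes$Label[match(edges$Target, nodes$Id)]

print(missing_ids)
print(edges)
```

Comparing the resolved `SourceLabel` and `TargetLabel` columns with the `v1` and `v2` columns of the original edge list shows whether each ID points to the taxon it should.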

Thanks!

stefpeschel commented 3 months ago

Hi! Sorry for the late reply. Thanks for pointing this out! I had a look at some examples and indeed the IDs weren't assigned correctly (for some reason only in some cases). I have just updated the example on how to export to Gephi and hope it does the job correctly now. Could you please check whether the IDs and node labels are now exported correctly?

JEALabmio commented 3 months ago

No problem! Thank you so much for your response! I tried again with the new example, but it still gives me incorrect IDs. Could it be that I'm doing something wrong in one of the steps? This is the pipeline I have been using. I have also attached a screenshot of the nodes and edges, where the Target IDs go up to 51 although the node file contains only 45 nodes.

Phyloseq object from mothur

mothur_shared_file = "E:/Escritorio/NetWorks/Network _V1V3_PI/V1V3comb.tx.1.subsample_PI.shared"
mothur_constaxonomy_file = "E:/Escritorio/NetWorks/Network _V1V3_PI/V1V3.tx.1.cons.taxonomy"

V1V3_PI_phy <- import_mothur(mothur_list_file = NULL,
                             mothur_group_file = NULL,
                             mothur_tree_file = NULL,
                             cutoff = NULL,
                             mothur_shared_file = mothur_shared_file,
                             mothur_constaxonomy_file = mothur_constaxonomy_file,
                             parseFunction = parse_taxonomy_default)

Net with Spearman correlation

net_spear <- netConstruct(V1V3_PI_phy,
                          measure = "spearman",
                          normMethod = "clr",
                          filtTax = "numbSamp",
                          filtTaxPar = list(numbSamp = 23),
                          zeroMethod = "multRepl",
                          sparsMethod = "threshold",
                          thresh = 0.5,
                          dissFunc = "unsigned",
                          verbose = 3)

props_spear <- netAnalyze(net_spear, clustMethod = "cluster_fast_greedy")

For Gephi, we have to generate an edge list with IDs.

The corresponding labels (and also further node features) are stored as node list.

Create edge object from the edge list exported by netConstruct()

edges <- dplyr::select(net_spear$edgelist1, v1, v2)

Add Source and Target variables (as IDs)

edges$Source <- as.numeric(factor(edges$v1))
edges$Target <- as.numeric(factor(edges$v2))
edges$Type <- "Undirected"
edges$Weight <- net_spear$edgelist1$adja

nodes <- unique(edges[,c('v1','Source')])
colnames(nodes) <- c("Label", "Id")

Add category with clusters (can be used as node colors in Gephi)

nodes$Category <- props_spear$clustering$clust1[nodes$Label]

edges <- dplyr::select(edges, Source, Target, Type, Weight)

write.csv(nodes, file = "nodes.csv", row.names = FALSE)
write.csv(edges, file = "edges.csv", row.names = FALSE)

[Attached screenshot: Nodes-edges]
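A likely cause, judging from the pipeline above: `factor(edges$v1)` and `factor(edges$v2)` are coded independently, so the same number can refer to different taxa in Source and Target, and taxa that occur only in `v2` never reach the node file (which is built from `v1` alone). Below is a minimal base R sketch of one possible way around this, deriving both ID columns from a single shared label vector. The toy edge list stands in for `net_spear$edgelist1`, and the data and the helper name `labels` are illustrative assumptions, not part of NetCoMi or its documented example:

```r
# Toy edge list standing in for net_spear$edgelist1 (hypothetical data).
edgelist <- data.frame(
  v1   = c("Phylo001", "Phylo001", "Phylo004"),
  v2   = c("Phylo014", "Phylo004", "Phylo020"),
  adja = c(0.22, 0.35, 0.41),
  stringsAsFactors = FALSE
)

# One shared set of labels, so an ID means the same taxon in both columns.
labels <- sort(unique(c(edgelist$v1, edgelist$v2)))

# Source and Target are positions in the same label vector.
edges <- data.frame(
  Source = match(edgelist$v1, labels),
  Target = match(edgelist$v2, labels),
  Type   = "Undirected",
  Weight = edgelist$adja,
  stringsAsFactors = FALSE
)

# Node list built from every label, including taxa that occur only in v2.
nodes <- data.frame(Id = seq_along(labels),
                    Label = labels,
                    stringsAsFactors = FALSE)

print(nodes)
print(edges)
```

With real data, the cluster column could then be added as in the pipeline above, e.g. `nodes$Category <- props_spear$clustering$clust1[nodes$Label]`, before writing both tables with `write.csv()`.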