Open johanneswerner opened 4 years ago
Hello, I posted this issue seven days ago and there has been no activity so far. Could you tell me whether this is going to be fixed any time soon? Any response would be highly appreciated. Thank you.
Hi, I don't know what will happen with this project in the future, but I will provide you with a fix. If I can't get it working within metaerg, I will give you an R script to extract these columns from the gff/tbl file. Best wishes, Lukas Jansen
Ok, I found the problem. The gene2pathways hash (map) is always empty: the predictKegg/Metacyc functions have a local variable of the same name. Only the master.tsv file uses this hash. The pathways are added to the feature hash (f). This hash is used for the other files.
Replace this in bin/output_reports.pl:
https://github.com/xiaoli-dong/metaerg/blob/7d6f785dab2b776db0209f35eb6d974a67df1290/bin/output_reports.pl#L149-L156
With
https://github.com/xiaoli-dong/metaerg/blob/0f0d46995b62d12417c63ef4937d1b10ef12ec1b/bin/output_reports.pl#L143-L144
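The shadowing problem described above can be illustrated with a minimal sketch. It is written in Python rather than Perl, and every name in it (predict_kegg, row_from_map, row_from_feature, the dictionary layout) is a hypothetical stand-in, not the actual metaerg code. It only shows why a report that reads the outer map stays empty while a report that reads the feature itself does not.

```python
# Illustrative stand-in for the Perl logic; not the metaerg implementation.
gene2pathways = {}  # outer map, the one the master.tsv writer would read


def predict_kegg(feature, pathway_ids):
    # A local variable with the same name shadows the outer map,
    # so the outer gene2pathways is never filled.
    gene2pathways = {}
    gene2pathways[feature["id"]] = pathway_ids
    # The feature itself does receive the pathways.
    feature["kegg_pathway_id"] = pathway_ids


def row_from_map(feature):
    # Reads the outer map: always empty because of the shadowing above.
    return ";".join(gene2pathways.get(feature["id"], []))


def row_from_feature(feature):
    # Reads the pathways stored on the feature instead.
    return ";".join(feature.get("kegg_pathway_id", []))


feature = {"id": "BS|00002"}
predict_kegg(feature, ["00260"])
print(repr(row_from_map(feature)))
print(repr(row_from_feature(feature)))
```

Reading from the feature, as in row_from_feature, corresponds to the direction of the replacement linked above, which takes the pathways from the feature rather than from the empty hash.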
oh cool, thank you very much for your help, testing it now. :-)
almost solved: this is the current output (only showing the columns 2, 8 and 9):
feature_id kegg_pathways metacyc_pathways
BS|00001
BS|00002 00260
BS|00003 00650;00640;00280;00071;00632;03320;00930;01040;00410;00380;00310;00592
BS|00004 00240;00910;00252;00280;00020;00260;00251;00620;00010 PHOTOALL-PWY;PWY-101
How can the IDs be translated?
Good morning,
The MetaCyc pathways don't really have names, at least none that are easily available.
Two ideas:
1) The hack:
Instead of kegg_pathway_id use kegg_pathway_name in output_reports.pl.
2) R with minpath data
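#!/usr/bin/env Rscript
# Read master.tsv and drop the stray column V17 if it is present and all NA
readMaster = function(infile) {
  require(data.table)
  # Use the function argument, not the global master_tsv_path
  master = fread(infile, fill = T, sep = "\t")
  # Weird col V17
  if ("V17" %in% colnames(master) && master[, all(is.na(V17))])
    master[, V17 := NULL]
  master
}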
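# Replace every sep-separated ID in tolookup with the matching value from the
# second column of lookuptable (the first column holds the IDs)
translateIds = function(tolookup, lookuptable, sep = ";") {
  require(data.table)
  # One row per (input row, ID) pair
  unnested = data.table(elements = strsplit(tolookup, sep),
                        row = seq_along(tolookup))
  unnested = unnested[, .(elements = unlist(elements)), by = row]
  # Left join, so IDs without a match are kept
  res = merge(
    unnested,
    lookuptable,
    by.x = "elements",
    by.y = colnames(lookuptable)[[1]],
    all.x = T
  )
  # Collapse the translated values back into one string per input row
  res = res[, .(collapsed = paste0(get(colnames(lookuptable)[[2]]), collapse = sep)), by = row]
  setorder(res, "row")
  res$collapsed
}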
readMinPath = function(infile) {
  require(data.table)
  minpath = fread(infile,
                  header = F,
                  fill = T,
                  sep = "")
  # https://omics.informatics.indiana.edu/MinPath/output-readme.txt
  minpath_regex = "path (\\d+) (.+) naive (0|1) minpath (0|1) fam0 (\\d+) fam-found (\\d+) name (.+)"
  minpath_colnames = c(
    "pathway_number",
    "reconstruction",
    "naive_recon",
    "minpath_recon",
    "total_fam_count",
    "hit_fam_count",
    "pathway_name"
  )
  minpath[, V1 := sub(minpath_regex, "\\1;\\2;\\3;\\4;\\5;\\6;\\7", V1)]
  minpath[, (minpath_colnames) := tstrsplit(V1, ";", fixed = T)]
  minpath[, V1 := NULL]
  minpath
}
master_tsv_path = "master.tsv.txt"
result_path = "translated.tsv"
minpath_path = "cds.gene2ko.minpath.txt"
minpath = readMinPath(minpath_path)
lookuptable = minpath[, .(pathway_number, pathway_name)]
master = readMaster(master_tsv_path)
master[, kegg_pathways := translateIds(kegg_pathways, lookuptable)]
# fwrite defaults to a comma separator; write tabs to match the .tsv name
fwrite(master, result_path, sep = "\t")
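For readers who do not use R, the lookup idea behind the script above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pattern copies the field layout of the regex in the R script, the two report lines are made up to fit that pattern (their counts and the "n/a" field are not real MinPath output), and build_lookup and translate_ids are hypothetical helper names.

```python
import re

# Same field layout as the regex used in the R script in this thread.
MINPATH_RE = re.compile(
    r"path (\d+) (.+) naive (0|1) minpath (0|1) "
    r"fam0 (\d+) fam-found (\d+) name (.+)"
)

# Made-up lines that merely fit the pattern; the counts are not real results.
SAMPLE_REPORT = [
    "path 00010 n/a naive 1 minpath 1 fam0 10 fam-found 5 "
    "name Glycolysis / Gluconeogenesis",
    "path 00260 n/a naive 1 minpath 1 fam0 10 fam-found 5 "
    "name Glycine, serine and threonine metabolism",
]


def build_lookup(lines):
    """Map pathway number to pathway name for every line that matches."""
    lookup = {}
    for line in lines:
        match = MINPATH_RE.match(line.strip())
        if match:
            lookup[match.group(1)] = match.group(7)
    return lookup


def translate_ids(cell, lookup, sep=";"):
    """Translate one sep-separated cell; unknown IDs are kept unchanged."""
    if not cell:
        return ""
    return sep.join(lookup.get(item, item) for item in cell.split(sep))


lookup = build_lookup(SAMPLE_REPORT)
print(translate_ids("00260", lookup))
print(translate_ids("00010;00260;99999", lookup))
```

One design difference is worth noting: this sketch keeps IDs that are missing from the lookup table and preserves their order within a cell, whereas the left join in the R script would be expected to turn unmatched IDs into NA.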
thank you for your help. I used the first hack and can confirm that it worked for me. I changed line 143 as follows:
my @kegg_pathways = $f->get_tag_values("kegg_pathway_name") if $f->has_tag("kegg_pathway_name");
This issue is possibly the same as #4, but since that one has not received any attention, I am opening a new issue.
After fixing my workflow as described in #10, both with and without docker the workflow runs without errors.
However, the master.tsv.txt file has no entry in the columns kegg_pathway or metacyc_pathway, but those are annotated in master.gff.txt (example for one entry below) and in master.tbl.txt.
Any idea why this is the case and how this can be fixed?
Thank you very much.
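A quick way to confirm the symptom described above, or to check a fix afterwards, is to count how many rows carry a value in the pathway columns. The sketch below is an assumption-laden example: it uses the header names kegg_pathways and metacyc_pathways as they appear in the output quoted in this thread, the sample rows are shortened from that output, and count_filled is a hypothetical helper name. For a real run, an opened master.tsv.txt file handle would replace the in-memory sample.

```python
import csv
import io

# Rows shortened from the output quoted in this thread.
SAMPLE_TSV = (
    "feature_id\tkegg_pathways\tmetacyc_pathways\n"
    "BS|00001\t\t\n"
    "BS|00002\t00260\t\n"
    "BS|00004\t00240;00910\tPHOTOALL-PWY;PWY-101\n"
)


def count_filled(handle, columns):
    """Return (row count, {column: number of rows with a non-empty value})."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = {name: 0 for name in columns}
    total = 0
    for row in reader:
        total += 1
        for name in columns:
            # Missing or whitespace-only cells count as empty.
            if (row.get(name) or "").strip():
                counts[name] += 1
    return total, counts


total, counts = count_filled(
    io.StringIO(SAMPLE_TSV), ["kegg_pathways", "metacyc_pathways"]
)
print(total, counts)
```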