Open hangsuUNC opened 1 year ago
Dear @hangsuUNC,
In color mode, it is guaranteed that each k-mer gets at least one color, so if you have k-mers or unitigs without colors, it is a bug in either Bifrost or Pyfrost (which is developed independently from Bifrost). I am not familiar with the Pyfrost syntax, but I had a quick look at the Pyfrost README and saw that the syntax to iterate over pairs of k-mers and colors is:
    for n, data in g.nodes(data=True):
        for c in data['colors']:
            print("Node", n, "has color", c)
From your code, it seems that `data=True` is not used and that you access colors per unitig rather than per k-mer, which I am not sure is possible. Could you look into that first?
Thanks, Guillaume
Hi Guillaume,
Thanks for your reply! I tried:

```python
nodes_info = []
for n, data in g.nodes(data=True):
    num = 0
    for c in data['colors']:
        num += 1
        nodes_info.append(["Node", n, "has color", c])
```
Got an error message:
Not sure what that means... Will contact the Pyfrost author later!
Thank you!
Hang
Unfortunately, I don't know why this fails. I think that contacting the Pyfrost author first is a good idea. Make sure you link this issue in the one you are going to create, and I'll assist with any Bifrost-related test or bug.
Thank you so much, Guillaume! Will do!
Best,
Hang
Hi @hangsuUNC,
I'll close this for now since it is unclear at the moment if the issue is with Pyfrost or Bifrost. Don't hesitate to reopen or link to this issue if there is some progress on the matter.
Guillaume
Hi Guillaume,
Sorry for the delayed response! I contacted the Pyfrost author @lrvdijk and examined different versions of Bifrost and its output graph. Lucas created a test C++ program that uses the Bifrost API directly (so not the Pyfrost Python library), and it still fails on k-mers with no colors. It sounds like a Bifrost color matrix issue rather than a Python library issue.
Could you please help check the color matrix for the Bifrost graph? If you need any additional information for testing, please let us know!
Thanks a lot for your help!
Best,
Hang
For reference, here's the C++ test program (using Catch2 test framework, but you get the idea):
    #include <fstream>
    #include <iostream>
    #include <unordered_map>
    #include <unordered_set>
    #include <catch2/catch.hpp>
    #include <ColoredCDBG.hpp>
    #include <Kmer.hpp>

    TEST_CASE("Test unitig color data", "[unitig_color_data]") {
        CCDBG_Build_opt opt;
        opt.filename_graph_in = "data/MT_graph_Bfrost_graph.gfa";
        opt.filename_colors_in = "data/MT_graph_Bfrost_graph.bfg_colors";

        // Load the graph and its color file with 2 threads
        ColoredCDBG<> ccdbg(opt.k, opt.g);
        ccdbg.read(opt.filename_graph_in, opt.filename_colors_in, 2);
        auto total_num_colors = ccdbg.getColorNames().size();

        std::ofstream anchors;
        anchors.open("data/anchors.txt");

        for(auto const& um : ccdbg) {
            auto colorset = um.getData()->getUnitigColors(um);
            std::cout << "Testing colorset of " << um.getMappedHead().toString() << std::endl;
            // This is the check that fails for some unitigs
            REQUIRE(colorset != nullptr);

            // Count the number of colors attached to each k-mer position
            std::unordered_map<size_t, size_t> colors_per_kmer{};
            for(auto it = colorset->begin(um); it != colorset->end(); ++it) {
                colors_per_kmer.emplace(it.getKmerPosition(), 0).first->second++;
            }

            // Write out k-mers that carry every color in the graph
            for(auto const& p : colors_per_kmer) {
                if(p.second == total_num_colors) {
                    anchors << um.getUnitigKmer(p.first).toString() << std::endl;
                }
            }
        }

        anchors.close();
    }
Graph is created with Bifrost <1.2 and the test is also run with the same pre-1.2 Bifrost version.
For many k-mers, the colorset pointer is fine, but for some it's not.
(When using the Python wrapper, the same k-mer fails too).
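One thing worth noting about the test above: it only fails when the colorset pointer is null. A k-mer position that sits in a non-null colorset but has no color entries would never show up in `colors_per_kmer`, so it would pass silently. A minimal sketch of the same counting logic in plain Python, extended to report such positions, could look like the following. It assumes the (k-mer position, color) pairs have already been extracted from a unitig's colorset; the function name and its arguments are hypothetical and not part of Bifrost or Pyfrost.

```python
from collections import Counter


def classify_kmers(pairs, num_kmers, total_colors):
    """Split k-mer positions of one unitig into uncolored ones and anchors.

    pairs        -- iterable of (kmer_position, color_id) tuples (assumed input)
    num_kmers    -- number of k-mers in the unitig (sequence length - k + 1)
    total_colors -- number of colors in the whole graph
    """
    # Deduplicate pairs so a repeated (position, color) is counted once
    per_kmer = Counter(pos for pos, _color in set(pairs))

    # Positions that never appear in any pair have no color at all
    uncolored = [p for p in range(num_kmers) if per_kmer[p] == 0]
    # Positions carrying every color correspond to the "anchors" in the test
    anchors = [p for p in range(num_kmers) if per_kmer[p] == total_colors]
    return uncolored, anchors


if __name__ == "__main__":
    # Toy unitig with 3 k-mers in a graph with 2 colors
    example = [(0, 0), (0, 1), (2, 0)]
    print(classify_kmers(example, num_kmers=3, total_colors=2))
```

Scanning every position from 0 to `num_kmers - 1`, instead of only the positions returned by the iterator, is what makes the missing entries visible.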
Hi @hangsuUNC,
I am reopening the issue. Would it be also possible for you to share the input data used to build the graph as well as the exact Bifrost version/commit used? Thanks!
Guillaume
Hi Guillaume,
Thanks for your reply! Here is the construction command:
Bifrost build -t ~{num_threads} -k ~{kmersize} -i -d -c -s ~{sep=" -s " fas} -r ~{ref} -o ~{outputpref}_Bfrost_graph
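The `~{...}` placeholders look like workflow-template interpolation (WDL-style), where `~{sep=" -s " fas}` joins the list of sample files with ` -s ` between them. To make the expanded command easier to reproduce, here is a small sketch that assembles the same command line in Python. The helper name and the file names in the example are hypothetical; only the flags are taken from the command above.

```python
import shlex


def bifrost_build_command(num_threads, kmersize, fas, ref, outputpref):
    """Assemble the build command shown above from concrete values."""
    args = ["Bifrost", "build",
            "-t", str(num_threads),
            "-k", str(kmersize),
            "-i", "-d", "-c"]
    # One -s flag per sample file, mirroring ~{sep=" -s " fas}
    for fa in fas:
        args += ["-s", fa]
    args += ["-r", ref, "-o", f"{outputpref}_Bfrost_graph"]
    # shlex.join quotes arguments only where the shell needs it
    return shlex.join(args)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only
    print(bifrost_build_command(16, 31, ["a.fa", "b.fa"], "ref.fa", "MT_graph"))
```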
The Docker images I use are: 1) hangsuunc/bifrost:v1 (Bifrost 1.2.0) 2) us-central1-docker.pkg.dev/broad-dsp-lrma/fusilli/fusilli:devel (Bifrost 1.0.6.5)
All of the outputs show the same issue...
Here is the input file merged into a single fasta: all.fasta.gz
Thanks again for your help!
Best,
Hang
Hi,
Thanks for this wonderful tool! I'm writing to ask a question about the colors of unitigs. I found that about 3% of the unitigs created by Bifrost have no colors. Is this because these unitigs do not exist in any of the samples (random recombinations between a set of k-mers)? I used Pyfrost to load the graph and the color files for the analysis.
Here is the command: Bifrost build -t 16 -k 31 -c -s -r -o
Pyfrost:
    g = pyfrost.load()
    nodelist = list(g.nodes)
    no_colors = 0
    for node in nodelist:
        try:
            colors = g.nodes[node]['colors']
        except:
            no_colors += 1
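Because the bare `except:` above counts every failure the same way, it cannot tell a genuinely missing `colors` entry apart from any other error raised while reading the node data. A minimal sketch that tallies outcomes by exception type, and keeps one example node per type for a bug report, might look like this. It assumes only the `nodes[node]['colors']` access pattern used above; the function name is hypothetical, and which exception Pyfrost actually raises is not known here.

```python
from collections import Counter


def tally_color_access(nodes):
    """Try to read 'colors' for every node and tally the outcome.

    nodes -- any mapping-like object supporting nodes[n]['colors']
             (with Pyfrost this would presumably be g.nodes)
    Returns (outcomes, examples): counts per outcome name, and one
    example node for each exception type encountered.
    """
    outcomes = Counter()
    examples = {}
    for node in list(nodes):
        try:
            nodes[node]['colors']
        except Exception as exc:
            name = type(exc).__name__
            outcomes[name] += 1
            # Remember the first node that triggered this error type
            examples.setdefault(name, node)
        else:
            outcomes['ok'] += 1
    return outcomes, examples


if __name__ == "__main__":
    # Stand-in for g.nodes: plain dicts instead of a real graph
    stub = {"ACGT": {"colors": [0, 1]}, "TTTT": {}}
    print(tally_color_access(stub))
```

The per-type counts and the example nodes are the kind of detail that helps narrow down whether the failure comes from the wrapper or from the underlying graph.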
Results:
Thanks for your help!
Best,
Hang