Yes, that's right: at the moment the tool is limited to 65534 path groups (that is, "samples" or "taxa"). I did not think it likely that data sets with more distinct samples/taxa exist right now. How many samples does your data set have?
Typically, you want to group your paths into samples or haplotypes, but this requires that path names adhere to the PanSN naming scheme. Then, you can simply group by sample (-S) or haplotype (-H).
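As a rough illustration of what grouping by sample or haplotype amounts to, here is a minimal sketch. It assumes PanSN-style names of the form sample#haplotype#contig; the path names below are made up for the example, and this is not how panacus implements -S or -H internally.

```shell
# Hypothetical PanSN-style path names: sample#haplotype#contig
paths=$(printf '%s\n' 'sampleA#1#chr1' 'sampleA#2#chr1' 'sampleB#1#chr1' 'sampleB#1#chr2')

# Grouping by sample: keep only the first '#'-separated field
by_sample=$(printf '%s\n' "$paths" | cut -d'#' -f1 | sort -u)

# Grouping by haplotype: keep the first two '#'-separated fields
by_haplotype=$(printf '%s\n' "$paths" | cut -d'#' -f1,2 | sort -u)

printf 'samples:\n%s\n' "$by_sample"
printf 'haplotypes:\n%s\n' "$by_haplotype"
```

Four paths collapse into two sample groups or three haplotype groups here, which is why grouping keeps the group count far below the number of paths.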
Oh, and if your paths are not PanSN compatible, you can still do the grouping by hand, by specifying a path-to-group mapping with -g
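A minimal sketch of building such a mapping by hand follows. It assumes the mapping is a two-column, tab-separated file (path name, then group name), and that the sample name is the part of the path name before the first dot; both the naming pattern and the file names are hypothetical, so adapt them to your own paths.

```shell
# Hypothetical non-PanSN path names of the form <sample>.<contig>
printf '%s\n' 'sampleA.ctg1' 'sampleA.ctg2' 'sampleB.ctg1' > paths.nonpansn.txt

# Column 1: full path name; column 2: group (text before the first '.')
awk -F'[.]' 'BEGIN { OFS = "\t" } { print $0, $1 }' paths.nonpansn.txt > groups.example.tsv

cat groups.example.tsv
```

The resulting file could then be passed with -g in place of -S or -H.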
Thank you for getting back to me so quickly. In fact, we only have 27 samples, and the genome size of each sample is 2.5 Gb, so visualizing our data should be no harder than for a human pan-genome. We used minigraph-cactus for pan-genome construction and then used vg to convert the gfa1.1 format for visual analysis. I would like to ask how we should conduct quality control or other operations to complete the visualization. Best regards.
Ok, then this means that you need to group the paths by samples (-S) or haplotypes (-H). Regarding quality control, I think panacus is a good starting point. Here is my suggestion:
RUST_LOG=info panacus histgrowth -t4 -l 1,2,1,1,1 -q 0,0,1,0.5,0.1 -H -c all -a -o html test.giffa2.1.0.gfa > test.giffa2.histgrowth.all.html
RUST_LOG=info panacus table -t4 -H -c node test.giffa2.1.0.gfa > test.giffa2.coverage.node.tsv
If you have further questions on QC of your pangenome graph, please email me at daniel.doerr@hhu.de
OK! I will send the detailed information to your email for consultation. With best wishes
Dear developer: We are testing our real data following the example data, but we have encountered some problems and hope to get your help. The errors are as follows. Best wishes!
step1 is ok
grep '^P' test.giffa2.1.0.gfa | cut -f2 | grep -ve 'refernece' > test.giffa2.paths.haplotypes.txt
step2 fails with an error
RUST_LOG=info /home/test/Software/panacus-0.2.3_linux_x86_64/bin/panacus histgrowth -t 4 -l 1,2,1,1,1 -q 0,0,1,0.5,0.1 -S -a -s test.giffa2.paths.haplotypes.txt test.giffa2.1.0.gfa > test.giffa2.histgrowth.node.tsv
error
[2023-11-30T00:41:26Z INFO panacus::cli] running panacus on 4 threads
[2023-11-30T00:41:26Z INFO panacus::cli] constructing indexes for node/edge IDs, node lengths, and P/W lines..
[2023-11-30T00:43:19Z INFO panacus::cli] ..done; found 383935 paths/walks and 174028496 nodes
[2023-11-30T00:43:19Z INFO panacus::cli] loading data from group / subset / exclude files
[2023-11-30T00:43:19Z INFO panacus::abacus] loading coordinates from pig.giffa2.paths.haplotypes.txt
Error: Custom { kind: Unsupported, error: "data has 383917 path groups, but command is not supported for more than 65534" }
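The log reports 383935 paths/walks and 383917 path groups, so nearly every path ended up in its own group. Before rerunning, it may help to count how many distinct groups a list of path names would yield. The sketch below does this under the assumption that names follow the PanSN sample#haplotype#contig pattern; the example names, the file name, and the count_groups helper are hypothetical and not part of panacus.

```shell
# Limit quoted in the error message above
MAX_GROUPS=65534

# count_groups FILE FIELDS: number of distinct '#'-delimited prefixes
# (FIELDS=1 for samples, FIELDS=1,2 for haplotypes) among the path names in FILE
count_groups() {
    cut -d'#' -f"$2" "$1" | sort -u | wc -l | tr -d ' '
}

# Hypothetical path names; in practice use the list extracted from the GFA
printf '%s\n' 'sampleA#1#ctg1' 'sampleA#1#ctg2' 'sampleB#1#ctg1' > paths.check.txt

n_paths=$(sort -u paths.check.txt | wc -l | tr -d ' ')
n_samples=$(count_groups paths.check.txt 1)

if [ "$n_samples" -le "$MAX_GROUPS" ]; then
    echo "ok: $n_samples sample groups from $n_paths paths"
else
    echo "too many groups: $n_samples > $MAX_GROUPS"
fi
```

If the sample count from such a check is close to the path count, the path names are probably not being split into samples as expected, and an explicit mapping with -g may be the safer route.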