Closed asylvz closed 10 months ago
Thanks for pointing this out, I fixed the bug, it was there because my assumption was that all the nodes would have an SN tag. So now I just check if they do and count the SN tag values for the one that do have it.
Moreover, as it was originally written by Tobias, if the SN tags don't reflect the hard-coded names (chr1, chr2, chr3, etc...), then you need to provide these names, otherwise, when you run it, you'll wait a lot of time before getting this message
gaftools order_gfa /gpfs/project/projects/medbioinf/users/asoylev/hgsvc3/ref_graph/hprc-v1.1-mc-chm13.gfa --outdir output/
Reading /gpfs/project/projects/medbioinf/users/asoylev/hgsvc3/ref_graph/hprc-v1.1-mc-chm13.gfa
The graph has:
The graph has 92879580 Nodes and 128165765 Edges
Connected components: 25
Chromosome set mismatch:
Expected: chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY,chrM
Found: CHM13#chr1,CHM13#chr10,CHM13#chr11,CHM13#chr12,CHM13#chr13,CHM13#chr14,CHM13#chr15,CHM13#chr16,CHM13#chr17,CHM13#chr18,CHM13#chr19,CHM13#chr2,CHM13#chr20,CHM13#chr21,CHM13#chr22,CHM13#chr3,CHM13#chr4,CHM13#chr5,CHM13#chr6,CHM13#chr7,CHM13#chr8,CHM13#chr9,CHM13#chrM,CHM13#chrX,CHM13#chrY
So you need to provide the correct chromosome names as present in the SN tags of that graph. Maybe we need to find a better way to do this later.
Can't you do a fast skip through the file to get the names. The "Found" line in your error message is what you actually need, right?
I don't think I follow what you mean by fast skip here. I fixed the error you got because I was checking for the SN tag and some nodes didn't have any. Now I check if there's an SN tag, and if there's I count it for the component to name the component
I just made another push, maybe I understood what you meant. But now, if the user doesn't provide anything for --chromosome_order
, the program will run and just output whatever components are found and use the majority-vote SN tag to name that component.
If the user provides only a subset of the chromosomes (e.g. chr1,chr22) for --chromosome_order
argument, then only those chromosomes will be outputted.
If the user provides a chromosome name that doesn't exist in the SN tags, or at least wasn't a majority vote, then the program will abort.
I will close this bug, as I ran it now and all seems to work :)
I'm getting the error below when with minigraph cactus:
gaftools order_gfa /gpfs/project/projects/medbioinf/users/asoylev/hgsvc3/ref_graph/hprc-v1.1-mc-chm13.gfa