Hi there,
I generated a graph for five human individuals with the following command:
Then I set GRCh38 as the reference for the VCF that is output once the pangenome has been built. My plan is to run PanGenIe and benchmark HG005 (an Asian sample not present in the pangenome) against the graph. Once I got the tool running, the first error I hit was about "un-phased" haplotypes, which seemed odd to me, since all the genome assemblies were produced using both HiFi and Hi-C data.

On closer inspection, the cause was the structure of the VCF file: it splits each individual into two columns, one for each of that individual's two haplotypes (see image below).

My first approach was to merge those two columns for each of the five individuals, separating them with "|", using the following command:
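For illustration only (this is not the awk command referred to above), the kind of merge described here can be sketched in Python. The sketch assumes that the two haplotype columns of each individual are adjacent, that FORMAT contains only GT, and that the columns are named with a hypothetical ".1"/".2" suffix; none of these details come from the actual file.

```python
def merge_haplotype_columns(lines):
    """Merge adjacent haploid sample columns into phased diploid columns.

    Assumptions (not taken from the real VCF): haplotype columns come in
    adjacent pairs after the 9 fixed VCF columns, and FORMAT is GT only.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines are passed through unchanged.
            yield line
            continue
        fields = line.split("\t")
        fixed, samples = fields[:9], fields[9:]
        if len(samples) % 2:
            raise ValueError("expected an even number of haplotype columns")
        pairs = list(zip(samples[0::2], samples[1::2]))
        if line.startswith("#CHROM"):
            # Hypothetical naming rule: "S1.1" and "S1.2" become "S1".
            merged = [first.rsplit(".", 1)[0] for first, _ in pairs]
        else:
            # Join the two haploid calls with "|" to mark them as phased.
            merged = [f"{first}|{second}" for first, second in pairs]
        yield "\t".join(fixed + merged)
```

It could be applied to an uncompressed VCF by passing an open file handle as `lines` and writing each yielded line to a new file. A missing haploid call (".") is carried over as-is, so a record can end up with a genotype such as "1|.".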
Merging the columns, I guess, tricked PanGenIe into running just fine; the problem is that the tool returned an empty VCF for the HG005 sample. Is there any way I can fix this?

Upon discussing this problem with other people, Glenn kindly pointed me to the links below
I had a look at both applications, and at how they are used in the context of the HPRCyear1 repository.
However, the approach I followed is probably more basic and straightforward, mainly because I was not aware of many of the details and considerations that need to be taken into account. For instance, I haven't merged GRCh38 and CHM13, nor removed the unplaced contigs from the former. So I was wondering: is there still a chance to get my VCF to work with PanGenIe by running a specific command from one (or both) of those applications, which would make it "accessible" to the tool?

P.S. I can attach a screenshot of the file after the awk command I used, if that would be useful. Also, I have already made sure that the headers/contigs in the reference genome I fed to PanGenIe and the names/contigs in the #CHROM column of the VCF are the same. One thing I'm not sure about is whether the length of the contig names in the #CHROM column somehow affects the process, or whether the "#" characters cause issues, even though I don't think that is the case.

Let me know (and sorry for the long message), thanks!
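For reference, the contig-name comparison mentioned in the P.S. could be sketched roughly as below. This is a minimal Python sketch, not part of PanGenIe or any other tool: the function names are made up for illustration, and it assumes plain-text (uncompressed) FASTA and VCF input, with a FASTA contig name taken as the header text up to the first whitespace.

```python
def fasta_contigs(fasta_lines):
    """Collect contig names from FASTA header lines (text after '>' up to whitespace)."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def vcf_contigs(vcf_lines):
    """Collect the distinct #CHROM values from the VCF data lines."""
    names = set()
    for line in vcf_lines:
        # Skip blank lines and all header lines (both '##' and '#CHROM').
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def missing_from_reference(fasta_lines, vcf_lines):
    """Return the VCF contig names that have no matching FASTA header, sorted."""
    return sorted(vcf_contigs(vcf_lines) - fasta_contigs(fasta_lines))
```

An empty list means every contig name used in the VCF also appears in the reference. Names are compared as exact strings, so a "#" inside a name only matters if it differs between the two files.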