odelaneau / GLIMPSE

Low Coverage Calling of Genotypes
MIT License

ERROR: Three files overlapping at position: X - GLIMPSE2 Ligate error #149

Open Npaffen opened 1 year ago

Npaffen commented 1 year ago

[GLIMPSE2] Ligate multiple output files into chromosome-wide files

Files:

Parameters:

Read filenames in [GLIMPSE_ligate/list.chr21.txt]

Ligating chunks

Cnk 0 [chr21:5030578-14572952] [L=128973]
Buf 0 [chr21:14572990-15573250] [L_isec=23597 / L_tot=23597] [Avg #hets=588] [Switch rate=1] [Avg phaseQ=27.1098]
Cnk 1 [chr21:15573328-18750380] [L=76316]
Buf 1 [chr21:18750416-20268812] [L_isec=37008 / L_tot=37008] [Avg #hets=830] [Switch rate=0] [Avg phaseQ=2.45251]

ERROR: Three files overlapping at position: 18750486

I guess this is partly related to this error. I'm running imputation on a low-coverage sample while also adding phased WGS data of the parents to the reference panel. In this case I tried a test run with a 1KG trio: HG02024 (child) and HG02025 and HG02026 (parents). I'm unsure whether this information is related to the problem, but I thought it might be useful to include.

How can I ligate the phased and imputed GLIMPSE2 chunks in a meaningful way? Do I need to adjust the chunk bins? I already followed the guidelines of the tutorial to get these results. When I do not add the parents to the reference panel, the whole GLIMPSE2 pipeline runs without an error!

Feel free to ask for any kind of information or clarification. I'm really impressed by the results so far and look forward to hopefully boosting them if I can use parental data in the imputation process!

Best regards, Nils

soyeon-2023 commented 1 year ago

Hello, I'm encountering the same error mentioned earlier: "ERROR: Three files overlapping at position: 96027772." using GLIMPSE2 ligate. Is there a possible solution to address this issue?

karinkumar commented 1 year ago

Hello, I am encountering the same error. I dealt with it by deleting one of the three files that contained the position causing trouble. Obviously not ideal, but it at least got me a file with most of my variants.

mccafj02 commented 9 months ago

I'm also encountering this error: ERROR: Three files overlapping at position: 125160986 during the ligate step. Has there been any progress on this?

yaacoo commented 9 months ago

Same error here. Any updates on that?

srubinacci commented 9 months ago

Hi, thanks for reporting this. This is likely due to the chunking: the variant is present in more than two chunks (likely because of the large buffer). You can safely remove the variant from the first or the third file. Please adjust the three chunks in your file so that this does not happen again in subsequent imputation runs.

Will put a check at the chunking level.

Simone
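
As a rough illustration of the kind of check described above, the following Python sketch looks for stretches covered by three or more regions. It is not part of GLIMPSE. The function names `parse_region` and `triple_overlaps` and the example regions are made up for illustration, and it assumes you have already extracted the buffered regions from your chunk file as `chr:start-end` strings.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates


def parse_region(text: str) -> Region:
    """Parse a 'chr21:5030578-14572952' style string (hypothetical helper)."""
    chrom, span = text.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def triple_overlaps(regions: List[Region]) -> List[Region]:
    """Return the stretches covered by three or more regions (sweep line)."""
    events = []
    for chrom, start, end in regions:
        events.append((chrom, start, 1))     # region opens at start
        events.append((chrom, end + 1, -1))  # region closes just after end
    # Sorting puts a close (-1) before an open (+1) at the same position,
    # so regions that merely touch are not counted as overlapping.
    events.sort()

    hits = []
    depth = 0
    open_at = None
    for chrom, pos, delta in events:
        depth += delta
        if depth >= 3 and open_at is None:
            open_at = pos
        elif depth < 3 and open_at is not None:
            hits.append((chrom, open_at, pos - 1))
            open_at = None
    return hits


# Invented example regions, not taken from a real chunk file.
buffered = [parse_region(r) for r in ("chr21:1-100", "chr21:50-150", "chr21:90-200")]
print(triple_overlaps(buffered))
```

Any region this reports marks positions that would appear in three files; shrinking the buffer or moving the boundaries of the chunks involved should remove it.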

raksasa commented 9 months ago

I encountered the same error. I found that the chunks do overlap if you follow the tutorials, and some of the overlaps are larger than 1 Mb. I see two possible ways to solve this issue:

  1. In the imputation step, compute over the overlapping region but output non-overlapping results. I found two parameters, '--input-region' and '--output-region', but their descriptions confuse me:
       --input-region arg    Imputation region with buffers
       --output-region arg   Imputation region without buffers
     Can you confirm that these two parameters do what I expect?
  2. Another way is to collapse the duplicate variants in the ligate step. GLIMPSE_ligate does not provide such an option. I think bcftools 'concat' can achieve this, but I am not sure whether GLIMPSE_ligate has some special behaviour.
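
To make option 1 concrete, here is a minimal Python sketch of the idea only: it keeps the positions that fall inside a chunk's unbuffered output region. It does not show what GLIMPSE2 actually does with '--input-region' and '--output-region'. The helper names and all numbers are invented for illustration.

```python
from typing import Iterable, List, Tuple


def parse_region(text: str) -> Tuple[str, int, int]:
    """Parse a 'chr:start-end' string (hypothetical helper)."""
    chrom, span = text.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def trim_to_output_region(positions: Iterable[int], output_region: str) -> List[int]:
    """Keep only the positions inside the unbuffered output region."""
    _, start, end = parse_region(output_region)
    return [p for p in positions if start <= p <= end]


# Invented data: two chunks whose buffered results share positions 200-400.
chunk_a = trim_to_output_region([100, 200, 300, 400], "chr21:1-250")
chunk_b = trim_to_output_region([200, 300, 400, 500], "chr21:251-600")
print(chunk_a + chunk_b)  # no position appears twice after trimming
```

Note that chunks trimmed this way no longer share any variants, so nothing would be left for matching phase across chunk boundaries.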

yaacoo commented 9 months ago

Regarding #2: I am not a GLIMPSE author, but to my understanding GLIMPSE ligate takes into account the phasing information when ligating, and bcftools concat doesn't.

raksasa commented 9 months ago

bcftools concat (v1.16) has an option:

-l, --ligate Ligate phased VCFs by matching phase at overlapping haplotypes

which I guess does something similar.

yaacoo commented 9 months ago

Thanks, I was not aware that they added this feature. In that case, I do not know what the advantage of using GLIMPSE ligate is.