odelaneau / shapeit5

Segmented HAPlotype Estimation and Imputation Tool
https://odelaneau.github.io/shapeit5/
MIT License
56 stars 9 forks source link

phase_rare on small locus #55

Closed klmartinez closed 10 months ago

klmartinez commented 11 months ago

Hi,

I am trying to phase a 1Mb locus (I only need the phased genotypes for a handful of SNPs found in the middle of this locus). However, I see that phase_rare relies on reading in inputs from chunks.coordinates.txt. When using GLIMPSE2_chunk to create chunks.coordinates.txt, I receive text describing that the region is too small, and the resulting chunks.coordinates.txt is empty.

Is there a workaround for this?

Here is the text:

[GLIMPSE2] Split chromosomes into chunks
  * Authors              : Simone RUBINACCI & Olivier DELANEAU, University of Lausanne
  * Contact              : simone.rubinacci@unil.ch & olivier.delaneau@unil.ch
  * Version          : GLIMPSE2_chunk v2.0.0 / commit = 88915f7 / release = 2023-04-24
  * Citation             : BiorXiv, (2022). DOI: https://doi.org/10.1101/2022.11.28.518213
  *                      : Nature Genetics 53, 120–126 (2021). DOI: https://doi.org/10.1038/s41588-020-00756-0
  * Run date         : 07/08/2023 - 22:39:19

Files:
  * Input VCF            : [mt_chr9_serology_subset_locus_srwgs_v7.bcf]
  * Region               : [chr9]
  * Genetic Map          : [genetic_map_hg38_chr9.txt]
  * Output file          : [chunks_coordinates.txt]

Parameters:
  * Sparse MAF           : [0.001]
  * Algorithm            : [Sequential]
  * Recombination rates  : [Given by genetic map]
  * Min. Window size     : [2.5cM | 2Mb | 20000 variants]
  * Min. Buffer size     : [0.5cM | 0.4Mb | 2000 variants]

Other parameters
  * Seed                 : [15052011]
  * #Threads             : [1]

Reading input files
  * Input file read      : [mt_chr9_serology_subset_locus_srwgs_v7.bcf] (9.37s)
  * #variants            : [43554 total | 25623 rare | 17931 common]
  * Genetic map          : [n=152377] (0.08s)
  * Map cM long interpolation : [s=1409 / i=42145] (0.00s)
  * Region spans 1019930 bp and 4.09 cM

Splitting data long into chunks and writing to [chunks_coordinates.txt]
  * Region appears to small to find a sequential solution (only one chunk detected). Check your parameters if this is not a desired behavior.
  * Terminal window [0] -buffer:[chr9:132755284-133775213] / +buffer:[chr9:132755284-133775213] / L=3.09088cM / L=1019929bp / C=43554
  * #chunks = 1

Total running time = 9 seconds
CPU times: user 4.48 ms, sys: 10.5 ms, total: 15 ms
Wall time: 9.47 s
srubinacci commented 11 months ago

Hi,

If I understand it well, you just want to phase a single region, therefore there is not need to run GLIMPSE2_chunk. The reason why we run GLIMPSE2_chunk is to have uniform chunks across a chromosome (with overlapping buffers) and then ligate the phased data to produce phased data chromosome-wide.

However, as you are just trying to phase a 1Mb region, you can simply provide it to phase_common using the --region parameter. As it's such a small region, (43554 variants), likely you don't even need to run phase_rare (so you might want to use --filter-maf =0, allowing the program to consider all variants).

In the case you need to run phase_rare for computational reasons, you just need to specify the same region using the --input-region option.

Hope this helps

Simone

klmartinez commented 11 months ago

Hi Simone,

I had tried skipping the GLIMPSE2_chunk step and did phase_rare on its own, but I get prompted to add in the --scaffold-region flag and associated input.

I ran ./phase_rare_static --input mt_chr9_serology_subset_locus_srwgs_v7.bcf --scaffold target.scaffold.bcf --map genetic_map_hg38_chr9.txt --input-region chr9 --output target.phased.chunk.bcf --thread 16

And got the following error:

[SHAPEIT5] Phase_rare (phase rare variants onto a haplotype scaffold)
  * Authors       : Simone RUBINACCI & Olivier DELANEAU, University of Lausanne
  * Contact       : simone.rubinacci@unil.ch & olivier.delaneau@gmail.com
  * Version       : 5.1.1 / commit = 990ed0d / release = 2023-05-08
  * Run date      : 08/08/2023 - 22:47:38

ERROR: --scaffold missing
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<timed eval> in <module>

/opt/conda/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2460             with self.builtin_trap:
   2461                 args = (magic_arg_s, cell)
-> 2462                 result = fn(*args, **kwargs)
   2463             return result
   2464 

/opt/conda/lib/python3.7/site-packages/IPython/core/magics/script.py in named_script_magic(line, cell)
    140             else:
    141                 line = script
--> 142             return self.shebang(line, cell)
    143 
    144         # write a basic docstring:

<decorator-gen-103> in shebang(self, line, cell)

/opt/conda/lib/python3.7/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

/opt/conda/lib/python3.7/site-packages/IPython/core/magics/script.py in shebang(self, line, cell)
    243             sys.stderr.flush()
    244         if args.raise_error and p.returncode!=0:
--> 245             raise CalledProcessError(p.returncode, cell, output=out, stderr=err)
    246 
    247     def _run_script(self, p, cell, to_close):

CalledProcessError: Command 'b'./phase_rare_static --input mt_chr9_serology_subset_locus_srwgs_v7.bcf --map genetic_map_hg38_chr9.txt --input-region chr9 --output target.phased.chunk.bcf  --thread 16\n'' returned non-zero exit status 1.
srubinacci commented 11 months ago

Hi,

Indeed, phase_rare always requires a scaffold derived from phase_common. Can you please try to run phase_common instead with the --filter-maf =0 option? If you have a scaffold (e.g. derived from SNP array data) and want to use it, you can add the --scaffold option in phase_common too. It is unclear to me if this is your case.

Hopefully this works, but it might be slow. In that case, you might need to run phase_rare as well. But let's focus on getting things working first.

Simone

klmartinez commented 10 months ago

Things worked with phase_common using the --filter-maf =0 option.

srubinacci commented 10 months ago

Great. Closing for now.