odelaneau / shapeit5

Segmented HAPlotype Estimation and Imputation Tool
https://odelaneau.github.io/shapeit5/
MIT License

File extension not recognized #65

Open marlmatos opened 1 year ago

marlmatos commented 1 year ago

Hi, I am trying to use shapeit5 for the first time. I ran the "test" script and it works. However, I am getting this error for the output file: ERROR: Filename extension of [/phased_chr/phased_cd4_aging_chr1.vcf.gz] not recognized

My script is the following:

```
SHAPEIT5=~/packages/shapeit5/phase_common_static

for i in {1..22}; do
  MAP=~/packages/shapeit5/shapeit5/resources/maps/b38/chr${i}.b38.gmap.gz
  INPUT=~/aging_project/scRNAseq/resources/cd4_allsamples_vcf_perchr/cd4_allsamples.chr${i}.tagged.vcf.gz
  OUT=/phased_chr/phased_cd4_aging_chr${i}.vcf.gz

  $SHAPEIT5 --input $INPUT \
    --map $MAP \
    --region chr${i} \
    --output $OUT \
    --thread 16
done
```

I tried modifying the script without the variable $OUT and still get the same error.

srubinacci commented 1 year ago

Hi, it is difficult to say exactly where the problem is here, but vcf.gz files are read and written by shapeit5. I'd suggest trying a single chromosome in an interactive shell; it's likely that the options you pass to the program are somewhat wrong.

CNuge commented 1 year ago

Hello,

I wanted to mention that I am encountering the same error when running SHAPEIT5 on one chromosome at a time from an interactive shell, as suggested. Initially I thought the .gz extension was my mistake, so I repeated the test with the output specified as .vcf instead, but this did not solve the problem.

The program appears to run correctly (it does not fail on initiation) and only fails at the final stage, after the entire series of MCMC iterations has completed and finalization begins.

original command tested: SHAPEIT5_phase_common --region chr21 -I resources/tagged_thousand_genomes/chr21.vcf.gz -O resources/phased_thousand_genomes/phased_chr21.vcf.gz

revised to remove the .gz extension: SHAPEIT5_phase_common --region chr21 -I resources/tagged_thousand_genomes/chr21.vcf.gz -O resources/phased_thousand_genomes/phased_chr21.vcf

run using a conda install of shapeit5 with the following:

channels:
  - bioconda
  - conda-forge
dependencies:
  - shapeit5

patogonzalez commented 11 months ago

Hi all,

I'm experiencing the same issue. The command is:

cd /media/pato/KINGSTON/temp; \
docker run -v $(pwd):/media/pato/KINGSTON/temp -w /media/pato/KINGSTON/temp shapeit5_2023-05-05_d6ce1e2 phase_common_static \
  --input jose_merged_imputation_AC.vcf.gz \
  --map chr22.b37.gmap.gz \
  --region 22 \
  --output phased_chr22.vcf.gz \
  --log phased_chr22.log

At the end of the terminal output: ############## (many rows)

Finalization:

ERROR: Filename extension of [phased_chr22.vcf.gz] not recognized

##############

I also tried using .vcf and no extension, without success.

I would appreciate your help with this issue.

Thanks, Patricio

carsonhh commented 10 months ago

Same issue. The only extension it would accept for me is .bcf

JasonTan-code commented 9 months ago

Same issue here, using shapeit5 installed from conda

lintingyi2014 commented 8 months ago

Same issue, do we have a solution here?

CNuge commented 8 months ago

The only solution I have found is to output to bcf and then, if you require a different format, do any necessary conversions afterwards.
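
For anyone who needs a compressed VCF at the end, the conversion step could look roughly like the sketch below. It assumes bcftools is installed and on the PATH; `bcf_to_vcfgz_name` and `convert_bcf` are hypothetical helper names, not part of SHAPEIT5, and the file name in the usage note is only an example.

```shell
#!/usr/bin/env bash

# Hypothetical helper: derive the .vcf.gz name matching a .bcf file.
# Pure string handling, no external tools needed.
bcf_to_vcfgz_name() {
  local bcf="$1"
  printf '%s\n' "${bcf%.bcf}.vcf.gz"
}

# Hypothetical helper: convert one phased BCF to bgzipped VCF and
# build a tabix index for it. Requires bcftools on the PATH.
convert_bcf() {
  local bcf="$1" out
  out="$(bcf_to_vcfgz_name "$bcf")"
  bcftools view -Oz -o "$out" "$bcf" && bcftools index -t "$out"
}
```

After phasing with `--output phased_chr22.bcf`, calling `convert_bcf phased_chr22.bcf` would write `phased_chr22.vcf.gz` next to it, together with its index.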

bgulko commented 4 months ago

There is a clue in 5.1.1: if you set --output-format vcf.gz, phase_common gives the error message "ERROR: Output format[vcf.gz] unsupported, use [graph, bcf or bh] instead". This suggests that the bcf workaround suggested by @CNuge may indeed be the supported approach to getting compressed vcf files.
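
Since the error only appears at finalization, after the MCMC iterations have finished, a pre-flight check on the output name can avoid wasting a full run. This is a minimal sketch, assuming the accepted filename extensions mirror the formats listed in the error message (graph, bcf, bh). That mapping is an assumption, and `check_output_ext` and `run_phase` are hypothetical helpers, not part of SHAPEIT5. The flags are the ones used earlier in this thread.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if the output name ends in one of the
# formats named in the 5.1.1 error message. Treating these as the accepted
# filename extensions is an assumption.
check_output_ext() {
  case "$1" in
    *.bcf|*.graph|*.bh) return 0 ;;
    *) echo "unsupported output extension: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: refuse to start phasing when the output name would
# be rejected at the end of the run. Assumes phase_common_static is on PATH.
run_phase() {
  local input="$1" map="$2" region="$3" out="$4"
  check_output_ext "$out" || return 1
  phase_common_static --input "$input" --map "$map" \
    --region "$region" --output "$out" --thread 16
}
```

With this in place, `run_phase in.vcf.gz chr22.b37.gmap.gz 22 phased_chr22.vcf.gz` would stop immediately with a message, while the same call with `phased_chr22.bcf` would go on to phasing.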