popgenmethods / smcpp

SMC++ infers population history from whole-genome sequence data.
GNU General Public License v3.0

vcf2smc with csi indexed mask file? #256

Open mc-er opened 9 months ago

mc-er commented 9 months ago

Hello, I'm trying to use vcf2smc to produce the input files using a mask file. Because I work on a system with a large genome (a single chromosome can exceed 1.7 Gb), a .tbi index does not work, so I have produced a .csi index for my mask file instead. When I run vcf2smc, I get the following error.

smc++ vcf2smc --cores 4 --mask uncalled_regions.bed.gz -d P8109-108 P9904-107 chr01.vcf.gz out.smc.gz chr01 pop1:P8109-101,P8109-102,P8109-103,P8109-104,P8109-105,P8109-107,P8109-108,P9904-107,P9904-117,P17553-593,P21002-108
2648 smcpp.commands.vcf2smc INFO Population 1:
2648 smcpp.commands.vcf2smc INFO Distinguished lineages: P8109-108:0, P9904-107:1
2648 smcpp.commands.vcf2smc INFO Undistinguished lineages: P8109-101:0, P8109-101:1, P8109-102:0, P8109-102:1, P8109-103:0, P8109-103:1, P8109-104:0, P8109-104:1, P8109-105:0, P8109-105:1, P8109-107:0, P8109-107:1, P8109-108:1, P9904-107:0, P9904-117:0, P9904-117:1, P17553-593:0, P17553-593:1, P21002-108:0, P21002-108:1
Traceback (most recent call last):
  File "/sw/bioinfo/SMC++/1.15.5.dev12+g8bdecdf/rackham/bin/smc++", line 8, in <module>
    sys.exit(main())
  File "/sw/bioinfo/SMC++/1.15.5.dev12+g8bdecdf/rackham/venv/lib/python3.8/site-packages/smcpp/frontend/console.py", line 28, in main
    cmds[args.command].main(args)
  File "/sw/bioinfo/SMC++/1.15.5.dev12+g8bdecdf/rackham/venv/lib/python3.8/site-packages/smcpp/commands/vcf2smc.py", line 197, in main
    mask_iterator = TabixFile(
  File "pysam/libctabix.pyx", line 349, in pysam.libctabix.TabixFile.__cinit__
  File "pysam/libctabix.pyx", line 381, in pysam.libctabix.TabixFile._open
OSError: index `uncalled_regions.bed.gz.tbi` not found
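
The traceback suggests that the mask is opened without an explicit index path, so only `uncalled_regions.bed.gz.tbi` is looked for. As a rough sketch of what a fallback lookup could look like, the helper below checks for a `.tbi` file first and then a `.csi` file next to the data file. The name `find_index` is hypothetical and is not part of SMC++ or pysam. The returned path could in principle be passed to the optional `index` argument of `pysam.TabixFile`, but I have not verified that the pysam version in this installation loads a CSI index that way.

```python
import os
import tempfile


def find_index(data_path, suffixes=(".tbi", ".csi")):
    """Return the first existing index file next to data_path, or None.

    Hypothetical helper for illustration only; the suffix order means a
    .tbi index is preferred when both index types are present.
    """
    for suffix in suffixes:
        candidate = data_path + suffix
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Demonstrate with an empty placeholder file standing in for a real index.
    with tempfile.TemporaryDirectory() as tmp:
        mask = os.path.join(tmp, "uncalled_regions.bed.gz")
        open(mask + ".csi", "wb").close()
        print(find_index(mask))  # path ending in .csi, since no .tbi exists
```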

It seems that vcf2smc does not work with a .csi index. Is this assessment correct?
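
For context on why a .tbi index is not an option here: the TBI binning scheme is fixed at a minimum shift of 14 bits and 5 levels, which covers coordinates up to 2^29 bp (512 Mbp), whereas a CSI index covers up to 2^(min_shift + 3*depth) bp. The sketch below is just that arithmetic; the function names are my own, and the default `min_shift=14` is an assumption matching the usual tabix default.

```python
TBI_MAX_BITS = 29  # fixed TBI binning: min_shift 14 + 3 * depth 5 = 29 bits


def fits_in_tbi(length_bp):
    """True if a sequence of this length is addressable by a .tbi index."""
    return length_bp <= 2 ** TBI_MAX_BITS


def csi_depth_needed(length_bp, min_shift=14):
    """Smallest depth such that 2**(min_shift + 3*depth) >= length_bp."""
    depth = 0
    span = 1 << min_shift
    while span < length_bp:
        span <<= 3  # each extra level multiplies the covered span by 8
        depth += 1
    return depth


if __name__ == "__main__":
    chrom_len = 1_700_000_000  # roughly the chromosome size mentioned above
    print(fits_in_tbi(chrom_len))       # False
    print(csi_depth_needed(chrom_len))  # 6
```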