Open MleGhis opened 1 year ago
Hi @MleGhis!
Thanks for your suggestion, it is normal practice to have a single chromosome per tree sequence, so the API was initially designed to support this. But as the comment you linked suggests there may be reasons for doing this.
It seems write_vcf
could be adapted to take an array, but this doesn't seem ideal as the contig id should come from site metadata ideally. However, it might make sense as a solution for now as we don't have full multi-chrom support.
Would you like to prepare a PR of how this would work for the community to review? There is a development guide in the docs.
Thanks @benjeffery for your prompt reply. Unfortunately, as an intern, I might not have the time to do this right now, and I'll use another (unoptimised) solution for now. But I'll keep that in mind, and I might try later.
There's a few ways we could enable this I think:
contig_id
being some kind of mapping from interval -> string. I guess we should follow the conventions in RateMap to do this. Then we could put in the header in the correct form.
Hi, My SLiM simulations include multiple chromosomes (recombination rates are set at 0.5 between them, as described here ) and I'd like to create a VCF that reflects this structure. Sadly, the write_vcf method currently only accepts strings as input for the "contig_id" parameter, thus forcing every value in the CHROM column of the output VCF to be the same. I can easily create a list of the values I want in this column (since I know the end position of each chromosome), which makes it possible to parse the VCF myself, and I guess that creating a VCF for each chromosome and fusing them afterward would also be an option, but VCF files are rather heavy and many simulations are gonna run, meaning that I'd rather directly give write_vcf the informations needed to create the right file. Do you plan to give the possibility of creating VCF with multiple chromosomes in the future ? Thank you for reading, have a nice day.