luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Is there an option to reduce the number of temporary VCF files? #198

Open xiekunwhy opened 2 years ago

xiekunwhy commented 2 years ago

Hi,

It seems that Octopus opens one temporary VCF file per contig/scaffold. Reference genomes for many non-model species contain a very large number of contigs/scaffolds; for example, the assembly at https://www.ncbi.nlm.nih.gov/assembly/GCA_000966675.2/ has 4,464,856. I think Octopus cannot be used for these species because too many files would need to be opened.

Is there an option to reduce the number of temporary VCF files, or could you please add one?

Best, Kun

dancooke commented 2 years ago

Hi, you're correct that Octopus creates a temporary VCF for each contig in the input regions; this is to enable parallel processing of each contig. However, these temporary VCFs are opened dynamically, so there should only be one temporary VCF open at any one time. If you're running into problems, can you post the error you're seeing?

xiekunwhy commented 2 years ago

Hi,

Octopus itself reports no errors, but the number of files reached my system's limit, and I cannot write anything until I remove those temporary VCFs. I have 4000+ individuals to call. I think you could merge each contig's temporary VCF into the individual's VCF file and delete it as soon as that contig is finished.

Best, Kun
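One way to confirm which limit is being hit: the sketch below assumes the limit is the filesystem's inode (file count) capacity, which the thread does not state, and that the system is Unix-like. The helper name `inode_usage` is made up for illustration.

```python
import os

def inode_usage(path="."):
    """Return (total, free) inode counts for the filesystem holding `path`.

    Assumption: a Unix-like system where os.statvfs is available. Some
    filesystems report 0 for these fields, so treat the numbers as a hint.
    """
    st = os.statvfs(path)
    return st.f_files, st.f_ffree

total, free = inode_usage(".")
print(f"inodes: {total - free} used of {total}, {free} free")
```

If the free count drops towards zero while temporary VCFs accumulate, the file count is the bottleneck, not the number of simultaneously open files.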

xiekunwhy commented 2 years ago

Alternatively, can I join contigs/scaffolds with runs of Ns to build longer scaffolds and so reduce the number of temporary VCF files?
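A minimal sketch of that idea, not an Octopus feature: contigs are packed greedily into longer "super-scaffolds" separated by runs of N, and each contig's offset is recorded so that calls can be lifted back to the original coordinates afterwards. The function name `pack_contigs`, the `super_` naming scheme, and the default gap and length values are all assumptions chosen for illustration. Whether Octopus handles the N gaps well is untested here, and reads would have to be aligned to the packed reference.

```python
def pack_contigs(contigs, gap=100, max_len=1_000_000):
    """Greedily join contigs into super-scaffolds separated by runs of N.

    `contigs` is an iterable of (name, sequence) pairs.
    Returns (scaffolds, mapping):
      scaffolds: list of (scaffold_name, sequence)
      mapping:   list of (contig_name, scaffold_name, 0-based offset)
    """
    scaffolds, mapping = [], []
    parts, length = [], 0  # contigs and total length of the scaffold in progress

    def flush():
        nonlocal parts, length
        if parts:
            name = f"super_{len(scaffolds) + 1}"
            scaffolds.append((name, ("N" * gap).join(parts)))
            parts, length = [], 0

    for name, seq in contigs:
        # Start a new scaffold if adding this contig (plus its gap) would exceed max_len.
        if parts and length + gap + len(seq) > max_len:
            flush()
        offset = length + gap if parts else 0
        mapping.append((name, f"super_{len(scaffolds) + 1}", offset))
        parts.append(seq)
        length = offset + len(seq)

    flush()
    return scaffolds, mapping


# Toy example with a tiny gap and length cap.
scaffolds, mapping = pack_contigs(
    [("a", "ACGT"), ("b", "GG"), ("c", "TTTT")], gap=3, max_len=10
)
```

A variant at position `p` on a super-scaffold belongs to the contig whose offset is the largest one not exceeding `p`, at contig position `p - offset`. A contig longer than `max_len` simply becomes its own scaffold.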