Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
Hi Pauline,
I am trying to create a database for a mammalian genome from RefSeq. The run takes several days, and it occasionally fails with errors such as running out of memory. Would it be okay to parallelize the database creation by running it independently for each chromosome/contig? After all jobs have completed, I would keep the .gz, .regions, and *_SIFTDB_stats.txt files from each job's output directory. Do you think this would work?
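To make the last step concrete, here is a minimal Python sketch of how the files could be collected once the per-chromosome jobs finish. It assumes a hypothetical layout with one output directory per chromosome/contig under a common root; the function name `collect_outputs` and the directory names are illustrative only and are not part of SIFT4G_Create_Genomic_DB. It only copies files and does not run the database creation itself.

```python
import shutil
from pathlib import Path

# File patterns named in the question; the directory layout is an assumption.
KEEP_PATTERNS = ("*.gz", "*.regions", "*_SIFTDB_stats.txt")


def collect_outputs(jobs_root, merged_dir):
    """Copy the kept files from every per-chromosome job directory into merged_dir.

    jobs_root  -- hypothetical root holding one subdirectory per chromosome/contig
    merged_dir -- destination directory for the combined database files
    Returns the list of copied file names.
    """
    jobs_root = Path(jobs_root)
    merged_dir = Path(merged_dir)
    merged_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for job_dir in sorted(p for p in jobs_root.iterdir() if p.is_dir()):
        # Skip the destination itself if it lives under jobs_root.
        if job_dir.resolve() == merged_dir.resolve():
            continue
        for pattern in KEEP_PATTERNS:
            for src in sorted(job_dir.glob(pattern)):
                dest = merged_dir / src.name
                # Refuse to overwrite: two jobs producing the same file name
                # would mean the per-chromosome split went wrong.
                if dest.exists():
                    raise FileExistsError(f"{dest} already exists (from another job?)")
                shutil.copy2(src, dest)
                copied.append(dest.name)
    return copied
```

For example, `collect_outputs("jobs", "merged_db")` would gather the three file types from `jobs/<chromosome>/` into `merged_db/`, leaving logs and intermediate files behind.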
Thanks, Jacqueline