pauline-ng / SIFT4G_Create_Genomic_DB

Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0

Run each chromosome/contig independently #17

Status: Closed (jarobin closed this issue 4 years ago)

jarobin commented 4 years ago

Hi Pauline,

I am trying to create a database for a mammalian genome on RefSeq. The run time is quite long (several days), and occasionally the procedure fails due to various errors (out of memory, for instance). I am wondering if it would be okay to parallelize the database creation by running it independently for each chromosome/contig. Then, after all jobs have completed, I would keep the .gz, .regions, and *_SIFTDB_stats.txt from each /. Do you think this would be okay?

Thanks, Jacqueline

pauline-ng commented 4 years ago

Hi Jacqueline,

I don't think we ever tried parallelizing it, but sure? Several days to run a mammalian genome sounds about right.

Pauline
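
The per-chromosome/contig split discussed in this thread could be prepared along the following lines. This is only a minimal sketch, not part of SIFT4G_Create_Genomic_DB: the function names `iter_fasta_records` and `group_gtf_by_seqname` are hypothetical, and it assumes the FASTA header's first word matches the seqname in column 1 of the GTF. It does not cover how each per-contig job is configured or run, and as noted above, the parallel approach itself was never tested by the author.

```python
from collections import defaultdict


def iter_fasta_records(lines):
    """Yield (name, text) for each FASTA record, one record at a time.

    `name` is the first whitespace-delimited word after '>' (an assumption
    about how contigs are named); `text` is the header plus sequence lines.
    """
    name, chunk = None, []
    for line in lines:
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunk)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunk = [line]
        elif name is not None:
            chunk.append(line)
    if name is not None:
        yield name, "".join(chunk)


def group_gtf_by_seqname(lines):
    """Group GTF feature lines by column 1 (seqname).

    Comment lines starting with '#' and blank lines are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        seqname = line.split("\t", 1)[0]
        groups[seqname].append(line)
    return {name: "".join(rows) for name, rows in groups.items()}
```

Each `(name, text)` pair from `iter_fasta_records` and the matching entry from `group_gtf_by_seqname` would then be written to its own working directory, so that one job per chromosome/contig can run independently. Passing open file handles as `lines` keeps only one FASTA record in memory at a time; the GTF grouping holds the whole annotation in memory, which may need a streaming variant for very large files.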