Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0
Making database for sugarcane ends without warning or error #101
Hi Pauline,
I ran into a problem when using SIFT4G_Create_Genomic_DB to create a sugarcane database. I am building a database for each sugarcane chromosome: some chromosomes complete successfully, but others do not, and the failed runs print no warning or error message. I don't know why this is happening. Below is the output from a successful run and from a failed run.
Output from a successful run:
Checking query data and substitutions files
processing queries: 100.00/100.00% *
Searching database for candidate sequences
processing database part 364 (size ~0.25 GB): 100.00/100.00% *
Aligning queries with candidate sequences
processing database part 91 (size ~1.00 GB): 100.00/100.00% *
Selecting alignments with median threshold: 2.75
processing queries: 100.00/100.00% *
Generating SIFT predictions with sequence identity: 100.00%
processing queries: 100.00/100.00% *
Output from a failed run:
Checking query data and substitutions files
processing queries: 100.00/100.00% *
Searching database for candidate sequences
processing database part 364 (size ~0.25 GB): 100.00/100.00% *
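To find out which chromosomes stopped early, the two outputs above can be compared stage by stage. The following is only a minimal sketch: it assumes the console output of each run has been saved as text, and the function names `last_stage_reached` and `is_complete` are hypothetical helpers, not part of SIFT4G. The stage headers are copied from the output quoted in this issue.

```python
# Stage headers in the order they appear in the successful output above.
STAGES = [
    "Checking query data and substitutions files",
    "Searching database for candidate sequences",
    "Aligning queries with candidate sequences",
    "Selecting alignments with median threshold",
    "Generating SIFT predictions with sequence identity",
]


def last_stage_reached(log_text):
    """Return the index in STAGES of the last stage header found, or -1."""
    last = -1
    for line in log_text.splitlines():
        stripped = line.strip()
        for index, stage in enumerate(STAGES):
            if stripped.startswith(stage):
                last = max(last, index)
    return last


def is_complete(log_text):
    """True if the output contains the final stage header."""
    return last_stage_reached(log_text) == len(STAGES) - 1


# The failed output quoted above, used here as example input.
failed_log = """\
Checking query data and substitutions files
processing queries: 100.00/100.00% *
Searching database for candidate sequences
processing database part 364 (size ~0.25 GB): 100.00/100.00% *
"""

index = last_stage_reached(failed_log)
print("complete:", is_complete(failed_log))
print("last stage seen:", STAGES[index] if index >= 0 else "none")
```

For the failed run this reports that the last stage seen was the database search, so the process stopped before the alignment stage started. Note that the check only looks for stage headers; it does not confirm that the final stage itself ran to 100%.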
UniRef90 was used as the protein database for database creation.
The config file:
GENETIC_CODE_TABLE=1
GENETIC_CODE_TABLENAME=Standard
MITO_GENETIC_CODE_TABLE=11
MITO_GENETIC_CODE_TABLENAME=Plant Plastid Code

PARENT_DIR=/xtdisk/apod/xiehx/Deleterious_variants/SIFT/Saccharum/SIFT_Database/Chr8D
ORG=Saccharum_spontaneum
ORG_VERSION=Np-X

# Running SIFT 4G
SIFT4G_PATH=/gpfs/biosoft/app2/python2024/envs/sift4g/bin/sift4g
PROTEIN_DB=/xtdisk/apod/xiehx/Deleterious_variants/SIFT/Saccharum/SIFT_Database/config/uniref90.fasta

# Sub-directories, don't need to change
GENE_DOWNLOAD_DEST=gene-annotation-src
CHR_DOWNLOAD_DEST=chr-src
LOGFILE=Log.txt
ZLOGFILE=Log2.txt
FASTA_DIR=fasta
SUBST_DIR=subst
ALIGN_DIR=SIFT_alignments
SIFT_SCORE_DIR=SIFT_predictions
SINGLE_REC_BY_CHR_DIR=singleRecords
SINGLE_REC_WITH_SIFTSCORE_DIR=singleRecords_with_scores
DBSNP_DIR=dbSNP

# Doesn't need to change
FASTA_LOG=fasta.log
INVALID_LOG=invalid.log
PEPTIDE_LOG=peptide.log
ENS_PATTERN=ENS
SINGLE_RECORD_PATTERN=:change:_aa1valid_dbsnp.singleRecord
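Since each chromosome uses its own copy of this config, one quick sanity check is to confirm that the paths in a failing chromosome's config actually exist. This is a rough sketch, assuming the config file holds one `KEY=VALUE` setting per line with `#` comment lines; `parse_config` and `missing_paths` are hypothetical helper names, and the example text is a shortened copy of the values above.

```python
import os


def parse_config(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain spaces or "=".
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_paths(settings, keys=("PARENT_DIR", "SIFT4G_PATH", "PROTEIN_DB")):
    """Return the keys that are unset or point to a path that does not exist."""
    return [
        key for key in keys
        if key not in settings or not os.path.exists(settings[key])
    ]


# Shortened example using values from the config above.
example = """\
MITO_GENETIC_CODE_TABLE=11
MITO_GENETIC_CODE_TABLENAME=Plant Plastid Code
ORG=Saccharum_spontaneum
ORG_VERSION=Np-X
# Running SIFT 4G
SIFT4G_PATH=/gpfs/biosoft/app2/python2024/envs/sift4g/bin/sift4g
"""

settings = parse_config(example)
print(settings["ORG"], settings["ORG_VERSION"])
print("missing or unset:", missing_paths(settings))
```

On the machine where the databases are built, an empty list from `missing_paths` only rules out a bad path; it says nothing about why the run stopped.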
Could you give me some suggestions and help? Thank you very much for your advice and time!