Create genomic databases with SIFT predictions. The input is an organism's genomic DNA (.fa) file and its gene annotation file (.gtf). The output is a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0
Error when creating a new database for Culex tarsalis (invalid chain data) #82
I followed the steps in "Making a SIFT database from local genomic and gene annotation file (.gtf)".
This is the output when I run "perl make-SIFT-db-all.pl -config /sift4g/culex_tarsalis_config.txt" to create the database:
output
root@002ba04e1693:/scripts_to_build_SIFT_db# perl make-SIFT-db-all.pl -config /sift4g/culex_tarsalis_config.txt
converting gene format to use-able input
done converting gene format
making single records file
done making single records template
making noncoding records file
done making noncoding records
make the fasta sequences
done making the fasta sequences
start siftsharp, getting the alignments
/sift4g/bin/sift4g -d /landscape_genetics/SIFT/uniref90.fasta.gz -q /sift4g/run_sift/all_prot.fasta --subst /sift4g/run_sift/subst --out /sift4g/run_sift/SIFT_predictions --sub-results
Checking query data and substitutions files
processing queries: 100.00/100.00% *
Searching database for candidate sequences
[ERROR:src/chain.c:69]: invalid chain data
############################################
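The error appears while sift4g is searching the database for candidate sequences. One thing that might be worth ruling out (an assumption on my part; I have not traced what triggers the check at src/chain.c:69) is a malformed FASTA input, either the generated query file /sift4g/run_sift/all_prot.fasta or the protein database. The Python sketch below scans a FASTA file, decompressing it first if the name ends in .gz, and reports headers that have no sequence and sequence lines that appear before the first header. `FastaReport`, `scan_fasta_lines` and `check_fasta` are hypothetical helpers, not part of SIFT 4G.

```python
import gzip
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class FastaReport:
    """Structural problems found while scanning a FASTA file (hypothetical helper)."""
    empty_records: List[str] = field(default_factory=list)  # headers with no residues
    orphan_lines: List[int] = field(default_factory=list)   # sequence lines before any header


def scan_fasta_lines(lines: Iterable[str]) -> FastaReport:
    """Scan FASTA text line by line and collect structural problems."""
    report = FastaReport()
    header = None  # header of the record currently being read
    seq_len = 0    # residues seen so far for that record
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            # A new record starts: the previous record must have had residues.
            if header is not None and seq_len == 0:
                report.empty_records.append(header)
            header = line[1:]
            seq_len = 0
        elif header is None:
            report.orphan_lines.append(line_no)
        else:
            seq_len += len(line)
    # The last record in the file also needs a sequence.
    if header is not None and seq_len == 0:
        report.empty_records.append(header)
    return report


def check_fasta(path: str) -> FastaReport:
    """Open a plain or gzip-compressed (.gz) FASTA file as text and scan it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return scan_fasta_lines(handle)
```

For example, `check_fasta("/sift4g/run_sift/all_prot.fasta")` checks the query file. Running it on /landscape_genetics/SIFT/uniref90.fasta.gz works the same way but reads the whole database, so it will take a while.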
My config file is below:
GENE_DOWNLOAD_SITE=ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/gtf//bacteria_11_collection/candidatus_carsonella_ruddii_pv/Candidatus_carsonella_ruddii_pv.ASM1036v1.34.gtf.gz
PEP_FILE=ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/pep/Candidatus_carsonella_ruddii_pv.ASM1036v1.34.pep.all.fa.gz
CHR_DOWNLOAD_SITE=ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna/

GENETIC_CODE_TABLE=1
GENETIC_CODE_TABLENAME=Standard
MITO_GENETIC_CODE_TABLE=1
MITO_GENETIC_CODE_TABLENAME=Standard

PARENT_DIR=/sift4g/run_sift
ORG=Culex Tarsalis
ORG_VERSION=v1.0.a1

# Running SIFT 4G
SIFT4G_PATH=/sift4g/bin/sift4g
PROTEIN_DB=/landscape_genetics/SIFT/uniref90.fasta.gz

# Sub-directories, don't need to change
LOGFILE=Log.txt
ZLOGFILE=Log2.txt
GENE_DOWNLOAD_DEST=gene-annotation-src
CHR_DOWNLOAD_DEST=chr-src
FASTA_DIR=fasta
SUBST_DIR=subst
SIFT_SCORE_DIR=SIFT_predictions
SINGLE_REC_BY_CHR_DIR=singleRecords/
SINGLE_REC_WITH_SIFTSCORE_DIR=singleRecords_with_scores
DBSNP_DIR=dbSNP

# Doesn't need to change
FASTA_LOG=fasta.log
INVALID_LOG=invalid.log
PEPTIDE_LOG=peptide.log
ENS_PATTERN=ENS
SINGLE_RECORD_PATTERN=:change:_aa1valid_dbsnp.singleRecord
###############################
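The config above combines Ensembl Genomes download URLs for Candidatus carsonella ruddii, which look like they come from the example config, with settings for Culex Tarsalis. A quick way to list such leftovers, along with values containing whitespace such as ORG=Culex Tarsalis, is a small Python sketch that parses the KEY=VALUE lines. The helper names are hypothetical, and I have not verified whether make-SIFT-db-all.pl reads the download keys, or has trouble with spaces, when building from local files; the checks only highlight entries worth double-checking.

```python
from typing import Dict, List


def parse_config(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
    config: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {raw!r}")
        config[key.strip()] = value.strip()
    return config


def keys_with_whitespace(config: Dict[str, str]) -> List[str]:
    """Keys whose values contain spaces or tabs (not necessarily an error)."""
    return [k for k, v in config.items() if any(c.isspace() for c in v)]


def remote_keys(config: Dict[str, str]) -> List[str]:
    """Keys whose values are download URLs rather than local paths."""
    prefixes = ("ftp://", "http://", "https://")
    return [k for k, v in config.items() if v.startswith(prefixes)]
```

For example, reading /sift4g/culex_tarsalis_config.txt into a string and passing `parse_config(...)` to `remote_keys` should list GENE_DOWNLOAD_SITE, PEP_FILE and CHR_DOWNLOAD_SITE for the config shown here.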
Could you please help me figure out what is wrong? Thank you very much!