pauline-ng / SIFT4G_Create_Genomic_DB

Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0

Error when creating a new database for Culex tarsalis (invalid chain data) #82

Afei99357 closed this issue 1 year ago

Afei99357 commented 1 year ago

I followed the steps in "Making a SIFT database from local genomic and gene annotation file (.gtf)". This is the output when I run "perl make-SIFT-db-all.pl -config /sift4g/culex_tarsalis_config.txt" to create the database:


root@002ba04e1693:/scripts_to_build_SIFT_db# perl make-SIFT-db-all.pl -config /sift4g/culex_tarsalis_config.txt
converting gene format to use-able input
done converting gene format
making single records file
done making single records template
making noncoding records file
done making noncoding records
make the fasta sequences
done making the fasta sequences
start siftsharp, getting the alignments
/sift4g/bin/sift4g -d /landscape_genetics/SIFT/uniref90.fasta.gz -q /sift4g/run_sift/all_prot.fasta --subst /sift4g/run_sift/subst --out /sift4g/run_sift/SIFT_predictions --sub-results
Checking query data and substitutions files

Searching database for candidate sequences
[ERROR:src/chain.c:69]: invalid chain data
############################################

My config file is below:

GENE_DOWNLOAD_SITE=ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/gtf//bacteria_11_collection/candidatus_carsonella_ruddii_pv/Candidatus_carsonella_ruddii_pv.ASM1036v1.34.gtf.gz
PEP_FILE=ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/pep/Candidatus_carsonella_ruddii_pv.ASM1036v1.34.pep.all.fa.gz
CHR_DOWNLOAD_SITE=ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna/

GENETIC_CODE_TABLE=1
GENETIC_CODE_TABLENAME=Standard
MITO_GENETIC_CODE_TABLE=1
MITO_GENETIC_CODE_TABLENAME=Standard

PARENT_DIR=/sift4g/run_sift
ORG=Culex Tarsalis
ORG_VERSION=v1.0.a1

# Running SIFT 4G
SIFT4G_PATH=/sift4g/bin/sift4g
PROTEIN_DB=/landscape_genetics/SIFT/uniref90.fasta.gz

# Sub-directories, don't need to change
LOGFILE=Log.txt
ZLOGFILE=Log2.txt
GENE_DOWNLOAD_DEST=gene-annotation-src
CHR_DOWNLOAD_DEST=chr-src
FASTA_DIR=fasta
SUBST_DIR=subst
SIFT_SCORE_DIR=SIFT_predictions
SINGLE_REC_BY_CHR_DIR=singleRecords/
SINGLE_REC_WITH_SIFTSCORE_DIR=singleRecords_with_scores
DBSNP_DIR=dbSNP

# Doesn't need to change
FASTA_LOG=fasta.log
INVALID_LOG=invalid.log
PEPTIDE_LOG=peptide.log
ENS_PATTERN=ENS
SINGLE_RECORD_PATTERN=:change:_aa1valid_dbsnp.singleRecord

###############################

Could you please help me figure out what is wrong? Thank you very much!
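For anyone checking a config like this one, here is a minimal Python sketch that reads the file into a dictionary so you can confirm which values (for example PROTEIN_DB and SIFT4G_PATH) the build script will see. It assumes one KEY=VALUE setting per line, with blank lines and lines starting with '#' treated as comments; that layout is an assumption, and the function name read_sift_config is hypothetical, not part of the SIFT4G scripts.

```python
from pathlib import Path


def read_sift_config(path):
    """Read a KEY=VALUE config file into a dict (hypothetical helper).

    Assumption: one setting per line; blank lines and lines starting
    with '#' are comments. Only the first '=' separates key from value,
    so values may contain spaces or further '=' characters.
    """
    settings = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks, comments and '####' separator lines
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line; ignore it
        settings[key.strip()] = value.strip()
    return settings
```

For the config above, read_sift_config("/sift4g/culex_tarsalis_config.txt")["PROTEIN_DB"] would give "/landscape_genetics/SIFT/uniref90.fasta.gz". Splitting only on the first '=' keeps values such as ORG=Culex Tarsalis intact.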

pauline-ng commented 1 year ago

Hello,

Gzipped files are unfortunately not supported. Try unzipping the uniref90.fasta.gz database and running again.

Thanks, Pauline
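The fix suggested above amounts to decompressing the protein database once and pointing PROTEIN_DB at the plain file. From a shell, `gunzip uniref90.fasta.gz` does this (it replaces the .gz with the uncompressed file). As a minimal Python sketch that keeps the original archive, the code below checks for the gzip magic bytes and streams the decompressed data to disk. The helper names is_gzipped and ensure_plain_fasta are hypothetical, not part of SIFT4G.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def ensure_plain_fasta(path):
    """Return a path to an uncompressed copy of a FASTA database.

    Hypothetical helper: if `path` is gzip-compressed it is decompressed
    next to the original (dropping a trailing .gz, otherwise appending
    .decompressed), streaming in 1 MiB chunks so the database never has
    to fit in memory. The .gz file is left in place; plain files are
    returned unchanged.
    """
    src = Path(path)
    if not is_gzipped(src):
        return src
    if src.suffix == ".gz":
        dest = src.with_suffix("")
    else:
        dest = src.with_name(src.name + ".decompressed")
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=1024 * 1024)
    return dest
```

After running it on /landscape_genetics/SIFT/uniref90.fasta.gz, set PROTEIN_DB=/landscape_genetics/SIFT/uniref90.fasta in the config and rerun make-SIFT-db-all.pl. The uncompressed UniRef90 file is much larger than the .gz, so check free disk space first. Testing the magic bytes instead of the file extension also catches compressed files that were renamed without .gz.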

Afei99357 commented 1 year ago

Thank you, I will try it now!