pauline-ng / SIFT4G_Create_Genomic_DB

Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0

Creating SIFT database from local genomic and gtf file with test data does not work #84

Closed gubrins closed 9 months ago

gubrins commented 9 months ago

Hi, I am trying to run the Homo sapiens test data, but I cannot create the database.

I ran this command: perl make-SIFT-db-all.pl -config test_files/homo_sapiens-test.txt

This is the output:

converting gene format to use-able input
Can't locate Switch.pm in @INC (you may need to install the Switch module) (@INC contains: /home/goliath/miniconda3/lib/site_perl/5.34.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/site_perl/5.34.0 /home/goliath/miniconda3/lib/5.34.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/5.34.0 .) at gff_gene_format_to_ucsc.pl line 4.
BEGIN failed--compilation aborted at gff_gene_format_to_ucsc.pl line 4.
done converting gene format
making single records file
Can't locate DBI.pm in @INC (you may need to install the DBI module) (@INC contains: /home/goliath/miniconda3/lib/site_perl/5.34.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/site_perl/5.34.0 /home/goliath/miniconda3/lib/5.34.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/5.34.0 .) at make-single-records-BIOPERL.pl line 22.
BEGIN failed--compilation aborted at make-single-records-BIOPERL.pl line 22.
done making single records template
making noncoding records file
Can't locate DBI.pm in @INC (you may need to install the DBI module) (@INC contains: /home/goliath/miniconda3/lib/site_perl/5.34.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/site_perl/5.34.0 /home/goliath/miniconda3/lib/5.34.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/5.34.0 .) at make-single-records-noncoding.pl line 12.
BEGIN failed--compilation aborted at make-single-records-noncoding.pl line 12.
done making noncoding records
make the fasta sequences
Can't locate DBI.pm in @INC (you may need to install the DBI module) (@INC contains: /home/goliath/miniconda3/lib/site_perl/5.34.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/site_perl/5.34.0 /home/goliath/miniconda3/lib/5.34.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/5.34.0 .) at generate-fasta-subst-files-BIOPERL.pl line 23.
BEGIN failed--compilation aborted at generate-fasta-subst-files-BIOPERL.pl line 23.
done making the fasta sequences
start siftsharp, getting the alignments
cat: './test_files/homo_sapiens_small/fasta/*.fasta': No such file or directory
/home/goliath/software/sift4g/bin/sift4g -d home/goliath/software/funannotate_florida/funannotate_db/uniprot_sprot.fasta -q ./test_files/homo_sapiens_small/all_prot.fasta --subst ./test_files/homo_sapiens_small/subst --out ./test_files/homo_sapiens_small/SIFT_predictions --sub-results 
[ERROR]: invalid database file path 'home/goliath/software/funannotate_florida/funannotate_db/uniprot_sprot.fasta'

And this is my homo_sapiens-test.txt file:

GENETIC_CODE_TABLE=1
GENETIC_CODE_TABLENAME=Standard
MITO_GENETIC_CODE_TABLE=2
MITO_GENETIC_CODE_TABLENAME=Vertebrate Mitochondrial

PARENT_DIR=./test_files/homo_sapiens_small
ORG=homo_sapiens
ORG_VERSION=GRCh38.83
DBSNP_VCF_FILE=Homo_sapiens.vcf.gz

#Running SIFT 4G
SIFT4G_PATH=/home/goliath/software/sift4g/bin/sift4g
PROTEIN_DB=home/goliath/software/funannotate_florida/funannotate_db/uniprot_sprot.fasta

# Sub-directories, don't need to change
GENE_DOWNLOAD_DEST=gene-annotation-src
CHR_DOWNLOAD_DEST=chr-src
LOGFILE=Log.txt
ZLOGFILE=Log2.txt
FASTA_DIR=fasta
SUBST_DIR=subst
ALIGN_DIR=SIFT_alignments
SIFT_SCORE_DIR=SIFT_predictions
SINGLE_REC_BY_CHR_DIR=singleRecords
SINGLE_REC_WITH_SIFTSCORE_DIR=singleRecords_with_scores
DBSNP_DIR=dbSNP

# Doesn't need to change
FASTA_LOG=fasta.log
INVALID_LOG=invalid.log
PEPTIDE_LOG=peptide.log
ENS_PATTERN=ENS
SINGLE_RECORD_PATTERN=:change:_aa1valid_dbsnp.singleRecord

I am using a uniprot_sprot.fasta file from another package (funannotate), but I think it is the same dataset.

Any help would be appreciated!

gubrins commented 9 months ago

I followed #51 and solved the "Can't locate Switch.pm in @INC" error, but I still have the other errors. DBI is already installed (with sudo apt-get install libdbi-perl).

pauline-ng commented 9 months ago

Please change your PARENT_DIR to a full path, not a relative path.

For example PARENT_DIR=/home/goliath/software/sift4g/test_files/homo_sapiens_small
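
The log above also shows sift4g rejecting the PROTEIN_DB value, which has no leading slash. A quick way to catch both problems before a run is to scan the config for path values that are not absolute. The following is only a minimal sketch: `check_abs_paths` is a hypothetical helper, not part of this repository, and it assumes the config uses plain KEY=VALUE lines as in the file posted above and that PARENT_DIR, SIFT4G_PATH and PROTEIN_DB are the keys worth checking.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SIFT4G_Create_Genomic_DB):
# report config keys whose values are missing or not absolute paths.
check_abs_paths() {
    local config="$1" rc=0 key value
    # Assumption: these are the keys that must hold full paths.
    for key in PARENT_DIR SIFT4G_PATH PROTEIN_DB; do
        # Use the last KEY=VALUE line for this key; commented lines do not match.
        value=$(grep -E "^${key}=" "$config" | tail -n 1 | cut -d= -f2-)
        if [ -z "$value" ]; then
            echo "MISSING  ${key}"
            rc=1
        elif [ "${value#/}" = "$value" ]; then
            # Stripping a leading "/" changed nothing, so the value is relative.
            echo "RELATIVE ${key}=${value}"
            rc=1
        else
            echo "OK       ${key}=${value}"
        fi
    done
    return $rc
}

# Usage: check_abs_paths test_files/homo_sapiens-test.txt
```

The function returns a non-zero status if any key is missing or relative, so it can gate the call to make-SIFT-db-all.pl in a wrapper script.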

gubrins commented 9 months ago

Yes, sorry about that. I already changed it, but the error stays the same. I think there may be a problem with Perl versions: I have one installed through miniconda (v5.22.0) and another at /usr/bin/perl (v5.34.0).

This is the current error:

converting gene format to use-able input
done converting gene format
making single records file
Can't locate Bio/DB/Fasta.pm in @INC (you may need to install the Bio::DB::Fasta module) (@INC contains: /usr/share/perl5 /usr/share/perl5 /home/goliath/miniconda3/lib/perl5/site_perl/5.22.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/perl5/site_perl/5.22.0 /home/goliath/miniconda3/lib/perl5/5.22.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/perl5/5.22.0 .) at make-single-records-BIOPERL.pl line 24.
BEGIN failed--compilation aborted at make-single-records-BIOPERL.pl line 24.
done making single records template
making noncoding records file
Can't locate Bio/DB/Fasta.pm in @INC (you may need to install the Bio::DB::Fasta module) (@INC contains: /usr/share/perl5 /usr/share/perl5 /home/goliath/miniconda3/lib/perl5/site_perl/5.22.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/perl5/site_perl/5.22.0 /home/goliath/miniconda3/lib/perl5/5.22.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/perl5/5.22.0 .) at make-single-records-noncoding.pl line 14.
BEGIN failed--compilation aborted at make-single-records-noncoding.pl line 14.
done making noncoding records
make the fasta sequences
Can't locate Bio/DB/Fasta.pm in @INC (you may need to install the Bio::DB::Fasta module) (@INC contains: /usr/share/perl5 /usr/share/perl5 /home/goliath/miniconda3/lib/perl5/site_perl/5.22.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/perl5/site_perl/5.22.0 /home/goliath/miniconda3/lib/perl5/5.22.0/x86_64-linux-thread-multi /home/goliath/miniconda3/lib/perl5/5.22.0 .) at generate-fasta-subst-files-BIOPERL.pl line 25.
BEGIN failed--compilation aborted at generate-fasta-subst-files-BIOPERL.pl line 25.
done making the fasta sequences
start siftsharp, getting the alignments
cat: '/home/goliath/software/sift4g/scripts_to_build_SIFT_db/test_files/homo_sapiens_small/fasta/*.fasta': No such file or directory
/home/goliath/software/sift4g/bin/sift4g -d /home/goliath/software/funannotate_florida/funannotate_db/uniprot_sprot.fasta -q /home/goliath/software/sift4g/scripts_to_build_SIFT_db/test_files/homo_sapiens_small/all_prot.fasta --subst /home/goliath/software/sift4g/scripts_to_build_SIFT_db/test_files/homo_sapiens_small/subst --out /home/goliath/software/sift4g/scripts_to_build_SIFT_db/test_files/homo_sapiens_small/SIFT_predictions --sub-results 
** Checking query data and substitutions files **

** EXITING! No valid queries to process. **

Thanks for the help!

gubrins commented 9 months ago

I am not sure why it does not find the modules in the base environment. I am now running it in another environment where I also had those modules installed, and it seems to be working. I will keep you updated!
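
For anyone comparing environments in the same way, a small check like the one below shows which perl is first on PATH and whether it can load the modules named in the errors above. It is a rough sketch, not part of the repository: `check_perl_modules` is a hypothetical name, and it relies only on perl's standard `-M` switch, which loads a module before running the code given to `-e`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the perl found first on PATH and
# whether it can load each module passed as an argument.
check_perl_modules() {
    local perl_bin rc=0 mod
    perl_bin=$(command -v perl) || { echo "perl not found on PATH"; return 1; }
    echo "perl: ${perl_bin}"
    for mod in "$@"; do
        # "perl -MModule -e 1" exits non-zero if the module cannot be loaded.
        if "$perl_bin" -M"$mod" -e 1 2>/dev/null; then
            echo "OK      ${mod}"
        else
            echo "MISSING ${mod}"
            rc=1
        fi
    done
    return $rc
}

# Usage: check_perl_modules Switch DBI Bio::DB::Fasta
```

Running it once in each environment makes it easier to see whether a module installed for /usr/bin/perl is also visible to the miniconda perl, since each interpreter searches its own @INC directories.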

gubrins commented 9 months ago

It worked!

converting gene format to use-able input
done converting gene format
making single records file
done making single records template
making noncoding records file
done making noncoding records
make the fasta sequences
done making the fasta sequences
start siftsharp, getting the alignments
/home/goliath/software/sift4g/bin/sift4g -d /home/goliath/software/funannotate_florida/funannotate_db/uniprot_sprot.fasta -q /home/goliath/software/sift4g/scripts_to_build_SIFT_db/test_files/homo_sapiens_small/all_prot.fasta --subst /home/goliath/software/sift4g/scripts_to_build_SIFT_db/test_files/homo_sapiens_small/subst --out /home/goliath/software/sift4g/scripts_to_build_SIFT_db/test_files/homo_sapiens_small/SIFT_predictions --sub-results 
** Checking query data and substitutions files **
* processing queries: 100.00/100.00% *

** Searching database for candidate sequences **
* processing database part 2 (size ~0.25 GB): 100.00/100.00% *

** Aligning queries with candidate sequences **
* processing database part 1 (size ~1.00 GB): 100.00/100.00% *

** Selecting alignments with median threshold: 2.75 **
* processing queries: 100.00/100.00% *

** Generating SIFT predictions with sequence identity: 100.00% **
* processing queries: 100.00/100.00% *

done getting all the scores
populating databases
checking the databases
zipping up /home/goliath/software/sift4g/scripts_to_build_SIFT_db/test_files/homo_sapiens_small/chr-src/*
All done!