Closed abcdefghijklmn97 closed 1 year ago
Use full paths, not relative paths. (The DIR variables in the config file should not begin with ".".)
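A minimal Python sketch of that check, assuming the config format shown later in this thread (one KEY=VALUE per line, lines starting with "#" are comments). The helper names parse_config and relative_dir_entries are hypothetical, not part of SIFT4G_Create_Genomic_DB; the sketch only flags *_DIR values that begin with "." so they can be replaced with full paths.

```python
import os

def parse_config(text):
    """Parse KEY=VALUE lines; blank lines and lines starting with '#' are skipped."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain spaces or '='.
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config

def relative_dir_entries(config):
    """Return the *_DIR entries whose value starts with '.', i.e. relative paths."""
    return {k: v for k, v in config.items() if k.endswith("_DIR") and v.startswith(".")}

example = """
PARENT_DIR=./test_files/homo_sapiens_small
ORG=homo_sapiens
#Running SIFT 4G
FASTA_DIR=fasta
"""
cfg = parse_config(example)
print(relative_dir_entries(cfg))  # {'PARENT_DIR': './test_files/homo_sapiens_small'}
# The absolute form depends on the directory the script is run from.
print(os.path.abspath(cfg["PARENT_DIR"]))
```

Sub-directory entries such as FASTA_DIR=fasta are not flagged, since the config template marks them as names that don't need to change.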
After changing PARENT_DIR in my config file to /nfs/LJH/TEST/sift/JYM/SIFT4G_Create_Genomic_DB-master/test_files/homo_sapiens_small, the result is still the same: there is no result file.
What happens when you run
sift4g -d /nfs/LJH/zhushi/plants/nr.plant.fa -q ./test_files/homo_sapiens_small/all_prot.fasta --subst ./test_files/homo_sapiens_small/subst --out ./test_files/homo_sapiens_small/SIFT_predictions --sub-results
(Use full paths instead of relative paths.) Are there any result files in /test_files/homo_sapiens_small/SIFT_predictions?
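To avoid typing the full paths by hand, the same sift4g call can be assembled with absolute paths. This is a sketch that assumes the query, substitution and output locations follow the test example layout (all_prot.fasta, subst and SIFT_predictions under PARENT_DIR); build_sift4g_command is a hypothetical helper, and only the flags already used in this thread (-d, -q, --subst, --out, --sub-results) appear.

```python
import os
import shlex

def build_sift4g_command(sift4g, protein_db, parent_dir):
    """Assemble the sift4g call from this thread with every data path made absolute.

    The file and directory names are copied from the test example (an assumption
    for other datasets); sift4g itself is left as given so PATH lookup still works.
    """
    parent = os.path.abspath(parent_dir)
    return [
        sift4g,
        "-d", os.path.abspath(protein_db),
        "-q", os.path.join(parent, "all_prot.fasta"),
        "--subst", os.path.join(parent, "subst"),
        "--out", os.path.join(parent, "SIFT_predictions"),
        "--sub-results",
    ]

cmd = build_sift4g_command("sift4g", "/nfs/LJH/zhushi/plants/nr.plant.fa",
                           "./test_files/homo_sapiens_small")
print(shlex.join(cmd))  # a copy-pasteable shell command line
```

The resulting list could also be passed directly to subprocess.run(cmd), which avoids any shell quoting issues.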
(sift4g) [root@localhost SIFT4G_Create_Genomic_DB-master]# sift4g -d /nfs/SOFT/nr -q /nfs/LJH/TEST/sift/JYM/SIFT4G_Create_Genomic_DB-master/test_files/homo_sapiens_small/all_prot.fasta --subst /nfs/LJH/TEST/sift/JYM/SIFT4G_Create_Genomic_DB-master/test_files/homo_sapiens_small/subst --out /nfs/LJH/TEST/sift/JYM/SIFT4G_Create_Genomic_DB-master/test_files/homo_sapiens_small/SIFT_predictions --sub-results
Checking query data and substitutions files
Searching database for candidate sequences
There are no files in /nfs/LJH/TEST/sift/JYM/SIFT4G_Create_Genomic_DB-master/test_files/homo_sapiens_small/SIFT_predictions.
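Here the output stops after "Searching database for candidate sequences", whereas the complete run posted later in this thread continues through aligning, selecting alignments and generating predictions. A hedged sketch for telling, from saved screen output, which stage a run last reached; the stage strings are copied from that successful log, and last_stage_reached is a hypothetical helper.

```python
# Stage banners as printed by sift4g in the successful run posted in this thread.
STAGES = [
    "Checking query data and substitutions files",
    "Searching database for candidate sequences",
    "Aligning queries with candidate sequences",
    "Selecting alignments with median threshold",
    "Generating SIFT predictions with sequence identity",
]

def last_stage_reached(log_text):
    """Return (index, name) of the latest stage whose banner appears in the log, or None."""
    last = None
    for i, stage in enumerate(STAGES):
        if stage in log_text:
            last = (i, stage)
    return last

failed_log = """Checking query data and substitutions files
Searching database for candidate sequences"""
print(last_stage_reached(failed_log))
# (1, 'Searching database for candidate sequences')
```

A run that never reaches index 4 ("Generating SIFT predictions ...") would be expected to leave SIFT_predictions empty, which matches the symptom reported here.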
My own data is a plant genome, so I used a plant protein database; that run also ended without any result.
The protein database I used for the partial Homo sapiens example is also plant-based. Could that have any effect?
Also, could the nuclear genome be related to the missing results?
Use UniRef90 fasta
https://www.uniprot.org/help/downloads
The SIFT4G algorithm will find the homologous sequences. If your plant protein database is too small and has no homologues, it won't work.
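One quick sanity check on a protein database is how many sequences it actually contains. A minimal sketch that counts FASTA records by their ">" header lines; it says nothing about whether suitable homologues are present, and count_fasta_records is a hypothetical name.

```python
import io

def count_fasta_records(handle):
    """Count sequences in a FASTA stream by counting header lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))

demo = io.StringIO(">seq1\nMKV\n>seq2\nMLA\nGG\n")
print(count_fasta_records(demo))  # 2
```

For a real file, open it and pass the handle, e.g. `with open(path) as fh: count_fasta_records(fh)`. On a database as large as UniRef90 this reads the whole file, so it may take a while.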
After downloading the UniRef90 FASTA and using it as the protein database, the run finished without errors but, just like before, produced no result file.
Hi,
I got the same issue as you with the test data set and I may have found the solution.
I saw this reply from another issue opened here: https://github.com/pauline-ng/SIFT4G_Create_Genomic_DB/issues/81#issuecomment-1595348807
I initially used the sift4g that was already installed on my cluster. I checked my gcc version, and it was v4.8.5 by default, so I switched to the newest version available (v11.2.0 for me). I don't remember whether that alone changed anything, but there were still some issues. So, with gcc v11.2.0 loaded, I installed and compiled sift4g directly from GitHub (https://github.com/rvaser/sift4g). I ran the test data set again with this newly compiled sift4g, and I now get the chromosome files in the database folder. I also realized that the output shown on screen while running the Perl script had actually been incomplete before! This is the full output I get now:
[shuynh@core-login1 scripts_to_build_SIFT_db]$ perl make-SIFT-db-all.pl -config test_files/homo_sapiens-test.txt
converting gene format to use-able input
done converting gene format
making single records file
done making single records template
making noncoding records file
done making noncoding records
make the fasta sequences
done making the fasta sequences
start siftsharp, getting the alignments
/shared/projects/domisol/scripts/sift4g/bin/sift4g -d /shared/projects/domisol/scripts/SIFT/uniprot_sprot.fasta -q /shared/projects/domisol/scripts/SIFT/scripts_to_build_SIFT_db/test_files/homo_sapiens_small/all_prot.fasta --subst /shared/projects/domisol/scripts/SIFT/scripts_to_build_SIFT_db/test_files/homo_sapiens_small/subst --out /shared/projects/domisol/scripts/SIFT/scripts_to_build_SIFT_db/test_files/homo_sapiens_small/SIFT_predictions --sub-results
** Checking query data and substitutions files **
* processing queries: 100.00/100.00% *
** Searching database for candidate sequences **
* processing database part 2 (size ~0.25 GB): 100.00/100.00% *
** Aligning queries with candidate sequences **
* processing database part 1 (size ~1.00 GB): 100.00/100.00% *
** Selecting alignments with median threshold: 2.75 **
* processing queries: 100.00/100.00% *
** Generating SIFT predictions with sequence identity: 100.00% **
* processing queries: 100.00/100.00% *
done getting all the scores
populating databases
checking the databases
zipping up /shared/projects/domisol/scripts/SIFT/scripts_to_build_SIFT_db/test_files/homo_sapiens_small/chr-src/*
All done!
[shuynh@core-login1 scripts_to_build_SIFT_db]$ ll test_files/homo_sapiens_small/GRCh38.83/
total 225748
-rw-rw----+ 1 shuynh shuynh 230776344 14 sept. 02:18 21.gz
-rw-rw----+ 1 shuynh shuynh 117688 14 sept. 02:16 21.regions
-rw-rw----+ 1 shuynh shuynh 507 14 sept. 02:21 21_SIFTDB_stats.txt
-rw-rw----+ 1 shuynh shuynh 240 14 sept. 02:19 CHECK_GENES.LOG
-rw-rw----+ 1 shuynh shuynh 1049 14 sept. 02:22 homo_sapiens-test.txt
-rw-rw----+ 1 shuynh shuynh 230219 14 sept. 02:18 MT.gz
-rw-rw----+ 1 shuynh shuynh 444 14 sept. 02:18 MT.regions
-rw-rw----+ 1 shuynh shuynh 480 14 sept. 02:21 MT_SIFTDB_stats.txt
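The listing above shows three files per chromosome (for example 21.gz, 21.regions and 21_SIFTDB_stats.txt). A sketch that reports which of those are missing for a given list of chromosomes, assuming that pattern holds for every chromosome; the suffix list is inferred from this single listing, not from the tool's documentation, and missing_outputs is hypothetical.

```python
import os
import tempfile

# Per-chromosome suffixes seen in the successful listing above (an inference).
SUFFIXES = (".gz", ".regions", "_SIFTDB_stats.txt")

def missing_outputs(db_dir, chromosomes):
    """Return the expected per-chromosome file names that are absent from db_dir."""
    present = set(os.listdir(db_dir))
    return [c + s for c in chromosomes for s in SUFFIXES if c + s not in present]

# Demo on a temporary directory where MT is only partly built.
with tempfile.TemporaryDirectory() as d:
    for name in ("21.gz", "21.regions", "21_SIFTDB_stats.txt", "MT.gz"):
        open(os.path.join(d, name), "w").close()
    result = missing_outputs(d, ["21", "MT"])

print(result)  # ['MT.regions', 'MT_SIFTDB_stats.txt']
```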
Although sift4g has been compiled with gcc v11.2.0, I still have to load gcc v11.2.0 again whenever I open a new session; otherwise sift4g shows error messages. I started running it on my real dataset today. It takes some time, but it seems to be running fine so far.
I hope this will help.
Stella
Thank you, it worked fine after I changed the gcc version.
Best wishes!
Thanks @stella-huynh . Appreciate it!
I'm having this issue as well. I've updated my gcc (gcc (GCC) 10.3.0), reinstalled sift4g, and have made sure that I have loaded the updated gcc version before running, but it still does not create a results file.
Here is the command I'm running:
perl make-SIFT-db-all.pl -config test_files/homo_sapiens-test.txt
The homo_sapiens-test.txt file:
GENETIC_CODE_TABLE=1
GENETIC_CODE_TABLENAME=Standard
MITO_GENETIC_CODE_TABLE=2
MITO_GENETIC_CODE_TABLENAME=Vertebrate Mitochondrial
PARENT_DIR=./test_files/homo_sapiens_small
ORG=homo_sapiens
ORG_VERSION=GRCh38.83
DBSNP_VCF_FILE=Homo_sapiens.vcf.gz
#Running SIFT 4G
SIFT4G_PATH=/oak/stanford/groups/dpetrov/ksolari/SIFT/sift4g/bin/sift4g
PROTEIN_DB=/oak/stanford/groups/dpetrov/ksolari/SIFT/uniref90.fasta
# Sub-directories, don't need to change
GENE_DOWNLOAD_DEST=gene-annotation-src
CHR_DOWNLOAD_DEST=chr-src
LOGFILE=Log.txt
ZLOGFILE=Log2.txt
FASTA_DIR=fasta
SUBST_DIR=subst
ALIGN_DIR=SIFT_alignments
SIFT_SCORE_DIR=SIFT_predictions
SINGLE_REC_BY_CHR_DIR=singleRecords
SINGLE_REC_WITH_SIFTSCORE_DIR=singleRecords_with_scores
DBSNP_DIR=dbSNP
# Doesn't need to change
FASTA_LOG=fasta.log
INVALID_LOG=invalid.log
PEPTIDE_LOG=peptide.log
ENS_PATTERN=ENS
SINGLE_RECORD_PATTERN=:change:_aa1valid_dbsnp.singleRecord
The screen output:
converting gene format to use-able input
done converting gene format
making single records file
done making single records template
making noncoding records file
done making noncoding records
make the fasta sequences
done making the fasta sequences
start siftsharp, getting the alignments
/oak/stanford/groups/dpetrov/ksolari/SIFT/sift4g/bin/sift4g -d /oak/stanford/groups/dpetrov/ksolari/SIFT/uniref90.fasta -q ./test_files/homo_sapiens_small/all_prot.fasta --subst ./test_files/homo_sapiens_small/subst --out ./test_files/homo_sapiens_small/SIFT_predictions --sub-results
** Checking query data and substitutions files **
* processing queries: 100.00/100.00% *
** Searching database for candidate sequences **
Any suggestions that anyone can offer will be much appreciated! Thank you!!!
Katie
@ksolari It would be better if you opened up a new issue. After opening up a new issue, please list your directory contents and file sizes.
Thank you! Will do!
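A directory listing with file sizes, as requested above, can also be produced portably. This sketch walks a directory, prints every file with its size, and marks subdirectories as <dir> so that an empty SIFT_predictions folder is still visible; list_tree is a hypothetical helper, and the demo uses a temporary directory rather than real data.

```python
import os
import tempfile

def list_tree(root):
    """Return a sorted list of (relative_path, size) under root; size is None for directories."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            entries.append((rel + os.sep, None))  # trailing separator marks a directory
        for name in filenames:
            full = os.path.join(dirpath, name)
            entries.append((os.path.relpath(full, root), os.path.getsize(full)))
    return sorted(entries)  # paths are unique, so sizes are never compared

# Demo on a throwaway directory; point list_tree at your own PARENT_DIR instead.
with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "SIFT_predictions"))
    with open(os.path.join(d, "all_prot.fasta"), "w") as fh:
        fh.write(">q1\nMKV\n")
    listing = list_tree(d)

for path, size in listing:
    label = "<dir>" if size is None else str(size)
    print(f"{label:>12}  {path}")
```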
Before installing sift4g, run gcc -v to make sure gcc 10.3.0 is the active version, then run the command.
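Because the active compiler can change between sessions (Stella notes having to load gcc v11.2.0 again in every new session), it may help to check the version programmatically before building or running sift4g. A minimal sketch; the major-version cut-off of 10 is an assumption based only on the versions reported in this thread, and the helper names are hypothetical.

```python
import shutil
import subprocess

# The thread reports failures with gcc 4.8.5 and success with 10.3.0 / 11.2.0;
# using major version 10 as the cut-off is an assumption, not a documented minimum.
MIN_GCC_MAJOR = 10

def parse_version(text):
    """Turn '4.8.5' (or just '11') into a tuple of ints such as (4, 8, 5)."""
    return tuple(int(part) for part in text.strip().split(".") if part.isdigit())

def gcc_version():
    """Return the version of the gcc currently on PATH, or None if there is none."""
    if shutil.which("gcc") is None:
        return None
    # Newer GCC releases may print only the major version for -dumpversion.
    out = subprocess.run(["gcc", "-dumpversion"], capture_output=True, text=True)
    return parse_version(out.stdout) or None

v = gcc_version()
if v is None:
    print("gcc not found on PATH (is the newer gcc loaded in this session?)")
elif v[0] < MIN_GCC_MAJOR:
    print(f"gcc {v} is older than the versions that worked in this thread")
else:
    print(f"gcc {v} looks recent enough")
```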
This issue is closed. Thank you @abcdefghijklmn97 for your help.
@ksolari has opened a new separate thread. (It's very hard to keep track of issues when the same person is posting the same issue multiple times.)
Hi, when I tried to use the partial Homo sapiens example to build a database, I ran into a strange issue: the run completed, but I didn't get any results in the / folder.
The command line and the end-of-run output look like this:
(sift4g) [root@localhost SIFT4G_Create_Genomic_DB-master]# perl make-SIFT-db-all.pl -config test_files/homo_sapiens-test.txt
entered mkdir ./test_files/homo_sapiens_small/GRCh38.83
converting gene format to use-able input
done converting gene format
making single records file
done making single records template
making noncoding records file
done making noncoding records
make the fasta sequences
done making the fasta sequences
start siftsharp, getting the alignments
sift4g -d /nfs/LJH/zhushi/plants/nr.plant.fa -q ./test_files/homo_sapiens_small/all_prot.fasta --subst ./test_files/homo_sapiens_small/subst --out ./test_files/homo_sapiens_small/SIFT_predictions --sub-results
Checking query data and substitutions files
Searching database for candidate sequences
This is the result file structure:
In addition to the examples, I had the same problem running my own data.
This is the homo_sapiens-test.txt
GENETIC_CODE_TABLE=1
GENETIC_CODE_TABLENAME=Standard
MITO_GENETIC_CODE_TABLE=2
MITO_GENETIC_CODE_TABLENAME=Vertebrate Mitochondrial
PARENT_DIR=./test_files/homo_sapiens_small
ORG=homo_sapiens
ORG_VERSION=GRCh38.83
DBSNP_VCF_FILE=Homo_sapiens.vcf.gz
#Running SIFT 4G
SIFT4G_PATH=sift4g
PROTEIN_DB=/nfs/LJH/zhushi/plants/nr.plant.fa
# Sub-directories, don't need to change
GENE_DOWNLOAD_DEST=gene-annotation-src
CHR_DOWNLOAD_DEST=chr-src
LOGFILE=Log.txt
ZLOGFILE=Log2.txt
FASTA_DIR=fasta
SUBST_DIR=subst
ALIGN_DIR=SIFT_alignments
SIFT_SCORE_DIR=SIFT_predictions
SINGLE_REC_BY_CHR_DIR=singleRecords
SINGLE_REC_WITH_SIFTSCORE_DIR=singleRecords_with_scores
DBSNP_DIR=dbSNP
# Doesn't need to change
FASTA_LOG=fasta.log
INVALID_LOG=invalid.log
PEPTIDE_LOG=peptide.log
ENS_PATTERN=ENS
SINGLE_RECORD_PATTERN=:change:_aa1valid_dbsnp.singleRecord
This is the config file for my own data (chr1D.txt):
GENETIC_CODE_TABLE=1
GENETIC_CODE_TABLENAME=Standard
PARENT_DIR=/nfs/LJH/TEST/sift/JYM/Chr1D
ORG=JYM
ORG_VERSION=chr1D
#Running SIFT 4G
SIFT4G_PATH=sift4g
PROTEIN_DB=/mnt/SOFT/nr
# Sub-directories, don't need to change
GENE_DOWNLOAD_DEST=gene-annotation-src
CHR_DOWNLOAD_DEST=chr-src
LOGFILE=Log.txt
ZLOGFILE=Log2.txt
FASTA_DIR=fasta
SUBST_DIR=subst
ALIGN_DIR=SIFT_alignments
SIFT_SCORE_DIR=SIFT_predictions
SINGLE_REC_BY_CHR_DIR=singleRecords
SINGLE_REC_WITH_SIFTSCORE_DIR=singleRecords_with_scores
# Doesn't need to change
FASTA_LOG=fasta.log
INVALID_LOG=invalid.log
PEPTIDE_LOG=peptide.log
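In this config SIFT4G_PATH is a bare sift4g, so it relies on the binary being on PATH, and PROTEIN_DB points to /mnt/SOFT/nr, while the manual sift4g command earlier in this thread used /nfs/SOFT/nr; it may be worth confirming that both point where intended. A hedged sketch that checks both entries in a parsed config dict. It only confirms that the tool can be found and that the database file exists and is non-empty, not that it is a valid FASTA database; check_tool_and_db is a hypothetical helper.

```python
import os
import shutil

def check_tool_and_db(config):
    """Return a list of problems found with SIFT4G_PATH and PROTEIN_DB."""
    problems = []
    tool = config.get("SIFT4G_PATH", "")
    # A bare name such as 'sift4g' is searched on PATH; a full path is checked directly.
    if not tool or shutil.which(tool) is None:
        problems.append(f"SIFT4G_PATH not found or not executable: {tool!r}")
    db = config.get("PROTEIN_DB", "")
    if not os.path.isfile(db):
        problems.append(f"PROTEIN_DB is not a file: {db!r}")
    elif os.path.getsize(db) == 0:
        problems.append(f"PROTEIN_DB is empty: {db!r}")
    return problems

print(check_tool_and_db({"SIFT4G_PATH": "sift4g", "PROTEIN_DB": "/mnt/SOFT/nr"}))
```

An empty list means both checks passed on the machine where the script runs.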
I would greatly appreciate your assistance in resolving this issue. If possible, I would like to ask you to check where I might have made a mistake or why I am unable to obtain the result files. If you could provide some guidance or suggestions, I would be very grateful.
Thank you very much!
Best wishes!
Jinhua Long