pauline-ng / SIFT4G_Create_Genomic_DB

Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0

No error message but nothing in results file #87

Open ksolari opened 9 months ago

ksolari commented 9 months ago

I'm having this issue as well. I've updated my gcc (gcc (GCC) 10.3.0), reinstalled sift4g, and have made sure that I have loaded the updated gcc version before running, but it still does not create a results file.

Here is the command I'm running: perl make-SIFT-db-all.pl -config test_files/homo_sapiens-test.txt

The homo_sapiens-test.txt file:

GENETIC_CODE_TABLE=1
GENETIC_CODE_TABLENAME=Standard
MITO_GENETIC_CODE_TABLE=2
MITO_GENETIC_CODE_TABLENAME=Vertebrate Mitochondrial

PARENT_DIR=./test_files/homo_sapiens_small
ORG=homo_sapiens
ORG_VERSION=GRCh38.83
DBSNP_VCF_FILE=Homo_sapiens.vcf.gz

#Running SIFT 4G
SIFT4G_PATH=/oak/stanford/groups/dpetrov/ksolari/SIFT/sift4g/bin/sift4g
PROTEIN_DB=/oak/stanford/groups/dpetrov/ksolari/SIFT/uniref90.fasta

# Sub-directories, don't need to change
GENE_DOWNLOAD_DEST=gene-annotation-src
CHR_DOWNLOAD_DEST=chr-src
LOGFILE=Log.txt
ZLOGFILE=Log2.txt
FASTA_DIR=fasta
SUBST_DIR=subst
ALIGN_DIR=SIFT_alignments
SIFT_SCORE_DIR=SIFT_predictions
SINGLE_REC_BY_CHR_DIR=singleRecords
SINGLE_REC_WITH_SIFTSCORE_DIR=singleRecords_with_scores
DBSNP_DIR=dbSNP

# Doesn't need to change
FASTA_LOG=fasta.log
INVALID_LOG=invalid.log
PEPTIDE_LOG=peptide.log
ENS_PATTERN=ENS
SINGLE_RECORD_PATTERN=:change:_aa1valid_dbsnp.singleRecord

The screen output:

converting gene format to use-able input
done converting gene format
making single records file
done making single records template
making noncoding records file
done making noncoding records
make the fasta sequences
done making the fasta sequences
start siftsharp, getting the alignments
/oak/stanford/groups/dpetrov/ksolari/SIFT/sift4g/bin/sift4g -d /oak/stanford/groups/dpetrov/ksolari/SIFT/uniref90.fasta -q ./test_files/homo_sapiens_small/all_prot.fasta --subst ./test_files/homo_sapiens_small/subst --out ./test_files/homo_sapiens_small/SIFT_predictions --sub-results
** Checking query data and substitutions files **
* processing queries: 100.00/100.00% *

** Searching database for candidate sequences **

contents of homo_sapiens_small output directory:

-rw-rw----+ 1 ksolari oak_dpetrov 362181 Sep 19 20:19 all_prot.fasta
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 19 20:16 chr-src
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 14 12:05 dbSNP
drwxrws---+ 2 ksolari oak_dpetrov  69632 Sep 18 11:09 fasta
-rw-rw----+ 1 ksolari oak_dpetrov     73 Sep 19 20:19 fasta.log
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 gene-annotation-src
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 GRCh38.83
-rw-rw----+ 1 ksolari oak_dpetrov  62026 Sep 19 20:19 invalid.log
-rw-rw----+ 1 ksolari oak_dpetrov      0 Sep 19 20:16 Log2.txt
-rw-rw----+ 1 ksolari oak_dpetrov 362434 Sep 19 20:19 peptide.log
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 SIFT_alignments
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 SIFT_predictions
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 19 20:16 singleRecords
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 singleRecords_with_scores
drwxrws---+ 2 ksolari oak_dpetrov 102400 Sep 18 11:09 subst

folder sizes:

4.0K  ./SIFT_alignments
8.4G  ./singleRecords
4.0K  ./SIFT_predictions
4.0K  ./singleRecords_with_scores
46M   ./chr-src
4.0K  ./GRCh38.83
3.4M  ./fasta
25M   ./dbSNP
21M   ./subst
14M   ./gene-annotation-src
8.5G  .

Any suggestions that anyone can offer will be much appreciated! Thank you!!!

Katie
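
Since the screen output stops at the search step without any error, one way to narrow this down is to rerun the echoed sift4g command on its own and record its exit code and stderr. The following is only a sketch: the helper names are invented for illustration, and the flags are copied from the command printed in the screen output, not taken from sift4g's documentation.

```python
import subprocess
from pathlib import Path


def build_sift4g_command(sift4g_path, protein_db, parent_dir):
    """Rebuild the sift4g call that make-SIFT-db-all.pl echoed to the screen."""
    parent = Path(parent_dir)
    return [
        str(sift4g_path),
        "-d", str(protein_db),                      # protein database (PROTEIN_DB)
        "-q", str(parent / "all_prot.fasta"),       # query proteins
        "--subst", str(parent / "subst"),           # substitution files
        "--out", str(parent / "SIFT_predictions"),  # output directory
        "--sub-results",
    ]


def run_and_report(command):
    """Run a command and return (exit_code, stderr) so a silent failure shows up."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stderr
```

Calling `run_and_report(build_sift4g_command(SIFT4G_PATH, PROTEIN_DB, PARENT_DIR))` with the values from the config would show whether sift4g exits non-zero. On Linux, a negative exit code means the process was terminated by a signal, for example by a job scheduler or the out-of-memory killer.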

abcdefghijklmn97 commented 9 months ago

Before installing sift4g, run gcc -v to confirm that gcc is 10.3.0, then run the command.
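
For anyone who wants to script this check, here is a rough sketch that parses the version line printed by gcc -v. The function names are made up for illustration, and the 4.9 minimum is the figure quoted elsewhere in this thread for sift4g, so treat it as an assumption.

```python
import re


def parse_gcc_version(text):
    """Extract (major, minor, patch) from text such as 'gcc version 10.3.0 (GCC)'."""
    match = re.search(r"gcc version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    # A missing patch number is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(version, minimum=(4, 9)):
    """True if a parsed version is at least the assumed minimum."""
    return version is not None and version >= minimum
```

Note that gcc -v writes its version banner to stderr, so capture that stream if you feed the output to `parse_gcc_version`.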

pauline-ng commented 9 months ago

Is there anything in the SIFT_predictions or SIFT_alignments folder?
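
A quick way to answer this without listing each folder by hand is sketched below. It is only an illustration: the function name is made up, and the two default folder names are the ALIGN_DIR and SIFT_SCORE_DIR values from the config posted above.

```python
from pathlib import Path


def count_entries(parent_dir, subdirs=("SIFT_alignments", "SIFT_predictions")):
    """Return {subdir: number of entries}; None marks a directory that does not exist."""
    counts = {}
    for name in subdirs:
        folder = Path(parent_dir) / name
        if folder.is_dir():
            counts[name] = sum(1 for _ in folder.iterdir())
        else:
            counts[name] = None
    return counts
```

A result of 0 for both folders means that sift4g wrote no output at all, which points at the search step and not at the later database-building steps.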

ksolari commented 9 months ago

I've reinstalled sift4g after making sure that I had gcc 10.3.0 running and I'm still having the same issue. The SIFT_predictions and SIFT_alignments folders are both empty as well.

Thank you so much for your help troubleshooting this. I really appreciate it.

abcdefghijklmn97 commented 9 months ago

SIFT4G_Create_Genomic_DB needs Python3, and sift4g needs g++ (4.9+). My server runs CentOS. My whole resolution process was as follows:

yum install centos-release-scl
yum install devtoolset-11-gcc*
scl enable devtoolset-11 bash
source /opt/rh/devtoolset-11/enable
gcc -v
git clone --recursive https://github.com/rvaser/sift4g.git sift4g
cd sift4g/
make
gcc -v
git clone https://github.com/pauline-ng/SIFT4G_Create_Genomic_DB.git scripts_to_build_SIFT_db
gcc -v

Modify the paths in homo_sapiens-test.txt, then run:

perl make-SIFT-db-all.pl -config homo_sapiens-test.txt

pauline-ng commented 9 months ago

Please change your PARENT_DIR in the config file to a full path, not a relative path.
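
To catch relative paths before a long run, the config file can be scanned with something like the sketch below. It assumes the KEY=VALUE format with # comments shown in the config posted above. The function names are hypothetical, and checking SIFT4G_PATH and PROTEIN_DB in addition to PARENT_DIR is an extra precaution of this sketch, not something the scripts are known to require.

```python
import os

# Keys whose values are filesystem paths; only PARENT_DIR was named in this thread.
PATH_KEYS = ("PARENT_DIR", "SIFT4G_PATH", "PROTEIN_DB")


def parse_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def relative_path_keys(settings, keys=PATH_KEYS):
    """Return the path keys whose values are not absolute paths."""
    return [key for key in keys if key in settings and not os.path.isabs(settings[key])]
```

For the config posted at the top of this issue, `relative_path_keys` would flag PARENT_DIR, because ./test_files/homo_sapiens_small is relative to the directory the script is launched from.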

ksolari commented 9 months ago

I have changed PARENT_DIR to a full path: PARENT_DIR=/oak/stanford/groups/dpetrov/ksolari/SIFT/scripts_to_build_SIFT_db/test_files/homo_sapiens_small, and the only thing generated is all_prot.fasta:

-rw-rw----+ 1 ksolari oak_dpetrov 362181 Sep 22 15:29 all_prot.fasta
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 19 20:16 chr-src
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 14 12:05 dbSNP
drwxrws---+ 2 ksolari oak_dpetrov  69632 Sep 18 11:09 fasta
-rw-rw----+ 1 ksolari oak_dpetrov     73 Sep 19 20:19 fasta.log
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 gene-annotation-src
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 GRCh38.83
-rw-rw----+ 1 ksolari oak_dpetrov  62026 Sep 19 20:19 invalid.log
-rw-rw----+ 1 ksolari oak_dpetrov      0 Sep 19 20:16 Log2.txt
-rw-rw----+ 1 ksolari oak_dpetrov 362434 Sep 19 20:19 peptide.log
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 SIFT_alignments
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 SIFT_predictions
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 19 20:16 singleRecords
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 singleRecords_with_scores
drwxrws---+ 2 ksolari oak_dpetrov 102400 Sep 18 11:09 subst

ksolari commented 9 months ago

Update to my post above: on the previous run I got an error message: DBI.c: loadable library and perl binaries are mismatched (got first handshake key 0xde00080, needed 0xeb80080)
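
A mismatch like this can be checked up front by asking the perl on your PATH to load DBI before starting the pipeline. The sketch below is one way to do that. The helper names are made up for illustration, and the marker string is copied from the error message quoted in this comment.

```python
import subprocess

MISMATCH_MARKER = "loadable library and perl binaries are mismatched"


def try_load_dbi(perl="perl"):
    """Ask the given perl to load the DBI module; return (loaded_ok, stderr_text)."""
    try:
        result = subprocess.run(
            [perl, "-MDBI", "-e", "1"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return False, f"{perl}: not found on PATH"
    return result.returncode == 0, result.stderr


def is_binary_mismatch(stderr_text):
    """True if stderr shows the DBI build does not match the running perl binary."""
    return MISMATCH_MARKER in stderr_text
```

If `try_load_dbi()` fails and `is_binary_mismatch` is true for its stderr, the DBI module was built against a different perl than the one currently loaded, so switching perl versions (or reinstalling DBI for the current one) is the likely fix.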

So I loaded a different version of perl and ran it again and no longer got that error message. This time when I ran the test, more files were generated:

-rw-rw----+ 1 ksolari oak_dpetrov 362181 Sep 22 15:56 all_prot.fasta
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 22 15:53 chr-src
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 14 12:05 dbSNP
drwxrws---+ 2 ksolari oak_dpetrov  69632 Sep 18 11:09 fasta
-rw-rw----+ 1 ksolari oak_dpetrov     73 Sep 22 15:56 fasta.log
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 gene-annotation-src
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 GRCh38.83
-rw-rw----+ 1 ksolari oak_dpetrov  62026 Sep 22 15:56 invalid.log
-rw-rw----+ 1 ksolari oak_dpetrov      0 Sep 22 15:53 Log2.txt
-rw-rw----+ 1 ksolari oak_dpetrov 362434 Sep 22 15:56 peptide.log
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 SIFT_alignments
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 SIFT_predictions
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 22 15:53 singleRecords
drwxrws---+ 2 ksolari oak_dpetrov   4096 Sep 18 10:16 singleRecords_with_scores
drwxrws---+ 2 ksolari oak_dpetrov 102400 Sep 18 11:09 subst

pauline-ng commented 9 months ago

Are there any files in SIFT_predictions?

ksolari commented 9 months ago

I just double checked - no, there are no files in that folder.

pauline-ng commented 9 months ago

I just released a Dockerfile to help users with installation problems. Can you try that?

ksolari commented 9 months ago

I'm running this on the university server, which does not support Docker. It does support Singularity, and it looks like there should be a way to install it through Singularity, but I'm having a hard time. If you have any insights into this, any tips you can share will be much appreciated. I'm really grateful for your continued support.