pauline-ng / SIFT4G_Create_Genomic_DB

Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0

Problem creating genomic database for new organism_1 #45

Closed AlisaGU closed 2 years ago

AlisaGU commented 3 years ago

Hi, Pauline

"In the gtf file, make sure the 9th column (attribute column) says gene_biotype "protein_coding;" for rows which are labelled as exon, CDS, stop_codon, and start_codon." For each protein, you'll need to annotate exon, CDS, stop_codon, and start_codon.

I have modified my GTF file to satisfy these criteria, but the errors still occur.
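The requirement quoted above can be checked mechanically before running the pipeline. The following is a minimal sketch, not part of SIFT4G_Create_Genomic_DB: the function name `check_gtf_lines` is hypothetical, and it assumes the standard tab-separated GTF layout with attributes written as `key "value";`. It counts the exon, CDS, start_codon, and stop_codon rows and reports how many of them lack `gene_biotype "protein_coding"` in the 9th column.

```python
import re
from collections import Counter

# Feature types that the quoted README text says must carry the biotype.
REQUIRED_FEATURES = {"exon", "CDS", "start_codon", "stop_codon"}

# Assumption: attributes use the usual GTF form  key "value";
BIOTYPE_RE = re.compile(r'gene_biotype\s+"([^"]*)"')


def check_gtf_lines(lines):
    """Return (seen, problems) counters keyed by feature type.

    seen[feature]     = number of rows with that feature type
    problems[feature] = rows of that type without gene_biotype "protein_coding"
    problems["malformed"] = rows with fewer than 9 tab-separated columns
    """
    seen, problems = Counter(), Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            problems["malformed"] += 1
            continue
        feature, attrs = cols[2], cols[8]
        if feature not in REQUIRED_FEATURES:
            continue
        seen[feature] += 1
        match = BIOTYPE_RE.search(attrs)
        if match is None or match.group(1) != "protein_coding":
            problems[feature] += 1
    return seen, problems


# Example usage (the path is a placeholder for your own file):
# with open("ps128_format.gtf") as handle:
#     seen, problems = check_gtf_lines(handle)
# print("rows per feature:", dict(seen))
# print("rows failing the check:", dict(problems))
```

A feature type that never appears in `seen`, or a non-zero count in `problems`, would suggest that the GTF still does not meet the quoted criteria. A clean result does not rule out other causes.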

converting gene format to use-able input
done converting gene format
making single records file
done making single records template
making noncoding records file
done making noncoding records
make the fasta sequences
done making the fasta sequences
start siftsharp, getting the alignments
cat: /picb/evolgen/users/gushanshan/projects/probiotics/multiple_whole_genome_alignment/pairwise_combined/formalExper/homologousGroup/PS128_B21_BLS41_comparation/pairwise_multiz/mutationEffectPrediction/formal/sift4g_ps128_database/fasta/*.fasta: No such file or directory
/picb/evolgen/users/gushanshan/software/sift/sift4/bin/sift4g -d /picb/evolgen/users/gushanshan/database/uniref90/uniref90 -q /picb/evolgen/users/gushanshan/projects/probiotics/multiple_whole_genome_alignment/pairwise_combined/formalExper/homologousGroup/PS128_B21_BLS41_comparation/pairwise_multiz/mutationEffectPrediction/formal/sift4g_ps128_database/all_prot.fasta --subst /picb/evolgen/users/gushanshan/projects/probiotics/multiple_whole_genome_alignment/pairwise_combined/formalExper/homologousGroup/PS128_B21_BLS41_comparation/pairwise_multiz/mutationEffectPrediction/formal/sift4g_ps128_database/subst --out /picb/evolgen/users/gushanshan/projects/probiotics/multiple_whole_genome_alignment/pairwise_combined/formalExper/homologousGroup/PS128_B21_BLS41_comparation/pairwise_multiz/mutationEffectPrediction/formal/sift4g_ps128_database/SIFT_predictions --sub-results 
** Checking query data and substitutions files **

** EXITING! No valid queries to process. **

By the way, the human test files ran successfully for me. Here is the directory structure of my database folder:

-bash-4.2$ ls -lhtR
.:
total 39M
drwxr-xr-x 2 gushanshan evolgen  143 Apr 14 21:41 gene-annotation-src
-rw-r--r-- 1 gushanshan evolgen   68 Apr 14 21:26 fasta.log
-rw-r--r-- 1 gushanshan evolgen  544 Apr 14 21:26 invalid.log
-rw-r--r-- 1 gushanshan evolgen    0 Apr 14 21:26 peptide.log
drwxr-xr-x 2 gushanshan evolgen  101 Apr 14 21:26 chr-src
-rw-r--r-- 1 gushanshan evolgen    0 Apr 14 21:26 Log2.txt
drwxr-xr-x 2 gushanshan evolgen  389 Apr 14 21:25 singleRecords
-rw-r--r-- 1 gushanshan evolgen 1.3K Apr 14 14:34 sift4g_ps128_database_configure.txt
-rw-r--r-- 1 gushanshan evolgen    0 Apr 14 14:07 all_prot.fasta
drwxr-xr-x 2 gushanshan evolgen 3.2K Apr 14 14:07 subst
drwxr-xr-x 2 gushanshan evolgen    0 Apr 14 14:04 SIFT_predictions
drwxr-xr-x 2 gushanshan evolgen    0 Apr 14 14:04 singleRecords_with_scores
drwxr-xr-x 2 gushanshan evolgen    0 Apr 14 14:04 SIFT_alignments
drwxr-xr-x 2 gushanshan evolgen    0 Apr 14 14:04 fasta
drwxr-xr-x 2 gushanshan evolgen    0 Apr 14 14:04 LP_PS128
-rw-r--r-- 1 gushanshan evolgen  24M Apr 13 21:44 uniprot_sprot_species
drwxr-xr-x 2 gushanshan evolgen    0 Apr 13 21:28 dbSNP
-rw-r----- 1 gushanshan evolgen 3.5M Apr 13 21:26 ps128.gtf

./gene-annotation-src:
total 6.9M
-rw-r--r-- 1 gushanshan evolgen 229K Apr 14 21:25 noncoding.txt
-rw-r--r-- 1 gushanshan evolgen  145 Apr 14 21:25 protein_coding_genes.txt
-rw-r--r-- 1 gushanshan evolgen 4.0M Apr 14 21:18 ps128_format.gtf

./chr-src:
total 4.9M
-rw-r--r-- 1 gushanshan evolgen 1.0K Apr 14 21:26 directory.index.pag
-rw-r--r-- 1 gushanshan evolgen    0 Apr 14 21:26 directory.index.dir
-rw-r----- 1 gushanshan evolgen 3.3M Apr 13 21:25 ps128.fna

./singleRecords:
total 1.8G
-rw-r--r-- 1 gushanshan evolgen 1.3G Apr 14 21:26 NZ_LBHS01000002.1.singleRecords_noncoding
-rw-r--r-- 1 gushanshan evolgen 5.7M Apr 14 21:25 NZ_LBHS01000001.1.singleRecords_noncoding
-rw-r--r-- 1 gushanshan evolgen    0 Apr 14 21:25 NZ_LBHS01000002.1.singleRecords
-rw-r--r-- 1 gushanshan evolgen  16K Apr 14 16:36 NZ_LBHS01000001.1.singleRecords_proteins.fa
-rw-r--r-- 1 gushanshan evolgen  15K Apr 14 16:36 NZ_LBHS01000001.1.invalidRecords
-rw-r--r-- 1 gushanshan evolgen 4.8K Apr 14 16:36 NZ_LBHS01000002.1.singleRecords_proteins.fa
-rw-r--r-- 1 gushanshan evolgen 4.6K Apr 14 16:36 NZ_LBHS01000002.1.invalidRecords

./subst:
total 220K
-rw-r--r-- 1 gushanshan evolgen 0 Apr 14 16:36 WH27_RS00095.subst
... a lot of .subst files
-rw-r--r-- 1 gushanshan evolgen 0 Apr 14 16:36 WH27_RS03910.subst

./SIFT_predictions:
total 0

./singleRecords_with_scores:
total 0

./SIFT_alignments:
total 0

./fasta:
total 0

./LP_PS128:
total 0

./dbSNP:
total 0
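
In the listing above, `all_prot.fasta` is 0 bytes, the `fasta` directory is empty, and the `.subst` files are 0 bytes, which is consistent with the "No valid queries to process" message. As a rough aid for spotting this kind of state, here is a minimal sketch that lists zero-byte files and empty directories under a database folder. The function name `find_empty_outputs` is hypothetical and not part of the SIFT4G tooling, and the sketch only reports empty outputs; it does not identify why they are empty.

```python
from pathlib import Path


def find_empty_outputs(root):
    """Return (empty_files, empty_dirs) as sorted lists of paths relative to root."""
    root = Path(root)
    empty_files, empty_dirs = [], []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.is_file() and path.stat().st_size == 0:
            empty_files.append(rel)   # e.g. a 0-byte FASTA or .subst file
        elif path.is_dir() and not any(path.iterdir()):
            empty_dirs.append(rel)    # e.g. a directory with no entries
    return empty_files, empty_dirs


# Example usage (the path is a placeholder for your own database folder):
# files, dirs = find_empty_outputs("sift4g_ps128_database")
# print("zero-byte files:", files)
# print("empty directories:", dirs)
```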

The genome, the GTF file used to create the database, and some logs are included in the supplementary files.

Could you give me some tips to avoid this situation?

Thanks,

Shanshan

AlisaGU commented 3 years ago

It seems the supplementary file can't be uploaded because of my network speed.

If possible, could I have your email address so I can email the file to you?

AlisaGU commented 3 years ago

supp.zip

Woo-hoo, it's done.

yinhongwei4079 commented 3 years ago

@AlisaGU have you solved it? I have the same problem when I run the test for human: no error, no warning, and the database isn't created.

pauline-ng commented 3 years ago

If your GTF files come from a standard data source, I can look into it. (But we typically don't support customized genomes unless it's a collaboration or paid consulting.)

AlisaGU commented 3 years ago

> @AlisaGU have you solved it? I have the same problem when I run the test for human: no error, no warning, and the database isn't created.

No, I switched to other software, such as PROVEAN and SnpEff.