XIAO2Mark commented 2 years ago

Hi,

I have tried many times to use SIFT4G to build my own SIFT database, but without success. Can you help me check it? The details are below:

Searching database for candidate sequences
processing database part 1 (size ~0.25 GB): 100.00/100.00%
* Aligning queries with candidate sequences
processing database part 1 (size ~1.00 GB): 100.00/100.00%
* Selecting alignments with median threshold: 2.75
processing queries: 100.00/100.00%
* Generating SIFT predictions with sequence identity: 100.00%
processing queries: 100.00/100.00%
* done
getting all the scores
populating databases
cat: /home/scripts_to_build_SIFT_db/test_files/singleRecords/Chr1.singleRecords: No such file or directory
can't open /home/scripts_to_build_SIFT_db/test_files/singleRecords/Chr1.singleRecords at map-scores-back-to-records.pl line 122.
Unable to read from /home/scripts_to_build_SIFT_db/test_files/singleRecords_with_scores/Chr1_scores.Srecords
cat: /home/scripts_to_build_SIFT_db/test_files/singleRecords/Chr1.singleRecords_noncoding.with_dbSNPid: No such file or directory
Traceback (most recent call last):
  File "make_regions_file.py", line 68, in <module>
    get_regions (chrom_file, out_file)
  File "make_regions_file.py", line 31, in get_regions
    pos = get_pos (first_line)
  File "make_regions_file.py", line 8, in get_pos
    return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
rm: cannot remove '/home/scripts_to_build_SIFT_db/test_files/singleRecords_with_scores/Chr1_scores.Srecords': No such file or directory

many thx.
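A note on the Python traceback above: it looks like a downstream symptom of the missing singleRecords file rather than a separate bug. If the file handed to make_regions_file.py is empty, its first line is an empty string, and converting that to an integer fails with exactly this message. A minimal sketch of the failure, assuming tab-separated fields (the real delimiter used by the script is not shown in this thread):

```python
def get_pos(line):
    # Assumption: fields are tab-separated; the thread does not show
    # how make_regions_file.py actually splits the line.
    fields = line.split("\t")
    return int(fields[0])


# A well-formed record parses fine (hypothetical record content).
print(get_pos("12345\tA\tT"))

# An empty first line, as you would get from an empty input file,
# reproduces the error seen in the traceback.
try:
    get_pos("")
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: ''
```

So the thing to fix is whatever prevented the singleRecords files from being created, not the Python script itself.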

pauline-ng commented 2 years ago

Are you able to make a database from the test file?

If yes, then can you paste the following below:

- a list of all the directories and files generated in /home/scripts_to_build_SIFT_db/test_files/
- your config file

XIAO2Mark commented 2 years ago

Hi Ng,

Thank you so much!

No, I cannot build the database even when I use the example file. The details are below:

done making the fasta sequences
start siftsharp, getting the alignments
cat: './test_files/homo_sapiens_small/fasta/*.fasta': No such file or directory
/bigdrive/sift4g/bin/sift4g -d /bigdrive/SIFT_databases/uniprot_sprot.fasta -q ./test_files/homo_sapiens_small/all_prot.fasta --subst ./test_files/homo_sapiens_small/subst --out ./test_files/homo_sapiens_small/SIFT_predictions --sub-results

Additionally, the PROTEIN_DB is the file that I downloaded from the UniProt database (wget https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref100/uniref100.fasta.gz). The relevant part of my config file is:

Running SIFT 4G

SIFT4G_PATH=/home/bin/sift4g
PROTEIN_DB=/SIFT_databases/uniprot_sprot.fasta

Could you please help me check it again? Many thanks.

pauline-ng commented 2 years ago

In the config file, please change to full paths. SIFT does not work with relative paths.
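For anyone hitting the same problem, a quick way to spot relative paths before running the pipeline is to scan the config for them. The following is only a rough sketch, not part of SIFT4G: it assumes the config consists of KEY=VALUE lines with `#` comments, it treats any value containing `/` or starting with `.` as a path (a heuristic), and the `PARENT_DIR` key in the example is used purely for illustration.

```python
import posixpath


def find_relative_paths(config_text):
    """Return (key, value) pairs whose value looks like a path but is not absolute."""
    problems = []
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Heuristic (an assumption): a value with "/" or a leading "." is a path.
        looks_like_path = "/" in value or value.startswith(".")
        if looks_like_path and not posixpath.isabs(value):
            problems.append((key, value))
    return problems


# Example config text; PARENT_DIR is a key name assumed for illustration.
example = """\
SIFT4G_PATH=/home/bin/sift4g
PROTEIN_DB=/SIFT_databases/uniprot_sprot.fasta
PARENT_DIR=./test_files/homo_sapiens_small
"""

for key, value in find_relative_paths(example):
    print(f"{key} is relative: {value}")
```

Any key it reports should be rewritten as a full path starting with `/`.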

XIAO2Mark commented 2 years ago

Thanks. I followed your suggestion, but it still does not work.

done making the fasta sequences
start siftsharp, getting the alignments
cat: '/home/SIFT/scripts_to_build_SIFT_db/ET/fasta/*.fasta': No such file or directory
Checking query data and substitutions files
EXITING! No valid queries to process.

pauline-ng commented 2 years ago

Is this the output from the test file? Can you list the files that were generated in the test file directory, along with their sizes?

Also, please resend your config file.

XIAO2Mark commented 2 years ago

Yes, the details are below:

homo_sapiens_small/* -shc
0     homo_sapiens_small/all_prot.fasta
46M   homo_sapiens_small/chr-src
25M   homo_sapiens_small/dbSNP
4.0K  homo_sapiens_small/fasta
0     homo_sapiens_small/fasta.log
13M   homo_sapiens_small/gene-annotation-src
4.0K  homo_sapiens_small/GRCh38.83
0     homo_sapiens_small/invalid.log
0     homo_sapiens_small/Log2.txt
0     homo_sapiens_small/peptide.log
4.0K  homo_sapiens_small/SIFT_alignments
4.0K  homo_sapiens_small/SIFT_predictions
4.0K  homo_sapiens_small/singleRecords
4.0K  homo_sapiens_small/singleRecords_with_scores
4.0K  homo_sapiens_small/subst
83M   total

Here is the config file: homo_sapiens-test.txt. Thanks.

pauline-ng commented 2 years ago

Can you list all files and their sizes in each directory?

XIAO2Mark commented 2 years ago

├── [ 839] arabidopsis_config.txt
├── [1.3K] candidatus_carsonella_ruddii_pv_config.txt
├── [4.0K] homo_sapiens_small
│   ├── [   0] all_prot.fasta
│   ├── [4.0K] chr-src
│   │   ├── [ 12K] directory.index
│   │   ├── [ 45M] Homo_sapiens.GRCh38.dna.chromosome.21.fa
│   │   └── [ 17K] Homo_sapiens.GRCh38.dna.chromosome.MT.fa
│   ├── [4.0K] dbSNP
│   │   └── [ 24M] Homo_sapiens_trimmed.vcf.gz
│   ├── [4.0K] fasta
│   ├── [   0] fasta.log
│   ├── [4.0K] gene-annotation-src
│   │   ├── [514K] Homo_sapiens.GRCh38.83_trimmed.gtf.gz
│   │   └── [ 12M] Homo_sapiens.GRCh38.pep.all.fa.gz
│   ├── [4.0K] GRCh38.83
│   ├── [   0] invalid.log
│   ├── [   0] Log2.txt
│   ├── [   0] peptide.log
│   ├── [4.0K] SIFT_alignments
│   ├── [4.0K] SIFT_predictions
│   ├── [4.0K] singleRecords
│   ├── [4.0K] singleRecords_with_scores
│   └── [4.0K] subst
├── [ 883] homo_sapiens-test.txt
└── [1.3K] saccharomyces_cerevisiae-template.txt

pauline-ng commented 2 years ago

That's weird, it's not even going through the first step. Can you paste everything that shows up on the terminal when you run the command?

pauline-ng commented 2 years ago

And I assume you have perl and python installed?

XIAO2Mark commented 2 years ago

Yes, I have Perl and Python installed. I copied the genome sequence (genome.fa) to the location the script was looking for ('/home/SIFT/scripts_to_build_SIFT_db/ET/fasta/*.fasta'), and now it is working. I am waiting for the final results.

pauline-ng commented 2 years ago

Great!

XIAO2Mark commented 2 years ago

Many thanks, Ng. It is still running now. May I ask whether the files in the 'homo_sapiens_small/fasta/' folder should be protein sequences or genome sequences? Also, the file all_prot.fasta contains the protein sequences, right?

many thanks.

pauline-ng commented 2 years ago

chr-src/ should contain the DNA sequence of the genome.
fasta/ should contain protein sequences. These are generated from chr-src and gene-annotation-src.
all_prot.fasta should be all of the protein sequences in the genome. It comes from combining all of the protein sequences in fasta/.
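The layout described above can be sanity-checked before a long run. The sketch below is not part of the SIFT4G scripts. It assumes only the three names mentioned in this thread (chr-src/, fasta/, all_prot.fasta) under a parent directory, and the function name `check_layout` is made up for illustration.

```python
import tempfile
from pathlib import Path


def check_layout(parent_dir):
    """Return a list of human-readable problems found under parent_dir."""
    parent = Path(parent_dir)
    problems = []

    # chr-src/ should hold the genome DNA sequence.
    chr_src = parent / "chr-src"
    if not chr_src.is_dir() or not any(chr_src.iterdir()):
        problems.append("chr-src/ is missing or empty (genome DNA expected)")

    # fasta/ should hold the generated protein sequences.
    fasta_dir = parent / "fasta"
    fasta_files = sorted(fasta_dir.glob("*.fasta")) if fasta_dir.is_dir() else []
    if not fasta_files:
        problems.append("fasta/ has no *.fasta files (protein sequences expected)")

    # all_prot.fasta is the combination of everything in fasta/.
    all_prot = parent / "all_prot.fasta"
    if not all_prot.is_file() or all_prot.stat().st_size == 0:
        problems.append("all_prot.fasta is missing or empty")

    return problems


# Demo on a throwaway directory that mimics the listing earlier in this thread:
# chr-src/ is populated, but fasta/ is empty and all_prot.fasta has size 0.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "chr-src").mkdir()
    (root / "chr-src" / "example.fa").write_text(">chr\nACGT\n")  # hypothetical file
    (root / "fasta").mkdir()
    (root / "all_prot.fasta").touch()
    for problem in check_layout(root):
        print(problem)
```

An empty fasta/ together with a zero-byte all_prot.fasta, as in the listing above, suggests the first step (generating protein sequences from chr-src and gene-annotation-src) did not produce anything.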

pauline-ng / SIFT4G_Create_Genomic_DB

Can not build my own SIFT database #69
