XIAO2Mark closed this issue 2 years ago.
Are you able to make a database from the test files?
If yes, can you paste the output below?
Hi Ng,
Thank you so much!
No, I cannot make the database when I use the example files. Details below:
done making the fasta sequences
start siftsharp, getting the alignments
cat: './test_files/homo_sapiens_small/fasta/*.fasta': No such file or directory
/bigdrive/sift4g/bin/sift4g -d /bigdrive/SIFT_databases/uniprot_sprot.fasta -q ./test_files/homo_sapiens_small/all_prot.fasta --subst ./test_files/homo_sapiens_small/subst --out ./test_files/homo_sapiens_small/SIFT_predictions --sub-results
Additionally, the PROTEIN_DB is the file that I downloaded from the UniProt database (wget https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref100/uniref100.fasta.gz):
SIFT4G_PATH=/home/bin/sift4g
PROTEIN_DB=/SIFT_databases/uniprot_sprot.fasta
Could you please help me check it again? Many thanks.
In the config file, please change all paths to full (absolute) paths. SIFT does not work with relative paths.
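A quick way to spot leftover relative paths before rerunning is to scan the config for values that look like paths but are not absolute. This is a minimal sketch, assuming the config uses KEY=VALUE lines as in the excerpt above; the function name find_relative_paths and the PARENT_DIR example entry are illustrative, and treating any value that contains "/" as a path is a heuristic, not a rule from SIFT4G.

```python
import os


def find_relative_paths(config_text):
    """Return (key, value) pairs whose value looks like a path but is not absolute.

    Assumes KEY=VALUE lines. Treating any value that contains "/" as a path
    is a heuristic for this sketch, not a SIFT4G rule.
    """
    relative = []
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comment lines (assumed to start with '#'), and non KEY=VALUE lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if "/" in value and not os.path.isabs(value):
            relative.append((key, value))
    return relative


if __name__ == "__main__":
    # PARENT_DIR here is an illustrative entry, not copied from the reporter's config.
    example = """
SIFT4G_PATH=/home/bin/sift4g
PROTEIN_DB=/SIFT_databases/uniprot_sprot.fasta
PARENT_DIR=./test_files/homo_sapiens_small
"""
    for key, value in find_relative_paths(example):
        print(f"{key} is relative: {value}")
```

Run on the example text, this prints only the PARENT_DIR line, since the other two values already start with "/".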
Thanks. I followed your suggestions, but it still does not work:
done making the fasta sequences
start siftsharp, getting the alignments
cat: '/home/SIFT/scripts_to_build_SIFT_db/ET/fasta/*.fasta': No such file or directory
Checking query data and substitutions files
EXITING! No valid queries to process.
Is this the output from the test files? Can you list the files that were generated in the test directory and their sizes?
Also, please resend your config file.
Yes, details below:
homo_sapiens_small/* -shc
0       homo_sapiens_small/all_prot.fasta
46M     homo_sapiens_small/chr-src
25M     homo_sapiens_small/dbSNP
4.0K    homo_sapiens_small/fasta
0       homo_sapiens_small/fasta.log
13M     homo_sapiens_small/gene-annotation-src
4.0K    homo_sapiens_small/GRCh38.83
0       homo_sapiens_small/invalid.log
0       homo_sapiens_small/Log2.txt
0       homo_sapiens_small/peptide.log
4.0K    homo_sapiens_small/SIFT_alignments
4.0K    homo_sapiens_small/SIFT_predictions
4.0K    homo_sapiens_small/singleRecords
4.0K    homo_sapiens_small/singleRecords_with_scores
4.0K    homo_sapiens_small/subst
83M     total
Here is the config file: homo_sapiens-test.txt. Thanks.
Can you list all files and their sizes in each directory?
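One way to produce such a listing from Python, including empty directories, is sketched below. It is a stand-in for a command such as tree or du, not part of the SIFT4G scripts, and list_sizes is a hypothetical helper name.

```python
import os
import sys


def list_sizes(root):
    """Yield (relative_path, size_in_bytes) for each file under root.

    Empty directories are yielded with a size of None so they stand out.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable, sorted order
        rel_dir = os.path.relpath(dirpath, root)
        if not dirnames and not filenames:
            # e.g. a fasta/ folder that nothing was ever written to
            yield rel_dir + os.sep, None
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            yield os.path.relpath(path, root), os.path.getsize(path)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for rel_path, size in list_sizes(root):
        label = "(empty dir)" if size is None else str(size)
        flag = "  <-- empty file" if size == 0 else ""
        print(f"{label:>12}  {rel_path}{flag}")
```

Pointed at the test directory, zero-byte files such as all_prot.fasta and empty folders such as fasta/ are marked explicitly.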
├── [ 839]  arabidopsis_config.txt
├── [1.3K]  candidatus_carsonella_ruddii_pv_config.txt
├── [4.0K]  homo_sapiens_small
│   ├── [   0]  all_prot.fasta
│   ├── [4.0K]  chr-src
│   │   ├── [ 12K]  directory.index
│   │   ├── [ 45M]  Homo_sapiens.GRCh38.dna.chromosome.21.fa
│   │   └── [ 17K]  Homo_sapiens.GRCh38.dna.chromosome.MT.fa
│   ├── [4.0K]  dbSNP
│   │   └── [ 24M]  Homo_sapiens_trimmed.vcf.gz
│   ├── [4.0K]  fasta
│   ├── [   0]  fasta.log
│   ├── [4.0K]  gene-annotation-src
│   │   ├── [514K]  Homo_sapiens.GRCh38.83_trimmed.gtf.gz
│   │   └── [ 12M]  Homo_sapiens.GRCh38.pep.all.fa.gz
│   ├── [4.0K]  GRCh38.83
│   ├── [   0]  invalid.log
│   ├── [   0]  Log2.txt
│   ├── [   0]  peptide.log
│   ├── [4.0K]  SIFT_alignments
│   ├── [4.0K]  SIFT_predictions
│   ├── [4.0K]  singleRecords
│   ├── [4.0K]  singleRecords_with_scores
│   └── [4.0K]  subst
├── [ 883]  homo_sapiens-test.txt
└── [1.3K]  saccharomyces_cerevisiae-template.txt
That's weird; it's not even getting through the first step. Can you paste everything that shows up in the terminal when you run the command?
And I assume you have Perl and Python installed?
Yes, I have Perl and Python installed. I copied the genome sequence (genome.fa) into the fasta directory (/home/SIFT/scripts_to_build_SIFT_db/ET/fasta/*.fasta), and now it's working. I am now waiting for the final results.
Great!
Many thanks, Ng. It is still running. May I ask whether the files in the 'homo_sapiens_small/fasta/' folder are protein sequences or genome sequences? Also, the file all_prot.fasta contains the protein sequences, right?
Many thanks.
chr-src/ should contain the DNA sequence of the genome.
fasta/ should contain protein sequences. These are generated from chr-src and gene-annotation-src.
all_prot.fasta should be all of the protein sequences in the genome. It comes from combining all of the protein sequences in fasta/.
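The last step (building all_prot.fasta from fasta/) can be pictured as a concatenation of every *.fasta file in fasta/. The logs earlier in this thread suggest the scripts do this with cat and a glob, which fails with "No such file or directory" when fasta/ is empty. The sketch below illustrates that step under that assumption; it is not the pipeline's actual code, combine_protein_fastas is a hypothetical name, and it raises an error up front when the glob matches nothing.

```python
import glob
import os


def combine_protein_fastas(fasta_dir, out_path):
    """Concatenate every *.fasta file in fasta_dir into out_path.

    Returns the number of FASTA records written. Raises FileNotFoundError when
    the glob matches nothing, the situation behind
    "cat: '.../fasta/*.fasta': No such file or directory".
    """
    inputs = sorted(glob.glob(os.path.join(fasta_dir, "*.fasta")))
    if not inputs:
        raise FileNotFoundError(f"no *.fasta files in {fasta_dir}")
    n_records = 0
    with open(out_path, "w") as out:
        for path in inputs:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1  # each header line starts a new protein record
                    # Ensure a trailing newline so files do not run together.
                    out.write(line if line.endswith("\n") else line + "\n")
    return n_records
```

For the test layout, the inputs would be the full path to homo_sapiens_small/fasta and the full path to homo_sapiens_small/all_prot.fasta; an empty fasta/ then fails immediately instead of producing a 0-byte all_prot.fasta.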
Hi,
I have tried many times to use SIFT4G to build my own SIFT database, but it is still unsuccessful. Can you help me check it? Details below:
Searching database for candidate sequences
processing database part 1 (size ~0.25 GB): 100.00/100.00% *
Aligning queries with candidate sequences
processing database part 1 (size ~1.00 GB): 100.00/100.00% *
Selecting alignments with median threshold: 2.75
processing queries: 100.00/100.00% *
Generating SIFT predictions with sequence identity: 100.00%
processing queries: 100.00/100.00% *
done getting all the scores
populating databases
cat: /home/scripts_to_build_SIFT_db/test_files/singleRecords/Chr1.singleRecords: No such file or directory
can't open /home/scripts_to_build_SIFT_db/test_files/singleRecords/Chr1.singleRecords at map-scores-back-to-records.pl line 122.
Unable to read from /home/scripts_to_build_SIFT_db/test_files/singleRecords_with_scores/Chr1_scores.Srecords
cat: /home/scripts_to_build_SIFT_db/test_files/singleRecords/Chr1.singleRecords_noncoding.with_dbSNPid: No such file or directory
Traceback (most recent call last):
  File "make_regions_file.py", line 68, in <module>
    get_regions (chrom_file, out_file)
  File "make_regions_file.py", line 31, in get_regions
    pos = get_pos (first_line)
  File "make_regions_file.py", line 8, in get_pos
    return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
rm: cannot remove '/home/scripts_to_build_SIFT_db/test_files/singleRecords_with_scores/Chr1_scores.Srecords': No such file or directory
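The Python traceback at the end looks like a downstream symptom rather than the root cause: int('') fails with exactly this message, which is what would happen if get_pos received an empty line, for example because Chr1.singleRecords was never created. Only the line int(fields[0]) is visible in the traceback, so the sketch below is a hypothetical defensive variant (get_pos_checked, assuming tab-separated fields), not the real make_regions_file.py code.

```python
def get_pos_checked(line, path="<unknown>"):
    """Return the integer in the first tab-separated field of line.

    Hypothetical variant of get_pos: the tab separator is an assumption, and
    the real function is only known from the traceback (int(fields[0])).
    """
    fields = line.rstrip("\n").split("\t")
    if not fields[0].strip():
        # An empty first field usually means the input file was missing or empty.
        raise ValueError(
            f"{path}: first field is empty; the input file is probably missing "
            "or empty (check the earlier 'No such file or directory' errors)"
        )
    return int(fields[0])


if __name__ == "__main__":
    # Reproduces the original error: "".split("\t") gives [''], and int('') fails.
    try:
        int("".split("\t")[0])
    except ValueError as err:
        print(err)  # invalid literal for int() with base 10: ''
    print(get_pos_checked("14523\tA\tG\n"))  # 14523
```

The point of the check is to report which file was empty instead of surfacing a bare int() error several steps after the real failure.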
Many thanks.