Closed frankiefattorini closed 2 years ago
Hi,
dbSNP VCF file is optional. Weird-- so your database has been built.
When you ls test_files/output/dicty_2.7
, what do you see? Do you see the regions file?
Also, I'm not familiar with Dictyosteliad discoideum. In your VCF file, the chromosome is DDB0232430 ? Or under the #CHROM column, does it say 3 or chr3?
Best, Pauline
Hi,
with ls test_files/output/dicty_2.7
I get CHECK_GENES.LOG DDB0232430.gz dictyostelium_discoideum_config.txt
. The chromosome is DDB0232430 yes, not 3 or chr3.
Very odd. Can you list the directories and their contents? Specifically, I'm interested in
Also, when please confirm zcat test_files/output/dicty_2.7/DDB0232430.gz
shows predictions (it's not an empty file)
In the folder test_files/output/dicty_2.7
, there should also be a DDB0232430_SIFTDB
_stats.txt and a DDB0232430.regions
, not sure why these files are missing.
I can't find directories with those names but within test_files/output there are directories 'singleRecords' and 'singleRecords_with_scores'. singleRecords_with_scores is empty while singleRecords contains:
DDB0232430.invalidRecords DDB0232430.singleRecords DDB0232430.singleRecords_noncoding DDB0232430.singleRecords_noncoding.with_dbSNPid DDB0232430.singleRecords_proteins.fa
Running zcat test_files/output/dicty_2.7/DDB0232430.gz
does show predictions.
I'm not sure if this is relevant but Dictyostelium discoideum is a haploid organism. However, the region within which my genes of interest are located looks like it may have undergone a duplication in some strains, so vcf files for these strains contian apparently heterozygous SNPs. Could this pose a problem?
I don't think that would pose a problem because SIFT works on plenty of diploid organisms. We tested it and used it extensively on humans.
The command
perl make-sift-scores-db-batch.pl $metafile
did not complete for some reason. This program creates the region file and the stats file. It's a super simple script, not sure why it didn't finish.
Unfortunately, I don' think you can just run the command again because there was some cleanup at the end, so input files are probably missing.
Can you rerun making the database, but comment out the lines that remove files (it'll help me debug)
comment out the following lines:
line 157 in make-SIFT-db-all.pl:
Original: system ("rm $rm_dir");
New: # system ("rm $rm_dir");
lines 69 -70 in make-sift-scores-db-batch.pl Add "#" in front of the system ("rm ...."); commands
Then go ahead and remake the database.
Assuming it still doesn't finish, I would like to know the stdout of perl make-sift-scores-db-batch.pl $metafile
Thank you. Just to check, when you say comment out do you mean I should edit the text of the files where you specify, before rerunning the database making script?
Yes, that's right. the "#" will make these lines comments, so they won't run, the files won't be removed, they will be kept.
I want you to keep all the files, it makes it easier to rerun parts of SIFT and debug.
Hope that's clear.
I've rerun the database creation after turning those rm lines into comments, then I ran perl make-sift-scores-db-batch.pl $metafile
. The output was:
Executing perl map-scores-back-to-records.pl test_files/dictyostelium_discoideum_config.txt DDB0232430
Executing perl match-replace-dbSNP-vcf.pl test_files/dictyostelium_discoideum_config.txt DDB0232430
db file test_files/output/singleRecords_with_scores/DDB0232430_scores.Srecords.with_dbSNPid
noncoding test_files/output/singleRecords/DDB0232430.singleRecords_noncoding.with_dbSNPidfinal outfolder is test_files/output/dicty_2.7
python make_regions_file.py test_files/output/singleRecords_with_scores/DDB0232430_scores.Srecords.with_dbSNPid.sorted test_files/output/dicty_2.7/DDB0232430.regions
Can't exec "python": No such file or directory at make-sift-scores-db-batch.pl line 66.
Built single records test_files/output/singleRecords
I believe I've solved the issue. I've been using Ubuntu as a Linux portal for Windows and it seems Ubuntu uses the python3
command rather than the python
command. As a result, make-sift-scores-db-batch.pl
could not excecute python
. I installed python-is-python3
and reran the VCF annotation. Now the output is:
Chromosome WithSIFT4GAnnotations WithoutSIFT4GAnnotations Progress
DDB0232430 450 0 Completed : 1/1
Thank you for your help
Excellent!
Hello,
I've created a SIFT database for chromosome 3 of Dictyostelium discoideum using fasta and gff (converted to gtf) files. Running
more CHECK_GENES.LOG
gave the output:Chr Genes with SIFT Scores Pos with SIFT scores Pos with Confident Scores
DDB0232430 83 (3798/4567) 91 (14959896/16519059) 74(10997369/14959896)
When attempting to annotate vcf files containing SNPs within my genes of interest, however, I received the output:Started Running .......
Running in Single transcripts mode
Chromosome WithSIFT4GAnnotations WithoutSIFT4GAnnotations Progress
test_files/output/dicty_2.7/DDB0232430.regions does not exist
DDB0232430 0 450 Completed : 1/1
Both of my genes of interest seem to have been included in the various files produced for the alignment. I did not include the vcf file during in the 'dbSNP' file during the database creation process, so could this be the issue? If so, is there a way to create the SIFT database using several VCF files or a merged VCF file containing SNP information for different strains of an organism with different SNPs within the same gene sequence, as I wish to look at a few hundred VCF files and the SNP effects on protein structure. Thank you