pauline-ng closed this issue 9 months ago
Copied from #23
Micha Bayer wrote: Hi,
* There should be 10 columns in the VCF file. The SIFT info will be added to the 8th column (the INFO column)
* Does your VCF file have a header?
* Is the VCF file tab-delimited?
* The database files should not start with "chr"
Pauline
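The checks in the list above can be run before annotation. The following is a minimal sketch, not part of SIFT 4G. The function name `check_vcf` is hypothetical, and it assumes that "header" means the `#CHROM` line and that only the first data record needs inspecting.

```python
import io


def check_vcf(handle, expected_columns=10):
    """Return a list of problems found in a VCF text stream (empty if none)."""
    problems = []
    saw_header = False
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            saw_header = True  # assumption: this is the header being asked about
            continue
        # First data record: check the delimiter, then the column count.
        if "\t" not in line:
            problems.append(f"line {line_no}: not tab-delimited")
        else:
            n = len(line.split("\t"))
            if n != expected_columns:
                problems.append(
                    f"line {line_no}: {n} columns, expected {expected_columns}"
                )
        break  # one record is enough for a quick check
    if not saw_header:
        problems.append("no #CHROM header line found")
    return problems


# Example with an in-memory, single-sample VCF (10 columns).
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n"
    "chr3H\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\n"
)
print(check_vcf(io.StringIO(example)))
```

For a real file, pass an open text handle, e.g. `check_vcf(open("my.vcf"))`, where `my.vcf` is a placeholder name.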
Hi Pauline,
I am having issues with the chromosome naming - I am building a custom database and my chromosomes are named e.g. "chr1H", "chr2H", etc. -- this applies to all my relevant files (FASTA, VCF, GFF). How can I get around this issue? Is there an option when building the sift database to specify the format of the chromosome names? I couldn't find one.
thank you!
Pauline wrote:
The program should run OK with "chr" in it -- we tested this with human GRCh38 which has "chr".
The program strips "chr" from the filenames, and the SIFT annotation jar program also ignores "chr". So basically proceed and everything should work out.
Micha Bayer wrote:
Hi Pauline,
thanks for the swift reply. As it stands, sift is not producing any annotations, and I am getting the following error message (sorry, I forgot to include this):
The following chromosomes (or scaffolds/contigs) are not found in the SIFT 4G database and will not be annotated:
3H
Please contact us if you have any questions.
/mnt/shared/scratch/mbayer/apps/sift/scripts_to_build_SIFT_db/test_files/Bowman/3H.regions does not exist
3H 0 9418
Completed : 1/1
Pauline wrote:
Try renaming your files:
chr3H.gz to 3H.gz
chr3H.regions to 3H.regions
and see if that works.
Also, can you paste the contents of chr3H_SIFTDB_stats.txt here so I can see that it worked?
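With many chromosomes, the renaming suggested above could be scripted. This is a rough sketch only. The helper `strip_chr_prefix` is hypothetical and not part of SIFT 4G, and it assumes that only the `.gz` and `.regions` files need renaming, since those are the two mentioned in this thread. It defaults to a dry run so nothing is changed until the planned renames have been reviewed.

```python
from pathlib import Path


def strip_chr_prefix(db_dir, dry_run=True):
    """Rename chrX.gz / chrX.regions files in db_dir to X.gz / X.regions.

    Returns a list of (old_name, new_name) pairs. With dry_run=True the
    renames are only reported, not performed.
    """
    renames = []
    for path in sorted(Path(db_dir).iterdir()):
        name = path.name
        if not (path.is_file() and name.startswith("chr")):
            continue
        if not name.endswith((".gz", ".regions")):
            continue  # assumption: other files (e.g. stats) keep their names
        target = path.with_name(name[len("chr"):])
        if target.exists():
            continue  # never overwrite an existing file
        renames.append((name, target.name))
        if not dry_run:
            path.rename(target)
    return renames
```

Call it once with the default `dry_run=True` on the directory holding the database files to see what would change, then again with `dry_run=False` to apply the renames.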
Excellent -- renaming the files as described above has solved the problem, and I am getting SIFT annotations now. Many thanks for the super-quick help!
Here is the content of the stats file in case you still want to look at it:
./test_files/Bowman/Bowman_Bpgv2/chr3H_SIFTDB_stats.txt
SYN 14404001
STOP-GAINED 993494
NONSYN 19322441
nonsyn/syn ratio: 1.34
All Counts: 30.7 3036642/9888294
dbSNP damaging: -1% 0/0
Not in dbSNP damaging: 30.7% 3036642/9888294
Reference predictions % damaging: 0.0 2022/8792834
% predicted on: 99.99744116353592
  A    C    G    T
A -1   32.8 29.5 36.9
C 31.5 -1   26.7 27.3
G 27.1 26.6 -1   31.4
T 37.1 29.8 33.1 -1
#####################################
This looks right.
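As a quick sanity check, the derived figures in the stats file can be recomputed from the raw counts it reports. The sketch below is plain arithmetic on the numbers pasted above. The variable names are my own, and reading each "x/y" pair as numerator over denominator is an assumption about the file's layout.

```python
# Raw counts copied from chr3H_SIFTDB_stats.txt as pasted in this thread.
syn = 14_404_001
nonsyn = 19_322_441
all_num, all_den = 3_036_642, 9_888_294   # "All Counts" pair
ref_num, ref_den = 2_022, 8_792_834       # "Reference predictions" pair

ratio = round(nonsyn / syn, 2)                 # reported as 1.34
pct_all = round(100 * all_num / all_den, 1)    # reported as 30.7
pct_ref = round(100 * ref_num / ref_den, 1)    # reported as 0.0

print(ratio, pct_all, pct_ref)
```

All three recomputed values match the ones in the file.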
Great, glad it's working for you!
@mb47
Please continue your issue here.