pauline-ng / SIFT4G_Create_Genomic_DB

Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0

chr naming #88

Closed pauline-ng closed 9 months ago

pauline-ng commented 9 months ago

@mb47

Please continue your issue here.

pauline-ng commented 9 months ago

Copied from #23

Micha Bayer wrote: Hi,

* There should be 10 columns in the VCF file. The SIFT info will be added to the 8th column (the INFO column)

* Does your VCF file have a header?

* Is the VCF file tab-delimited?

* The database files should not start with "chr"

Pauline
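The first three points of the checklist above can be checked mechanically before running the annotator. The following is only a minimal sketch, not part of SIFT4G: `check_vcf` and its `min_columns` parameter are hypothetical names, the input is assumed to be an uncompressed VCF, and only the first data line is inspected.

```python
def check_vcf(path, min_columns=10):
    """Return a list of problems found in a plain-text VCF (hypothetical helper)."""
    problems = []
    saw_header = False
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line or line.startswith("##"):
                continue  # skip blank lines and meta-information lines
            if line.startswith("#CHROM"):
                saw_header = True  # the column header line is present
                continue
            if line.startswith("#"):
                continue  # any other comment-style line
            fields = line.split("\t")
            if len(fields) < min_columns:
                # A single field usually means spaces were used instead of tabs.
                hint = " (is the file tab-delimited?)" if len(fields) == 1 else ""
                problems.append(
                    f"line {line_no}: {len(fields)} tab-separated columns, "
                    f"expected at least {min_columns}{hint}"
                )
            break  # the first data line is enough for a quick check
    if not saw_header:
        problems.append("no #CHROM header line found")
    return problems
```

An empty list means none of these three checks failed; it does not guarantee that the file is otherwise valid.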

Hi Pauline,

I am having issues with the chromosome naming - I am building a custom database and my chromosomes are named e.g. "chr1H", "chr2H", etc. -- this applies to all my relevant files (FASTA, VCF, GFF). How can I get around this issue? Is there an option when building the sift database to specify the format of the chromosome names? I couldn't find one.

Thank you!


Pauline wrote:

The program should run OK with "chr" in it -- we tested this with human GRCh38 which has "chr".

The program strips "chr" from the database filenames, and the SIFT annotation jar program also ignores "chr" when it looks them up. So basically proceed and everything should work out.


Micha Bayer wrote:

Hi Pauline,

Thanks for the swift reply. As it stands, SIFT is not producing any annotations, and I am getting the following error message (sorry, I forgot to include this):

The following chromosomes (or scaffolds/contigs) are not found in the SIFT 4G database and will not be annotated:
3H
Please contact us if you have any questions.
/mnt/shared/scratch/mbayer/apps/sift/scripts_to_build_SIFT_db/test_files/Bowman/3H.regions does not exist
3H 0 9418
Completed : 1/1
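A mismatch like this one can be spotted before annotating by comparing the chromosome names in the VCF with the files in the database directory. The sketch below is not part of SIFT4G. It assumes, based only on this thread, that the annotator drops a leading "chr" and then looks for `<name>.gz` and `<name>.regions`; the function names are hypothetical and the VCF is assumed to be uncompressed.

```python
from pathlib import Path


def strip_chr(name):
    """Drop a leading 'chr', mirroring the behaviour described in this thread."""
    return name[3:] if name.startswith("chr") else name


def vcf_chromosomes(vcf_path):
    """Collect the distinct CHROM values of a plain-text VCF, in file order."""
    names = []
    with open(vcf_path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header, meta-information, and blank lines
            chrom = line.split("\t", 1)[0]
            if chrom not in names:
                names.append(chrom)
    return names


def missing_db_files(vcf_path, db_dir):
    """Map each VCF chromosome to the database files that were not found."""
    missing = {}
    for chrom in vcf_chromosomes(vcf_path):
        base = strip_chr(chrom)
        absent = [
            base + ext
            for ext in (".gz", ".regions")
            if not (Path(db_dir) / (base + ext)).exists()
        ]
        if absent:
            missing[chrom] = absent
    return missing
```

An empty dictionary means every chromosome in the VCF has both files under the name this sketch expects.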


Pauline wrote:

Try renaming your files:

* chr3H.gz to 3H.gz

* chr3H.regions to 3H.regions

and see if that works.
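For a database with many chromosomes, the same renaming can be scripted. This is a minimal sketch under stated assumptions rather than an official tool: `strip_chr_prefix` is a hypothetical helper, it touches only `.gz` and `.regions` files (the two file types renamed in this thread), and it defaults to a dry run so the planned renames can be reviewed first.

```python
from pathlib import Path


def strip_chr_prefix(db_dir, dry_run=True):
    """Rename chr*.gz and chr*.regions files in db_dir by dropping the 'chr' prefix.

    Returns a list of (old_name, new_name) pairs. With dry_run=True nothing is
    changed on disk.
    """
    renames = []
    for path in sorted(Path(db_dir).iterdir()):
        if not path.is_file() or not path.name.startswith("chr"):
            continue
        if not path.name.endswith((".gz", ".regions")):
            continue  # leave other files, such as the stats file, untouched
        new_name = path.name[3:]
        if new_name.startswith("."):
            continue  # nothing would be left of the chromosome name
        target = path.with_name(new_name)
        if target.exists():
            continue  # never overwrite an existing file
        renames.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return renames
```

Calling `strip_chr_prefix(db_dir)` first and inspecting the returned pairs, then repeating the call with `dry_run=False`, keeps the operation easy to audit.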

Also, can you paste the contents of chr3H_SIFTDB_stats.txt here so I can check that it worked?

mb47 commented 9 months ago

Excellent -- renaming the files as described above has solved the problem, and I am getting SIFT annotations now. Many thanks for the super-quick help!

Here is the content of the stats file in case you still want to look at it:

./test_files/Bowman/Bowman_Bpgv2/chr3H_SIFTDB_stats.txt

SYN 14404001
STOP-GAINED 993494
NONSYN 19322441

nonsyn/syn ratio: 1.34

All Counts: 30.7 3036642/9888294
dbSNP damaging: -1% 0/0
Not in dbSNP damaging: 30.7% 3036642/9888294

Reference predictions % damaging: 0.0 2022/8792834
% predicted on: 99.99744116353592

     A      C      G      T
A   -1     32.8   29.5   36.9
C   31.5   -1     26.7   27.3
G   27.1   26.6   -1     31.4
T   37.1   29.8   33.1   -1

#####################################
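As a quick sanity check, the summary figures in the stats file can be recomputed from the raw counts posted above. This sketch assumes the reported values are plain ratios of those counts; that assumption is consistent with the numbers shown, but it is not confirmed by the SIFT4G source.

```python
# Counts copied from the posted chr3H_SIFTDB_stats.txt
syn = 14404001
nonsyn = 19322441
damaging, predicted = 3036642, 9888294
ref_damaging, ref_predicted = 2022, 8792834

# Assumed formulas: simple ratios, rounded as in the stats file
nonsyn_syn_ratio = round(nonsyn / syn, 2)
damaging_pct = round(100 * damaging / predicted, 1)
ref_damaging_pct = round(100 * ref_damaging / ref_predicted, 1)

print(f"nonsyn/syn ratio: {nonsyn_syn_ratio}")             # 1.34
print(f"% damaging (all counts): {damaging_pct}")          # 30.7
print(f"% damaging (reference alleles): {ref_damaging_pct}")  # 0.0
```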

pauline-ng commented 9 months ago

This looks right.

Great, glad it's working for you!