pauline-ng / SIFT4G_Create_Genomic_DB

Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0

SIFT4G_Create_Genomic_DB #61

Closed PrincyJohnson closed 2 years ago

PrincyJohnson commented 2 years ago

Hello,

I tried with the test files. It's not working either.

princy_raagul@AG053360D-07006:/mnt/c/Users/josephine.p.johnson/Documents/Variant_dataset/SIFT/scripts_to_build_SIFT_db$ perl make-SIFT-db-all.pl -config test_files101/homo_sapiens-test.txt
entered mkdir ./test_files101/homo_sapiens_small/ /GRCh38.83dir ./test_files101/homo_sapiens_small/
converting gene format to use-able input
ls: cannot access '/gene-annotation-src': No such file or directory
Unable to open for reading
done converting gene format
/*.gz: No such file or directoryns_small/
DNA files do not exist or did not unzip properly

homo_sapiens-test.txt

pauline-ng commented 2 years ago

Full paths must be used in the config files. Please change all the settings in the config file to full paths.
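
As a rough way to act on this advice, the sketch below scans a config file for settings whose values are not absolute paths. It assumes the config consists of `KEY=VALUE` lines; the function name `relative_path_settings` and the key names in the usage note are illustrative and not taken from this thread, so pass in whichever keys in your config actually hold paths.

```python
import posixpath


def relative_path_settings(config_text, path_keys):
    """Return (key, value) pairs whose value is not an absolute POSIX path.

    Assumes KEY=VALUE lines; blank lines and lines starting with '#' are skipped.
    """
    problems = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Only check the keys the caller says should contain paths.
        if key in path_keys and not posixpath.isabs(value):
            problems.append((key, value))
    return problems
```

For example, calling `relative_path_settings(open("my_config.txt").read(), {"PARENT_DIR"})` (hypothetical file and key names) would list that key if it were set to something like `./test_files101/homo_sapiens_small`.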

PrincyJohnson commented 2 years ago

Hello Pauline,

I am using full paths for all the files. It downloads the files, but then says it can't access gene-annotation-src. Config file: candidatus_carsonella_ruddii_pv_config.txt

(base) princy_raagul@AG053360D-07006:/mnt/c/new/scripts_to_build_SIFT_db$ perl make-SIFT-db-all.pl -config test_files101/candidatus_carsonella_ruddii_pv_config.txt --ensembl_download entered mkdir /mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100 /ASM1036v1.34 /mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100 downloading gene annotation /gene-annotation-src: Scheme missing. --2022-07-08 15:14:45-- ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/gtf//bacteria_11_collection/candidatus_carsonella_ruddii_pv/Candidatus_carsonella_ruddii_pv.ASM1036v1.34.gtf.gz => ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100/Candidatus_carsonella_ruddii_pv.ASM1036v1.34.gtf.gz’ Resolving ftp.ensemblgenomes.org (ftp.ensemblgenomes.org)... 193.62.193.141 Connecting to ftp.ensemblgenomes.org (ftp.ensemblgenomes.org)|193.62.193.141|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD (1) /pub/bacteria/release-34/gtf//bacteria_11_collection/candidatus_carsonella_ruddii_pv ... done. ==> SIZE Candidatus_carsonella_ruddii_pv.ASM1036v1.34.gtf.gz ... 14570 ==> PASV ... done. ==> RETR Candidatus_carsonella_ruddii_pv.ASM1036v1.34.gtf.gz ... done. Length: 14570 (14K) (unauthoritative)

Candidatus_carsonella_ruddii_pv.ASM1036v1.34.gtf.gz 100%[========================================================================================================================================>] 14.23K --.-KB/s in 0.001s

2022-07-08 15:14:47 (16.1 MB/s) - ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100/Candidatus_carsonella_ruddii_pv.ASM1036v1.34.gtf.gz’ saved [14570]

FINISHED --2022-07-08 15:14:47-- Total wall clock time: 1.9s Downloaded: 1 files, 14K in 0.001s (16.1 MB/s) /gene-annotation-src: Scheme missing. --2022-07-08 15:14:47-- ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/pep/Candidatus_carsonella_ruddii_pv.ASM1036v1.34.pep.all.fa.gz => ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100/Candidatus_carsonella_ruddii_pv.ASM1036v1.34.pep.all.fa.gz’ Resolving ftp.ensemblgenomes.org (ftp.ensemblgenomes.org)... 193.62.193.141 Connecting to ftp.ensemblgenomes.org (ftp.ensemblgenomes.org)|193.62.193.141|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD (1) /pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/pep ... done. ==> SIZE Candidatus_carsonella_ruddii_pv.ASM1036v1.34.pep.all.fa.gz ... done.

==> PASV ... done. ==> RETR Candidatus_carsonella_ruddii_pv.ASM1036v1.34.pep.all.fa.gz ... No such file ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.34.pep.all.fa.gz’.

done downloading gene annotation downloading fasta files --2022-07-08 15:14:49-- ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna/%0D => ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/.listing’ Resolving ftp.ensemblgenomes.org (ftp.ensemblgenomes.org)... 193.62.193.141 Connecting to ftp.ensemblgenomes.org (ftp.ensemblgenomes.org)|193.62.193.141|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD (1) /pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna ... done. ==> PASV ... done. ==> LIST ... done.

.listing [ <=> ] 1009 --.-KB/s in 0s

2022-07-08 15:14:50 (33.4 MB/s) - ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/.listing’ saved [1009]

Removed ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/.listing’. Rejecting ‘CHECKSUMS’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna.toplevel.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna_rm.chromosome.Chromosome.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna_rm.toplevel.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna_sm.chromosome.Chromosome.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna_sm.toplevel.fa.gz’. Rejecting ‘README’. --2022-07-08 15:14:50-- ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna/%0D => ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/%0D’ ==> CWD not required. ==> SIZE \r ... done.

==> PASV ... done. ==> RETR \r ... No such file ‘\r’.

--2022-07-08 15:14:50-- ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna/%0D => ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/.listing’ Resolving ftp.ensemblgenomes.org (ftp.ensemblgenomes.org)... 193.62.193.141 Connecting to ftp.ensemblgenomes.org (ftp.ensemblgenomes.org)|193.62.193.141|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD (1) /pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna ... done. ==> PASV ... done. ==> LIST ... done.

.listing [ <=> ] 1009 --.-KB/s in 0s

2022-07-08 15:14:52 (37.6 MB/s) - ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/.listing’ saved [1009]

Removed ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/.listing’. Rejecting ‘CHECKSUMS’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna.chromosome.Chromosome.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna.toplevel.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna_rm.chromosome.Chromosome.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna_rm.toplevel.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna_sm.chromosome.Chromosome.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna_sm.toplevel.fa.gz’. Rejecting ‘README’. --2022-07-08 15:14:52-- ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna/%0D => ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/%0D’ ==> CWD not required. ==> SIZE \r ... done.

==> PASV ... done. ==> RETR \r ... No such file ‘\r’.

--2022-07-08 15:14:52-- ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna/%0D => ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/.listing’ Resolving ftp.ensemblgenomes.org (ftp.ensemblgenomes.org)... 193.62.193.141 Connecting to ftp.ensemblgenomes.org (ftp.ensemblgenomes.org)|193.62.193.141|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD (1) /pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna ... done. ==> PASV ... done. ==> LIST ... done.

.listing [ <=> ] 1009 --.-KB/s in 0s

2022-07-08 15:14:53 (45.6 MB/s) - ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/.listing’ saved [1009]

Removed ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/.listing’. Rejecting ‘CHECKSUMS’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna.chromosome.Chromosome.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna.toplevel.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna_rm.chromosome.Chromosome.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna_rm.toplevel.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna_sm.chromosome.Chromosome.fa.gz’. Rejecting ‘Candidatus_carsonella_ruddii_pv.ASM1036v1.dna_sm.toplevel.fa.gz’. Rejecting ‘README’. --2022-07-08 15:14:53-- ftp://ftp.ensemblgenomes.org/pub/bacteria/release-34/fasta//bacteria_11_collection/candidatus_carsonella_ruddii_pv/dna/%0D => ‘/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/%0D’ ==> CWD not required. ==> SIZE \r ... done.

==> PASV ... done. ==> RETR \r ... No such file ‘\r’.

done downloading DNA fasta sequences
download dbSNP files
Use of uninitialized value $src_site in concatenation (.) or string at download-dbSNP-files.pl line 55.
Use of uninitialized value $src_site in concatenation (.) or string at download-dbSNP-files.pl line 60.
/dbSNP: Scheme missing.
Use of uninitialized value $src_site in concatenation (.) or string at download-dbSNP-files.pl line 64.
converting gene format to use-able input
ls: cannot access '/gene-annotation-src': No such file or directory
gzip: /mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100 is a directory -- ignored
gzip: /gene-annotation-src.gz: No such file or directory
gzip: /Candidatus_carsonella_ruddii_pv.ASM1036v1.34.gtf.gz: No such file or directory
done converting gene format
/*.gz: No such file or directoryd_SIFT_db/test_files101/candidas100
DNA files do not exist or did not unzip properly

pauline-ng commented 2 years ago

Are you working in Unix or on a Mac? The bioinformatics tools were developed and tested in Unix. I ask because I'm seeing some special characters from your config file, like "\r" and "%0D", in:


/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100\r/chr-src\r/%0D’
/candidatus_carsonella_ruddii_pv/dna/%0D
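
For context, `\r` is a carriage return and `%0D` is its URL-encoded form. One common source of such characters is a text file saved with Windows-style CRLF line endings, although the thread does not confirm that this was the cause here. The sketch below, with hypothetical helper names, counts lines that end in a carriage return and returns a copy of the data with Unix line endings.

```python
def count_cr_lines(data: bytes) -> int:
    """Count lines that end with a carriage return (the '\\r' seen in the log)."""
    return sum(1 for line in data.split(b"\n") if line.endswith(b"\r"))


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF (and any lone CR) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def clean_config(path: str) -> int:
    """Rewrite the file at `path` with Unix line endings; return lines fixed."""
    with open(path, "rb") as handle:
        data = handle.read()
    fixed = count_cr_lines(data)
    if fixed:
        with open(path, "wb") as handle:
            handle.write(to_unix_line_endings(data))
    return fixed
```

Running `clean_config` on a copy of the config file first, rather than the original, keeps the unmodified file available for comparison.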

The other thing is that your paths aren't what I expected. I'd expect all your full paths to start with /mnt or /home. You need write permission to the folders in the config file, and unless you're running as root, I don't think you'd have permission to create the folders in the config file (but I'm not familiar with Mac).
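
A quick way to test the permission point before a long run is sketched below. It walks up from a target folder to the nearest directory that already exists and asks the operating system whether the current user may write there. The function names are hypothetical, and `os.access` only reports what the OS claims at that moment, so treat the result as a hint, not a guarantee.

```python
import os


def nearest_existing_dir(path):
    """Walk up from `path` until an existing filesystem entry is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def can_create_under(path):
    """True if the nearest existing ancestor is a writable, searchable directory."""
    base = nearest_existing_dir(path)
    return os.path.isdir(base) and os.access(base, os.W_OK | os.X_OK)
```

For instance, `can_create_under("/mnt/c/new/scripts_to_build_SIFT_db/test_files101/candidas100")` would report whether that output folder, taken from the log above, could be created or written to by the current user.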

PrincyJohnson commented 2 years ago

Hi pauline,

I managed to clear that error and ran the homo sapiens 21 tutorial. Now I'm getting this error. I can see SIFT predictions in the output file, but the database folder doesn't have the id.regions file. Can you please let me know what this is? [image]

pauline-ng commented 2 years ago

Hi Princy,

You need to have Python (v3) installed.

-Pauline
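
To check this on a given machine, the sketch below looks for `python` and `python3` on the PATH and reports which of them identify as Python 3. Which command name the build scripts actually invoke is not stated in this thread, so both are checked; the function names are hypothetical.

```python
import re
import shutil
import subprocess


def major_version(version_output):
    """Extract the major version from text such as 'Python 3.10.4'."""
    match = re.search(r"Python (\d+)\.", version_output)
    return int(match.group(1)) if match else None


def python3_commands(candidates=("python", "python3")):
    """Return the candidate command names that report a Python 3 interpreter."""
    found = []
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        # Python 2 printed its version to stderr, so check both streams.
        if major_version(result.stdout + result.stderr) == 3:
            found.append(name)
    return found
```

An empty list from `python3_commands()` would suggest that no Python 3 interpreter is reachable under either name.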

PrincyJohnson commented 2 years ago

I actually got that pauline.

Thank you so much for your response.