pauline-ng / SIFT4G_Create_Genomic_DB

Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0

error in building database #79

Open AmruthaJNC opened 1 year ago

AmruthaJNC commented 1 year ago

I was trying to create a database for Candida tropicalis using the genome assembly FASTA file and the annotation file in GTF format. I created a config file named C_tropicalis_MYA3404.txt, which is located in the test_files folder. When I run the command: perl make-SIFT-db-all.pl -config test_files/C_tropicalis_MYA3404.txt

I'm getting the following error:

entered mkdir /test_files/C_topicalis_MYA3404 No such file or directory at /home/mml/programs/scripts_to_build_SIFT_db/common-utils.pl line 80.

I'm quite new to this, so it might be a very obvious problem. Kindly let me know how I can fix this.

pauline-ng commented 1 year ago

What happens when you do:

ls /home/mml/programs/scripts_to_build_SIFT_db/ ?

Where is your directory SIFT4G_Create_Genomic_DB ?

AmruthaJNC commented 1 year ago

I gave the parent directory for the database as PARENT_DIR=/test_files/C_topicalis_MYA3404. Do I have to make a directory with the name you mentioned? Running ls /home/mml/programs/scripts_to_build_SIFT_db/ gave me the following: [Screenshot from 2023-03-31 19-00-01]

pauline-ng commented 1 year ago

I think I see the issue. Check your config file.

Where it says /test_files/C_topicalis_MYA3404

replace it with /home/mml//test_files/C_topicalis_MYA3404

You want a full path that you have write access to.
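
A minimal sketch of how this advice could be checked before re-running, assuming the config file contains a line of the form PARENT_DIR=/some/path, as in the value quoted earlier in this thread. The function name check_parent_dir is hypothetical and not part of SIFT4G_Create_Genomic_DB. It only tests that the path is absolute and that the nearest existing ancestor directory is writable; it does not know how the build scripts themselves create directories.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SIFT4G scripts).
# Assumption: the config has a line "PARENT_DIR=/some/path".
check_parent_dir() {
    config="$1"
    # Take the last PARENT_DIR= line and strip the key.
    dir=$(grep '^PARENT_DIR=' "$config" | tail -n 1 | cut -d= -f2-)
    if [ -z "$dir" ]; then
        echo "no PARENT_DIR entry found in $config"
        return 1
    fi
    # Require a full (absolute) path.
    case "$dir" in
        /*) ;;
        *) echo "PARENT_DIR is not an absolute path: $dir"
           return 1 ;;
    esac
    # Walk up to the nearest ancestor that already exists.
    probe="$dir"
    while [ ! -e "$probe" ]; do
        probe=$(dirname "$probe")
    done
    # That ancestor must be a directory the current user can write to.
    if [ -d "$probe" ] && [ -w "$probe" ]; then
        echo "ok: $dir can be created under $probe"
    else
        echo "cannot write under $probe (needed for $dir)"
        return 1
    fi
}
```

For example, check_parent_dir test_files/C_tropicalis_MYA3404.txt would be expected to complain for a PARENT_DIR directly under / when run as a regular user, since / is normally not writable.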

AmruthaJNC commented 1 year ago

I changed the path and it went past that error. This is what I'm getting now: [Screenshot from 2023-03-31 19-54-01]

I'm sure it's because of my inexperience. Thank you for your time.

AmruthaJNC commented 1 year ago

Also, is the PROTEIN_DB path in the config file (under #running SIFT 4G) the same as the path to the database we are trying to create?

pauline-ng commented 1 year ago

The protein database is a database you download from UniProt or NCBI.

I recommend you download the FASTA file for UniRef90, then set PROTEIN_DB to that path.

AmruthaJNC commented 1 year ago

I have added the path to the downloaded uniref90.fasta.gz file. It has gone past the earlier error. I'm getting this error now: [Screenshot from 2023-04-13 11-29-46]

pauline-ng commented 1 year ago

Please uncompress your fasta file, and then re-run.

gunzip uniref90.fasta.gz
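
A minimal sketch of the same step that keeps the compressed download and does a basic sanity check, in case the archive was truncated during download. The function name decompress_fasta is hypothetical; it uses only standard gzip options (-t to test integrity, -dc to decompress to standard output).

```shell
#!/bin/sh
# Hypothetical helper: decompress a gzipped FASTA and keep the .gz copy.
decompress_fasta() {
    gz="$1"
    out="${gz%.gz}"   # e.g. uniref90.fasta.gz -> uniref90.fasta
    # Verify the archive before spending time decompressing it.
    gzip -t "$gz" || { echo "corrupt or incomplete archive: $gz"; return 1; }
    # Write the uncompressed copy next to the original.
    gzip -dc "$gz" > "$out" || return 1
    # A FASTA file should begin with a '>' header line.
    first=$(head -n 1 "$out" | cut -c1)
    if [ "$first" = ">" ]; then
        echo "$out"
    else
        echo "does not look like FASTA: $out"
        return 1
    fi
}
```

The path printed on success is the uncompressed file, which is what PROTEIN_DB would then point to.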

AmruthaJNC commented 1 year ago

I could run it up to processing the database. It showed 100% completion and took around 24 hours to complete. I could not see the 'populating database' part described in https://github.com/pauline-ng/SIFT4G_Create_Genomic_DB#annotate in the terminal. Also, the folders SIFT_predictions and SingleRecords_with_scores are empty.

pauline-ng commented 1 year ago

Please paste your config file so I can debug

pauline-ng commented 1 year ago

Actually, can you run a test config? That would be a lot easier and faster to debug. It may be that it's already working (some directories are cleaned out at the end)

AmruthaJNC commented 1 year ago

I tried to run the homo sapiens test config. I'm getting an error like this: [Screenshot from 2023-04-22 16-10-28] I don't remember seeing the 'populating database' part when I tried with my config file.

pauline-ng commented 1 year ago

Per the README instructions, install python3 which should be invoked by calling python.
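
A minimal sketch for checking what the bare python command resolves to on a given machine. The function name python_major is hypothetical; it prints the major version of the interpreter found as python, or none if no such command is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: report which major Python version "python" invokes.
python_major() {
    if command -v python >/dev/null 2>&1; then
        # This print form works under both Python 2 and Python 3.
        python -c 'import sys; print(sys.version_info[0])'
    else
        echo none
    fi
}

python_major
```

If this prints 2 or none, the scripts would not find Python 3 under the name python. How to fix that depends on the system; on recent Debian or Ubuntu releases, for instance, the python-is-python3 package is one way to provide the python name.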

AmruthaJNC commented 1 year ago

I could successfully complete the database creation from the homo sapiens test config. When I tried the same with my config file, it ends like this: [Screenshot from 2023-04-25 10-51-35] The problem I mentioned earlier of empty folders persists. The SIFT_predictions folder for homo sapiens is not empty, whereas it is empty in my case after creating the database. Also, the fasta folder has a single file in my case, whereas there are multiple files in the case of homo sapiens.
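
A minimal sketch for making this comparison reproducible: it counts the regular files in each immediate subfolder of a database directory, so the output for the test run and for the custom run can be set side by side. The function name count_files is hypothetical, and the argument is assumed to be the PARENT_DIR from the config.

```shell
#!/bin/sh
# Hypothetical helper: print "<subfolder><TAB><file count>" for each
# immediate subdirectory of the given directory.
count_files() {
    parent="$1"
    for d in "$parent"/*/; do
        [ -d "$d" ] || continue
        # Count regular files at any depth below this subfolder.
        n=$(find "$d" -type f | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$(basename "$d")" "$n"
    done
}
```

A count of 0 for SIFT_predictions or SingleRecords_with_scores in one run but not the other would confirm the difference described above.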

pauline-ng commented 1 year ago

Did you exit the program and stop it from running?

The test config uses test files that should run in less than 15 minutes. But when I build a full genome, it can take 24-48 hours.

AmruthaJNC commented 1 year ago

I did not stop it in between. It took almost 24 hours to reach here.

AmruthaJNC commented 1 year ago

I just tried running the annotator with my VCF files. The database creation stopped midway, if I'm not wrong, but I'm getting this error when I run the annotator: [Screenshot from 2023-04-26 12-16-53]

The VCF file was obtained after joint genotype calling using the GenotypeGVCFs tool from GATK.

pauline-ng commented 1 year ago

Uncompress your .vcf.gz file

AmruthaJNC commented 1 year ago

I'm facing this error since the database was not created completely: [Screenshot from 2023-04-26 17-10-27] I'm not sure why the database creation stopped abruptly.