pauline-ng / SIFT4G_Create_Genomic_DB

Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0

Errors while populating the databases; all files within singleRecords_with_scores folder removed #53

Closed Arynio closed 2 years ago

Arynio commented 2 years ago

Dear Pauline,

I'm running into some error messages (pasted at the end of this post) while creating my own SIFT database. The strangest thing is that files were being created in the singleRecords_with_scores folder, and they looked good, but once the programme finishes running, all contents of this folder are wiped out. I tried once more with more RAM in case memory was the problem, but I ran into the exact same issue (plus, this time, I also got warnings because all .fa files in the chr-src folder were already gzipped).

I can confirm there are neither "X" nor "*" characters (nor duplicate entries) in the all_prot.fasta file. However, there are quite a few N characters in the .fa files (in case that matters).
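For anyone who wants to repeat the check described above, here is a minimal sketch of one way to scan a protein FASTA file for "X" or "*" characters and for duplicate entries. The function name `check_protein_fasta` is hypothetical, and the sketch assumes the record ID is the first whitespace-delimited token of each header line; it is not part of the SIFT4G scripts.

```python
from collections import Counter


def check_protein_fasta(path):
    """Return (IDs whose sequence contains X or *, IDs that occur more than once)."""
    bad_ids = set()
    id_counts = Counter()
    current_id = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Assumption: the ID is the first token after ">".
                tokens = line[1:].split()
                current_id = tokens[0] if tokens else ""
                id_counts[current_id] += 1
            elif current_id is not None and ("X" in line or "*" in line):
                bad_ids.add(current_id)
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    return sorted(bad_ids), duplicates
```

Calling `check_protein_fasta("all_prot.fasta")` should return two empty lists if the file is clean in the sense described above.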

Sorry if this issue has been reported before; I found two issues with a similar error message, but they didn't help me.

Thank you very much for any help you may provide.

Best,

Daniel.


Error output:

populating databases
Traceback (most recent call last):
  File "make_regions_file.py", line 68, in <module>
    get_regions (chrom_file, out_file)
  File "make_regions_file.py", line 31, in get_regions
    pos = get_pos (first_line)
  File "make_regions_file.py", line 8, in get_pos
    return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
Traceback (most recent call last):
  File "make_regions_file.py", line 68, in <module>
    get_regions (chrom_file, out_file)
  File "make_regions_file.py", line 31, in get_regions
    pos = get_pos (first_line)
  File "make_regions_file.py", line 8, in get_pos
    return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
Traceback (most recent call last):
  File "make_regions_file.py", line 68, in <module>
    get_regions (chrom_file, out_file)
  File "make_regions_file.py", line 31, in get_regions
    pos = get_pos (first_line)
  File "make_regions_file.py", line 8, in get_pos
    return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
Traceback (most recent call last):
  File "make_regions_file.py", line 68, in <module>
    get_regions (chrom_file, out_file)
  File "make_regions_file.py", line 31, in get_regions
    pos = get_pos (first_line)
  File "make_regions_file.py", line 8, in get_pos
    return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
Traceback (most recent call last):
  File "make_regions_file.py", line 68, in <module>
    get_regions (chrom_file, out_file)
  File "make_regions_file.py", line 31, in get_regions
    pos = get_pos (first_line)
  File "make_regions_file.py", line 8, in get_pos
    return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
Traceback (most recent call last):
  File "make_regions_file.py", line 68, in <module>
    get_regions (chrom_file, out_file)
  File "make_regions_file.py", line 31, in get_regions
    pos = get_pos (first_line)
  File "make_regions_file.py", line 8, in get_pos
    return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
Traceback (most recent call last):
  File "make_regions_file.py", line 68, in <module>
    get_regions (chrom_file, out_file)
  File "make_regions_file.py", line 31, in get_regions
    pos = get_pos (first_line)
  File "make_regions_file.py", line 8, in get_pos
    return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
checking the databases
zipping up /mnt/lustre/scratch/home/uvi/bg/dkr/sift4g_annotation/scripts_to_build_SIFT_db/test_files/nr_ancestral_dm6/chr-src/*
All done!
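As a note on the message itself: `ValueError: invalid literal for int() with base 10: ''` is what Python raises when `int()` is given an empty string. The sketch below shows one way that can happen, assuming the first line handed to `get_pos` is blank (for example, because the file being read is empty). Only the `int(fields[0])` call is confirmed by the traceback; the tab delimiter, the rest of `get_pos`, and the `get_pos_or_none` variant are assumptions for illustration, not the actual contents of make_regions_file.py.

```python
def get_pos(line, sep="\t"):
    # Hypothetical reconstruction: only `int(fields[0])` appears in the traceback.
    fields = line.strip().split(sep)
    return int(fields[0])  # raises ValueError when fields[0] == ''


def get_pos_or_none(line, sep="\t"):
    """Illustrative variant: return None for a blank line instead of raising."""
    fields = line.strip().split(sep)
    if not fields[0]:
        return None
    return int(fields[0])
```

With these definitions, `get_pos("")` raises the same `ValueError` as in the log, while `get_pos_or_none("")` returns `None`.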

pauline-ng commented 2 years ago

I've seen these errors before and they may not matter -- do you have databases with predictions? That's what matters. You can also check your *SIFTDB_stats.txt files.
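A quick way to look for the stats files is sketched below. It only searches a directory tree for non-empty files matching `*SIFTDB_stats.txt`; the function name `find_stats_files` is hypothetical, and where the files end up depends on your own configuration, so the directory you pass in is an assumption.

```python
from pathlib import Path


def find_stats_files(db_dir):
    """Return non-empty files matching *SIFTDB_stats.txt under db_dir, sorted."""
    return sorted(
        p
        for p in Path(db_dir).rglob("*SIFTDB_stats.txt")
        if p.is_file() and p.stat().st_size > 0
    )

# Example use (the directory name is a placeholder):
# for path in find_stats_files("path/to/output_dir"):
#     print(path)
#     print(path.read_text())
```

An empty result suggests the databases were not populated, in which case the steps for rerunning apply.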

If you don't have SIFT predictions:

1) Comment out line 151 in make-SIFT-db-all.pl, so that these intermediate files are not deleted

2) Delete the file "all_prot.fasta"

3) Rerun.
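The first two steps above can be sketched as small helpers. Both function names are hypothetical. The line number is passed in rather than hard-coded because it may differ between versions of make-SIFT-db-all.pl, so check which line you are commenting out before running this. Perl uses `#` for comments, which is what the sketch prepends.

```python
import shutil
from pathlib import Path


def comment_out_line(script_path, line_number):
    """Prefix one 1-indexed line with '#', keeping a .bak copy of the original."""
    script = Path(script_path)
    lines = script.read_text().splitlines(keepends=True)
    if not 1 <= line_number <= len(lines):
        raise ValueError(f"{script} has no line {line_number}")
    shutil.copy2(script, script.with_name(script.name + ".bak"))
    # Leave the line alone if it is already commented out.
    if not lines[line_number - 1].lstrip().startswith("#"):
        lines[line_number - 1] = "#" + lines[line_number - 1]
    script.write_text("".join(lines))
    return lines[line_number - 1]


def remove_all_prot_fasta(parent_dir):
    """Delete every file named all_prot.fasta under parent_dir; return the paths."""
    targets = sorted(Path(parent_dir).rglob("all_prot.fasta"))
    for path in targets:
        path.unlink()
    return targets
```

For example, `comment_out_line("make-SIFT-db-all.pl", 151)` returns the modified line so you can confirm it is the one that removes the intermediate files.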

Arynio commented 2 years ago

Dear Pauline,

I have both databases with predictions and *SIFTDB_stats.txt files, and I was able to successfully annotate my VCF files, so you were right!

Thank you for your help!!

Best,

Daniel.