eyeamnice closed this issue 4 years ago
Once you have all_prot.fasta made, you don't need to rerun lines 50-106 in make-SIFT-db-all.pl. Just delete lines 50-106, and then run it.
Generating the SIFT predictions from SIFT 4G should be relatively fast; it's the pre-processing that is computationally expensive.
If it still takes a long time (more than 3 days), please check that there aren't duplicates in all_prot.fasta (grep ">" all_prot.fasta | sort | uniq -d). Duplicates would appear if you reran make-SIFT-db-all.pl without deleting the contents of the working folder, and they would make SIFT take longer to run.
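For anyone who prefers to script the duplicate check, here is a minimal Python sketch of the same idea as the grep | sort | uniq -d pipeline above. The function name duplicate_fasta_headers is made up for illustration and is not part of SIFT 4G. Unlike grep ">", it only looks at lines that start with ">".

```python
from collections import Counter


def duplicate_fasta_headers(lines):
    """Return the FASTA header lines (starting with '>') that occur more than once."""
    counts = Counter(
        line.rstrip("\n") for line in lines if line.startswith(">")
    )
    # Keep only headers seen at least twice, sorted for stable output.
    return sorted(header for header, n in counts.items() if n > 1)
```

To use it, open the file and pass the handle, for example: with open("all_prot.fasta") as fh: print(duplicate_fasta_headers(fh)). An empty list means no duplicated headers were found.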
Thank you Pauline. Should the compressed file for regions in the output directory contain SIFT scores? I looked at the results with headers:
#Position Ref_allele New_allele Transcript_id Gene_id Gene_name Region Ref_amino_acid New_amino_acid Position_of_amino_acid_substitution SIFT_score SIFT_median_sequence_info Num_seqs_at_position dbSNP_id
I see that the columns Position_of_amino_acid_substitution, SIFT_score and SIFT_median_sequence_info all contain NA. Is that normal, or is something wrong? I only have 1.gz and 1.regions at the moment.
Also, which contents do I need to delete? I followed the exact directory naming structure given in the SIFT example.
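To check whether every row really is NA, or only the rows inspected so far, a small script can count them. This is a rough sketch that assumes the chromosome file is tab-delimited and that its first line is the "#Position ..." header quoted above. The function name count_scored_rows is hypothetical.

```python
def count_scored_rows(lines, column="SIFT_score"):
    """Return (rows_with_a_value, total_rows) for the given column.

    Assumes tab-delimited rows and a first line that is the header,
    optionally prefixed with '#'.
    """
    rows = iter(lines)
    header = next(rows).lstrip("#").rstrip("\n").split("\t")
    idx = header.index(column)
    scored = total = 0
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= idx:
            continue  # skip short or blank lines
        total += 1
        if fields[idx] not in ("NA", ""):
            scored += 1
    return scored, total
```

For a gzipped file such as 1.gz, it can be called as: import gzip, then with gzip.open("1.gz", "rt") as fh: print(count_scored_rows(fh)). If the first number is 0, no row in that file has a SIFT score.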
Hi,
Go to https://github.com/pauline-ng/SIFT4G_Create_Genomic_DB and scroll to the bottom, where it says "Monitoring the Database Creation Process".
Follow the commands under the 2nd column "Check the following." If any folders are empty, that will help pinpoint the problem.
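As a quick complement to the README commands (not a substitute for them), the empty-folder check can be sketched in Python. The function name empty_subdirectories is an assumption for illustration, and the parent directory path is whatever you configured for your own run.

```python
import os


def empty_subdirectories(parent_dir):
    """Return sorted names of immediate subdirectories of parent_dir with no entries."""
    empty = []
    for entry in os.scandir(parent_dir):
        # Only look one level down; a folder counts as empty if it lists nothing.
        if entry.is_dir() and not os.listdir(entry.path):
            empty.append(entry.name)
    return sorted(empty)
```

Calling empty_subdirectories on your parent directory returns the folders that no step has written to yet, which may help narrow down where the pipeline stopped.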
Hi Pauline, I followed your directions and checked the database creation process. I think it was going well, but it seems to have run out of space at this stage, with this error:
sort: write failed: /tmp/sort3Ipoki: No space left on device
Traceback (most recent call last):
File "make_regions_file.py", line 68, in <module>
get_regions (chrom_file, out_file)
File "make_regions_file.py", line 31, in get_regions
pos = get_pos (first_line)
File "make_regions_file.py", line 8, in get_pos
return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
sort: write failed: /tmp/sortBthHCE: No space left on device
Traceback (most recent call last):
File "make_regions_file.py", line 68, in <module>
get_regions (chrom_file, out_file)
File "make_regions_file.py", line 31, in get_regions
pos = get_pos (first_line)
File "make_regions_file.py", line 8, in get_pos
return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
sort: write failed: /tmp/sortQN1GPd: No space left on device
Traceback (most recent call last):
File "make_regions_file.py", line 68, in <module>
get_regions (chrom_file, out_file)
File "make_regions_file.py", line 31, in get_regions
pos = get_pos (first_line)
File "make_regions_file.py", line 8, in get_pos
return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
sort: write failed: /tmp/sortnpz5qQ: No space left on device
Traceback (most recent call last):
File "make_regions_file.py", line 68, in <module>
get_regions (chrom_file, out_file)
File "make_regions_file.py", line 31, in get_regions
pos = get_pos (first_line)
File "make_regions_file.py", line 8, in get_pos
return int (fields[0])
ValueError: invalid literal for int() with base 10: ''
cat: /parent_dir/singleRecords//KZ847133.singleRecords: No such file or directory
can't open /parent_dir/singleRecords//KZ847133.singleRecords at map-scores-back-to-records.pl line 122.
Unable to read from /parent_dir/singleRecords_with_scores/KZ847133_scores.Srecords
What do you recommend I do? Despite these errors, it went through, zipped up the chr-src and printed Done at the end.
You have no space left on your device. Please free up some GB.
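The ValueError in the tracebacks is most likely a side effect of the failed sort rather than a separate bug: the traceback shows get_pos calling int() on fields[0] of the first line, and int('') raises exactly this error when that field is empty, which would happen if sort could not write its output. Before rerunning, it may help to confirm how much room is left where sort writes its temporary files (/tmp in the log above). A minimal sketch follows. The function names and the idea of a threshold are assumptions, not part of the SIFT scripts.

```python
import shutil


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, needed_gib):
    """True if at least `needed_gib` GiB are free on the filesystem of `path`."""
    return free_gib(path) >= needed_gib
```

For example, print(free_gib("/tmp")) reports the space available to sort. GNU sort uses the directory named by the TMPDIR environment variable (or /tmp when it is unset), so pointing TMPDIR at a larger disk is another option, assuming the pipeline passes the environment through to sort.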
Hi SIFT team,
I ran
cat all_prot.fasta | grep -v ">" | grep X
and I got some output containing XXX, as shown below:
What should I do next in order to continue building the database? It seems @pauline-ng previously mentioned that this cat command should not print any output in this case. Do I have to start from the beginning? If so, will the files already created be overwritten?
I am asking because it took a long time to reach this stage. If not, how do I restart from where I am currently? I am stuck at:
I also wonder how many parts there are with uniref90. I divided the uniref90.fasta by 0.25 and got 188, but my sift4g run, indicated above, was processing part 200. What's the correlation there? I had to run this for days to get to this point.
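The grep X check above prints the matching sequence lines but not which proteins they belong to. A small Python sketch can report the headers of the affected records instead. The function name records_with_x is hypothetical, and like grep X the match is case-sensitive.

```python
def records_with_x(lines):
    """Return the headers of FASTA records whose sequence contains 'X'."""
    hits = []
    header = None
    flagged = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: store the previous one if it had an X.
            if header is not None and flagged:
                hits.append(header)
            header, flagged = line, False
        elif "X" in line:
            flagged = True
    # Handle the last record in the file.
    if header is not None and flagged:
        hits.append(header)
    return hits
```

It can be run as: with open("all_prot.fasta") as fh: print(records_with_x(fh)). Knowing which transcripts contain X may make it easier to tell whether the problem is limited to a few records.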