pauline-ng / SIFT4G_Create_Genomic_DB

Create genomic databases with SIFT predictions. Input is an organism's genomic DNA (.fa) file and the gene annotation file (.gtf). Output will be a database that can be used with SIFT4G_Annotator.jar to annotate VCF files.
GNU General Public License v3.0

MSG: Each line of the qual file must be less than 65,536 characters. Line 3 is 604927368 chars. #94

Closed fan040 closed 6 months ago

fan040 commented 6 months ago

Hello, I cannot build my own database, and I would like to ask whether the error below is caused by my genome being too large (about 14 Gb): "MSG: Each line of the qual file must be less than 65,536 characters. Line 3 is 604927368 chars." I suspect this message points to the cause of the failure. How should I go about solving this problem?

(sift4G) perl make-SIFT-db-all.pl -config test_files/AK58V4MP.txt
converting gene format to use-able input
done converting gene format
making single records file
Possible precedence issue with control flow operator at /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm line 791.

------------- EXCEPTION -------------
STACK Bio::DB::IndexedBase::_check_linelength /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm:744
STACK Bio::DB::Fasta::_calculate_offsets /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/Fasta.pm:175
STACK Bio::DB::IndexedBase::_index_files /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm:648
STACK Bio::DB::IndexedBase::index_dir /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm:446
STACK Bio::DB::IndexedBase::new /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm:361
STACK main::generateOutput make-single-records-BIOPERL.pl:147
STACK toplevel make-single-records-BIOPERL.pl:105

done making single records template
making noncoding records file
Possible precedence issue with control flow operator at /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm line 791.

------------- EXCEPTION -------------
MSG: Each line of the qual file must be less than 65,536 characters. Line 3 is 604927368 chars.
STACK Bio::DB::IndexedBase::_check_linelength /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm:744
STACK Bio::DB::Fasta::_calculate_offsets /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/Fasta.pm:175
STACK Bio::DB::IndexedBase::_index_files /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm:648
STACK Bio::DB::IndexedBase::index_dir /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm:446
STACK Bio::DB::IndexedBase::new /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm:361
STACK main::generateOutput make-single-records-noncoding.pl:62
STACK toplevel make-single-records-noncoding.pl:51

done making noncoding records
make the fasta sequences
Possible precedence issue with control flow operator at /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm line 791.

------------- EXCEPTION -------------
MSG: Each line of the qual file must be less than 65,536 characters. Line 3 is 604927368 chars.
STACK Bio::DB::IndexedBase::_check_linelength /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm:744
STACK Bio::DB::Fasta::_calculate_offsets /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/Fasta.pm:175
STACK Bio::DB::IndexedBase::_index_files /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm:648
STACK Bio::DB::IndexedBase::index_dir /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm:446
STACK Bio::DB::IndexedBase::new /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm:361
STACK main::generateOutput generate-fasta-subst-files-BIOPERL.pl:374
STACK toplevel generate-fasta-subst-files-BIOPERL.pl:178

done making the fasta sequences
start siftsharp, getting the alignments
cat: '/cluster/home/fanrong/perl5/scripts_to_build_SIFT_db/test_files/AK58V4MP/fasta/*.fasta': No such file or directory
/cluster/home/fanrong/.conda/envs/sift4G/bin/sift4g -d /cluster/home/fanrong/perl5/scripts_to_build_SIFT_db/test_files/uniref90.fasta -q /cluster/home/fanrong/perl5/scripts_to_build_SIFT_db/test_files/AK58V4MP/all_prot.fasta --subst /cluster/home/fanrong/perl5/scripts_to_build_SIFT_db/test_files/AK58V4MP/subst --out /cluster/home/fanrong/perl5/scripts_to_build_SIFT_db/test_files/AK58V4MP/SIFT_predictions --sub-results
Checking query data and substitutions files

EXITING! No valid queries to process.

pauline-ng commented 6 months ago

It sounds like your genomic FASTA sequence needs newlines. Typically there is a newline every 60 bases. Does your file have that? If not, please add them.
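For readers hitting the same error, here is a minimal Python sketch of one way to add those newlines. It is not part of the SIFT4G scripts: the function name `wrap_fasta` and the file names in the usage comment are hypothetical, and the default width of 60 simply follows the suggestion above. It holds the longest input line in memory, so a 600-million-character line needs a machine with enough RAM.

```python
import io
from typing import TextIO


def wrap_fasta(src: TextIO, dst: TextIO, width: int = 60) -> None:
    """Copy FASTA records from src to dst with sequence lines wrapped at `width`."""
    carry = ""  # bases of the current record not yet written (always < width)
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header: flush the tail of the previous record first.
            if carry:
                dst.write(carry + "\n")
                carry = ""
            dst.write(line + "\n")
        else:
            seq = carry + line
            full = len(seq) - len(seq) % width  # length covered by complete lines
            for start in range(0, full, width):
                dst.write(seq[start:start + width] + "\n")
            carry = seq[full:]
    if carry:
        dst.write(carry + "\n")


# Hypothetical usage (file names are placeholders, not from this thread):
# with open("genome.fa") as src, open("genome.wrapped.fa", "w") as dst:
#     wrap_fasta(src, dst)
```

Every sequence line in the output except the last one of each record has exactly `width` characters, and header lines are copied unchanged.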

fan040 commented 6 months ago

Thanks for your advice. My reference genome file is indeed not wrapped at 60 bases per line, so I will fix that. Thank you for your help. Have a nice day.
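Before re-running the pipeline, it may help to confirm that no line in the FASTA file still reaches the limit quoted in the error message. The following is a rough Python sketch under that assumption. The helper name `first_overlong_line` and the file name in the usage comment are hypothetical, and the limit of 65,536 is taken from the message text and not from the BioPerl source.

```python
from typing import Iterable, Optional, Tuple

# Limit quoted in the error message ("must be less than 65,536 characters").
LINE_LIMIT = 65_536


def first_overlong_line(
    lines: Iterable[str], limit: int = LINE_LIMIT
) -> Optional[Tuple[int, int]]:
    """Return (line_number, length) of the first line with length >= limit, else None."""
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))  # ignore the line terminator
        if length >= limit:
            return number, length
    return None


# Hypothetical usage (the file name is a placeholder):
# with open("genome.wrapped.fa") as handle:
#     print(first_overlong_line(handle))
```

A result of `None` means every line is shorter than the limit. Otherwise the tuple gives the 1-based line number and its length, in the same spirit as the "Line 3 is 604927368 chars" part of the message.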

fan040 commented 6 months ago

@pauline-ng Hi, after I modified the reference genome, I ran into another problem: the run appears to be stuck at this step. Is this normal?

perl make-SIFT-db-all.pl -config test_files/AK58V4MP.txt
converting gene format to use-able input
done converting gene format
making single records file
Possible precedence issue with control flow operator at /cluster/home/fanrong/.conda/envs/sift4G/lib/perl5/site_perl/5.22.0/Bio/DB/IndexedBase.pm line 791.