shohei-kojima / MEGAnE

MEGAnE
MIT License

Error for empty reshaped_repbase.fa #2

Closed: piosierra closed this issue 2 years ago

piosierra commented 2 years ago

I get an error running call_genotype when the script finds that reshaped_repbase.fa is empty. I am running it with empty files for -repremove and -pA_ME. Could that be the issue?

2022-04-27 13:25:29,259:DEBUG:file=1_indiv_call_genotype.py:module=1_indiv_call_genotype:funcName=:line=78:message=Logging started.
2022-04-27 13:25:29,280:DEBUG:file=1_indiv_call_genotype.py:module=1_indiv_call_genotype:funcName=:line=83:message=This is /usr/local/bin/MEGAnE/1_indiv_call_genotype.py version v1.0.0 2022/03/14
2022-04-27 13:25:29,280:INFO:file=1_indiv_call_genotype.py:module=1_indiv_call_genotype:funcName=:line=85:message=Initial check started.
2022-04-27 13:25:29,280:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=33:message=started
2022-04-27 13:25:29,280:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=35:message=command line: /usr/local/bin/MEGAnE/1_indiv_call_genotype.py -i ../datos/cichl187579993.mem.crumble.cram -fa ref/GCA_900246225.3_fAstCal1.2_genomic_chromnames.fa -fadb ../rds/rds-durbin-group-8b3VcZwY7rY/projects/cichlid/pio/projects/20220315_get_insertion_sites_from_bam/databases/AstCaldb -mk megane_kmer_set/AstCal1.2.mk -rep ../rds/rds-durbin-group-8b3VcZwY7rY/projects/cichlid/pio/projects/20220314_FishTEDB_pggb_graph/Mylandia_zebra_FishTEDB_full.fa -repout /home/pio/rds/rds-durbin-group-8b3VcZwY7rY/projects/cichlid/Bettina/data/transposable_elements/repeatMasker/GCA_900246225.3_fAstCal1.2_rm.out -repremove remove_TE -pA_ME polyA -mainchr chrlist -sex unknown -female_sex_chr chr13
2022-04-27 13:25:29,280:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=39:message=Python version=3.7.3
2022-04-27 13:25:29,281:DEBUG:file=initial_check.py:module=initial_check:funcName=which:line=28:message=blastn found: /usr/local/bin/ncbi-blast-2.12.0+/bin/blastn
2022-04-27 13:25:29,281:DEBUG:file=initial_check.py:module=initial_check:funcName=which:line=28:message=bedtools found: /opt/conda/bin/bedtools
2022-04-27 13:25:29,281:DEBUG:file=initial_check.py:module=initial_check:funcName=which:line=28:message=samtools found: /usr/local/bin/samtools-1.14/samtools
2022-04-27 13:25:29,281:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=76:message=ref/GCA_900246225.3_fAstCal1.2_genomic_chromnames.fa found.
2022-04-27 13:25:29,281:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=76:message=../rds/rds-durbin-group-8b3VcZwY7rY/projects/cichlid/pio/projects/20220314_FishTEDB_pggb_graph/Mylandia_zebra_FishTEDB_full.fa found.
2022-04-27 13:25:29,314:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=76:message=/home/pio/rds/rds-durbin-group-8b3VcZwY7rY/projects/cichlid/Bettina/data/transposable_elements/repeatMasker/GCA_900246225.3_fAstCal1.2_rm.out found.
2022-04-27 13:25:29,314:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=76:message=chrlist found.
2022-04-27 13:25:29,314:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=76:message=remove_TE found.
2022-04-27 13:25:29,314:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=76:message=polyA found.
2022-04-27 13:25:29,315:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=85:message=/usr/local/bin/MEGAnE/cpp/extract_discordant.so found.
2022-04-27 13:25:29,315:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=85:message=/usr/local/bin/MEGAnE/cpp/extract_unmapped.so found.
2022-04-27 13:25:29,315:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=85:message=/usr/local/bin/MEGAnE/cpp/remove_multimapping_reads_from_fa.so found.
2022-04-27 13:25:29,315:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=85:message=/usr/local/bin/MEGAnE/cpp/convert_rep_to_2bit_k11.so found.
2022-04-27 13:25:29,315:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=94:message=megane_kmer_set/AstCal1.2.mk found.
2022-04-27 13:25:29,315:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=99:message=megane_kmer_set/AstCal1.2.mi found.
2022-04-27 13:25:29,406:DEBUG:file=initial_check.py:module=initial_check:funcName=check:line=110:message=../datos/cichl187579993.mem.crumble.cram was able to open. 2022-04-27 13:25:29,820:WARNING:file=auto_setting.py:module=auto_setting:funcName=init:line=22:message=Sex is not specified. MEGAnE consider all sex chromosomes as diploid. This option is NOT recommended. Please use this option at your own risk. Please specify sex whenever possible. 2022-04-27 13:25:29,820:DEBUG:file=auto_setting.py:module=auto_setting:funcName=estimate_readlen:line=27:message=started 2022-04-27 13:25:29,834:DEBUG:file=auto_setting.py:module=auto_setting:funcName=estimate_readlen:line=41:message=avelen=151;lens=151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151,151 2022-04-27 13:25:29,834:INFO:file=auto_setting.py:module=auto_setting:funcName=estimate_readlen:line=42:message=estimated read lenth = 151 2022-04-27 13:25:29,835:DEBUG:file=setup.py:module=setup:funcName=setup:line=17:message=started 2022-04-27 13:25:29,841:DEBUG:file=setup.py:module=setup:funcName=setup:line=128:message=249 chromosome(s) were found in ../datos/cichl187579993.mem.crumble.cram. 
2022-04-27 13:25:29,841:INFO:file=setup.py:module=setup:funcName=setup:line=141:message=All 22 main chromosome(s) were found in ../datos/cichl187579993.mem.crumble.cram.
2022-04-27 13:25:29,841:INFO:file=setup.py:module=setup:funcName=setup:line=146:message="chr13" was found in ../datos/cichl187579993.mem.crumble.cram. "chr13" will be considered as a female sex chromosome.
2022-04-27 13:25:29,841:WARNING:file=setup.py:module=setup:funcName=setup:line=159:message=Sex chromosome {'chrY', 'Y'} was NOT found in ../datos/cichl187579993.mem.crumble.cram. Male sex chromosome will not be analyzed. Will continue anyway.
2022-04-27 13:25:29,843:DEBUG:file=load_parameters.py:module=load_parameters:funcName=init:line=15:message=started
2022-04-27 13:25:29,844:DEBUG:file=load_parameters.py:module=load_parameters:funcName=init:line=214:message=parameters: read_count_for_readlen_estimation=200 chr1_start_depth_est=112500000 chr1_end_depth_est=112600000 chrX_start_depth_est=20000000 chrX_end_depth_est=20100000 chrY_start_depth_est=6900000 chrY_end_depth_est=7000000 sex_est_XY_ratio_threshold=0.3 sex_est_XY_ratio_threshold_for_nochrY=0.75 discordant_reads_clip_len=20 read_pair_gap_len=2000 max_TSD_len=50 polyA_overhang_threshold=0.7 mapped_region_low_complex_threshold=0.7 abs_min_dist=50 abs_max_dist=20000 blastn_evalue=1e-05 blastn_ident=80 blastn_word_size=11 overhang_evalue_threshold=1e-05 gzip_compresslevel=1 pA_scan_bin=12 max_non_pA_count=2 scan_loop_from_edge=5 max_ref_genome_hits_for_unmapped=20 repbase_seq_slide_bin=5 min_read_num_per_breakpoint_edge=1 max_breakpoint_gap=50 ref_TE_slop_len=0 retrieve_mapped_seq_threshold=3 blastn_evalue_for_mapped=1e-05 blastn_ident_for_mapped=95 blastn_word_size_for_mapped=30 mapped_abs_single_ident_threshold=98 hybrid_read_range_from_breakpint=500 hybrid_read_coeff_for_gaussian_fitting=0.1 chimeric_read_coeff_for_gaussian_fitting=0.01 eval_threshold_for_gaussian_fitting=1e-25 fit_gaussian_init_a_coeff=0.5 fit_gaussian_init_mu_coeff=1.0 fit_gaussian_init_sigma_coeff=0.33 fit_gaussian_CI_alpha=0.99 actual_cutoff_rank=0.001 first_filter_eval_threshold=1e-15 first_filter_total_hybrid_read_num=1 second_filter_hybrid_read_num=1 second_filter_eval_threshold_for_few_hybrid=1e-25 L1_filter_min_TSD_len=5 L1_filter_A_or_T_perc=50 L1_filter_A_plus_T_perc=90 L1_filter_eval_threshold=1e-25 abs_min_chimeric_num_coeff=0.03 breakpoint_annotation_gap=25 abs_len_to_te_ratio=0.9 len_te_for_abs_ratio=0.9 non_ME_len_ratio=0.5 transduction_pA_len=12 transduction_pA_ratio=0.75 length_for_3transduction_search=1000
2022-04-27 13:25:29,847:INFO:file=1_indiv_call_genotype.py:module=1_indiv_call_genotype:funcName=:line=183:message=Preprocessing started.
2022-04-27 13:25:29,847:DEBUG:file=reshape_rep.py:module=reshape_rep:funcName=reshape_repout_to_bed:line=151:message=started
2022-04-27 13:25:45,381:DEBUG:file=reshape_rep.py:module=reshape_rep:funcName=reshape:line=19:message=started
2022-04-27 13:25:47,286:INFO:file=reshape_rep.py:module=reshape_rep:funcName=reshape:line=49:message=N=2388 repeats found in ../rds/rds-durbin-group-8b3VcZwY7rY/projects/cichlid/pio/projects/20220314_FishTEDB_pggb_graph/Mylandia_zebra_FishTEDB_full.fa. N=2388 will be analyzed. N=0 will be excluded due to non-ME repeats.
2022-04-27 13:25:47,303:DEBUG:file=reshape_rep.py:module=reshape_rep:funcName=reshape:line=64:message=2388 MEs with ambiguous subclass found.
2022-04-27 13:25:47,330:DEBUG:file=blastn.py:module=blastn:funcName=makeblastdb:line=53:message=started
2022-04-27 13:25:47,945:ERROR:file=blastn.py:module=blastn:funcName=makeblastdb:line=57:message=
Traceback (most recent call last):
  File "/usr/local/bin/MEGAnE/scripts/blastn.py", line 55, in makeblastdb
    NcbimakeblastdbCommandline(input_file=fasta_file, dbtype='nucl', out=dbpath, parse_seqids=True)()
  File "/opt/conda/lib/python3.7/site-packages/Bio/Application/init.py", line 569, in call
    raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'makeblastdb -out ./result_out/repdb -dbtype nucl -in ./result_out/reshaped_repbase.fa -parse_seqids', message 'BLAST options error: File ./result_out/reshaped_repbase.fa is empty'

2022-04-27 13:25:47,948:ERROR:file=reshape_rep.py:module=reshape_rep:funcName=reshape:line=93:message=
Traceback (most recent call last):
  File "/usr/local/bin/MEGAnE/scripts/blastn.py", line 55, in makeblastdb
    NcbimakeblastdbCommandline(input_file=fasta_file, dbtype='nucl', out=dbpath, parse_seqids=True)()
  File "/opt/conda/lib/python3.7/site-packages/Bio/Application/init.py", line 569, in call
    raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'makeblastdb -out ./result_out/repdb -dbtype nucl -in ./result_out/reshaped_repbase.fa -parse_seqids', message 'BLAST options error: File ./result_out/reshaped_repbase.fa is empty'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/MEGAnE/scripts/reshape_rep.py", line 70, in reshape
    blastn.makeblastdb(filenames.reshaped_rep, filenames.repdb)
  File "/usr/local/bin/MEGAnE/scripts/blastn.py", line 58, in makeblastdb
    exit(1)
  File "/opt/conda/lib/python3.7/_sitebuiltins.py", line 26, in call
    raise SystemExit(code)
SystemExit: 1
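The failure above comes from makeblastdb refusing an empty input FASTA (./result_out/reshaped_repbase.fa). A minimal diagnostic sketch in Python, assuming you only want to confirm after a failed run whether a FASTA file actually contains records, is shown here. The function `count_fasta_records` and the script name `check_fasta.py` are hypothetical and not part of MEGAnE; the check simply counts lines that start with ">".

```python
import sys
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records (header lines starting with '>') in a file.

    Returns 0 for a missing or zero-byte file. A result of 0 corresponds to
    the situation in which makeblastdb reports that the input "is empty".
    """
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return 0
    n = 0
    with p.open() as fh:
        for line in fh:
            if line.startswith(">"):
                n += 1
    return n


if __name__ == "__main__":
    # Hypothetical usage: python check_fasta.py ./result_out/reshaped_repbase.fa
    for fasta in sys.argv[1:]:
        n = count_fasta_records(fasta)
        status = "OK" if n > 0 else "EMPTY (makeblastdb would reject this file)"
        print(f"{fasta}\t{n} records\t{status}")
```

In this thread the input library itself was not empty (the log reports N=2388 repeats), so a zero count for the reshaped file points at the reshaping step rather than at the -rep file being blank.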

shohei-kojima commented 2 years ago

I am sorry for this error.

Empty files for -repremove and -pA_ME should not cause this.

I suspect the FASTA headers in your file "Mylandia_zebra_FishTEDB_full.fa" are not compatible with MEGAnE. Currently, MEGAnE accepts a repeat library downloaded from RepBase or prepared from the Dfam database. How did you prepare your library? If you want to use a library from other sources or databases, I can write a tutorial or implement support for it.
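The exact header format MEGAnE parses is not spelled out in this thread. As a rough diagnostic, assuming a structural difference would show up in how header fields are delimited, the Python sketch below tallies FASTA headers by their number of tab-separated fields and keeps a few examples of each, so a library such as Mylandia_zebra_FishTEDB_full.fa can be compared with one prepared from Dfam following MEGAnE's instructions. The function `summarize_headers` is hypothetical and does not reproduce MEGAnE's own parsing.

```python
import sys
from collections import Counter


def summarize_headers(path, max_examples=3):
    """Tally FASTA headers by their number of tab-separated fields.

    Returns (field_counts, examples): field_counts maps a field count to the
    number of headers with that many fields; examples maps the same key to up
    to max_examples header strings (without the leading '>').
    """
    field_counts = Counter()
    examples = {}
    with open(path) as fh:
        for line in fh:
            if not line.startswith(">"):
                continue  # skip sequence lines
            header = line[1:].rstrip("\r\n")
            n_fields = len(header.split("\t"))
            field_counts[n_fields] += 1
            bucket = examples.setdefault(n_fields, [])
            if len(bucket) < max_examples:
                bucket.append(header)
    return field_counts, examples


if __name__ == "__main__":
    # Hypothetical usage: python summarize_headers.py library_a.fa library_b.fa
    for fasta in sys.argv[1:]:
        counts, examples = summarize_headers(fasta)
        print(fasta)
        for n_fields in sorted(counts):
            print(f"  {n_fields} tab-separated field(s): {counts[n_fields]} header(s)")
            for header in examples[n_fields]:
                print(f"    e.g. {header!r}")
```

Matching field counts between two libraries do not guarantee that MEGAnE will accept the headers; the output is only meant to make differences easy to spot.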

piosierra commented 2 years ago

You are right. I have just read your instructions for preparing the library from the Dfam file. I will use that now and check for any differences from the library I had. Thanks.

piosierra commented 2 years ago

Confirmed. That was the issue. It is running properly now. Thanks!