merlyescalona / yagcloser

YAGCloser: Yet-Another-Gap-Closer based on spanning of long reads.
10 stars 0 forks source link

ERROR when getting output.potential.fillable.gaps.txt #3

Open huandna opened 1 year ago

huandna commented 1 year ago

hello,,I wanted to use yagcloser to fill gap in my genome( about 500MB) ,but get empty output.potential.fillable.gaps.txt,and get some record if no.support.gaps, I want to know why,Is it because I have a big genome?

here is my “yagcloser_output.no.support.gaps.txt”:

Chr2:38336522-38336622 100 Chr3:19743855-19743955 100 Chr3:29807753-29807853 100 Chr3:34631046-34631146 100 Chr3:35615854-35615954 100 Chr3:36136983-36137083 100 Chr3:36361212-36361312 100 Chr3:36450609-36450709 100

here is my log:

[03/03/2023 05:42:06 PM] WARNING (create_output_folders|865): Take into account that if these folders already exist files will be overwritten. [03/03/2023 05:42:06 PM] INFO (parse_bed_file|62): ================================================================================ [03/03/2023 05:42:06 PM] INFO (parse_bed_file|63): Reading BED file (/work4/Denovo/JZP202211GM039-01_carrot_wangyaoxin/12.yagcloser/Z21_170/gaps.bed) [03/03/2023 05:42:06 PM] INFO (parse_bed_file|64): ================================================================================ [03/03/2023 05:42:06 PM] INFO (parse_bed_file|86): Done reading BED file > 0:00:00.001071 [03/03/2023 05:42:06 PM] INFO (parse_reference_file|95): ================================================================================ [03/03/2023 05:42:06 PM] INFO (parse_reference_file|96): Reading scaffolds from reference file... [03/03/2023 05:42:06 PM] INFO (parse_reference_file|97): ================================================================================ [03/03/2023 05:42:08 PM] INFO (generate_flank_table|146): ================================================================================ [03/03/2023 05:42:08 PM] INFO (generate_flank_table|147): Extracting flank regions... [03/03/2023 05:42:08 PM] INFO (generate_flank_table|148): ================================================================================ [03/03/2023 05:42:08 PM] INFO (generate_flank_table|180): Done with flanks extraction: > 0:00:00.000091 [03/03/2023 05:42:08 PM] INFO (parse_reference_file|139): Done reading reference file (/work4/Denovo/JZP202211GM039-01_carrot_wangyaoxin/06.Mumer/Z21_170/final.fa) > 0:00:01.648593 [03/03/2023 05:42:08 PM] INFO (identify_potential_gaps|215): ================================================================================ [03/03/2023 05:42:08 PM] INFO (identify_potential_gaps|216): Identifying potential gaps [03/03/2023 05:42:08 PM] INFO (identify_potential_gaps|217): ================================================================================ [03/03/2023 05:42:13 PM] INFO (identify_potential_gaps|322): Done with the info extraction... > 0:00:05.947712 [03/03/2023 05:42:13 PM] INFO (identify_potential_gaps|326): ================================================================================ [03/03/2023 05:42:13 PM] INFO (identify_potential_gaps|327): Removing gaps with no support from analysis... [03/03/2023 05:42:13 PM] INFO (identify_potential_gaps|328): ================================================================================ [03/03/2023 05:42:13 PM] INFO (identify_potential_gaps|343): Removed 8 gaps.. > 0:00:00.000135 [03/03/2023 05:42:13 PM] INFO (identify_potential_gaps|344): -------------------------------------------------------------------------------- [03/03/2023 05:42:13 PM] INFO (identify_potential_gaps|345): Reporting removed gaps to file: /work4/Denovo/JZP202211GM039-01_carrot_wangyaoxin/12.yagcloser/Z21_170/yagcloser_output/yagcloser_output.no.support.gaps.txt [03/03/2023 05:42:14 PM] INFO (identify_potential_gaps|390): Reporting potential fillable gaps (/work4/Denovo/JZP202211GM039-01_carrot_wangyaoxin/12.yagcloser/Z21_170/yagcloser_output/yagcloser_output.potential.fillable.gaps.txt)... [03/03/2023 05:42:14 PM] INFO (extract_support_data|418): ================================================================================ [03/03/2023 05:42:14 PM] INFO (extract_support_data|419): Extracting support data [03/03/2023 05:42:14 PM] INFO (extract_support_data|420): ================================================================================ [03/03/2023 05:42:14 PM] INFO (extract_support_data|518): End of extracting support data > 0:00:00.007475 [03/03/2023 05:42:14 PM] INFO (extract_support_data|519): ================================================================================ [03/03/2023 05:42:14 PM] INFO (extract_support_data|520): Checking for ambiguos decisions... [03/03/2023 05:42:14 PM] INFO (extract_support_data|554): ================================================================================ [03/03/2023 05:42:14 PM] INFO (extract_support_data|555): Writing support files [03/03/2023 05:42:14 PM] INFO (extract_support_data|556): ================================================================================ [03/03/2023 05:42:14 PM] ERROR (|1083): Done with file writing... [03/03/2023 05:42:14 PM] INFO (|1087): Done with file writing...

JieZhouFighting commented 1 year ago

I use yagcloser to fill gap in my genome( about 200MB),but I also encountered the same problem, do you have a solution?Does this error mean that yagcloser cannot fill gaps because potential.fillable.gaps.txt is empty?

huandna commented 1 year ago

Thank you,bro, I try it as your advice ,but same error happened.

JieZhouFighting commented 1 year ago

Thank you,bro, I try it as your advice ,but same error happened. I read some codes, and I think the principle of yagcloser is to only use reads with primary alignment attribute to fill the gap, because there is no such reads, so the gap cannot be filled. In addition, I'm not sure if this is a error.

merlyescalona commented 1 year ago

Hi @huandna and @JieZhouFighting, thanks for using YAGcloser, even though it hasn't been a straightforward experience.

@huandna, given the log, it just looks like there is no support for any gap. There are multiple filters, and I will update the documentation with a more detailed algorithm explanation soon. Here a general explanation:

Per gap:

  1. identify the alignments that span the gap +- 40 (flank size) base pairs.
  2. extract from those alignments the read chunks that fall within those coordinates [gap_start_coordinate - flan size, gap_end_coordinate + flank size]
  3. cluster those read chunks based on length (sequence length agreement), starting with calculating the average length of all the read chunks that span the gap and gathering the read chunks that fall within 10% of the average. I do that until convergence.
  4. Make multiple sequence alignment (MSA) from the remaining read chunks
  5. Get a consensus sequence from the MSA.

The -eft parameter is related to step 1), and refers to the % of flanks populated; this ratio is calculated as the number of empty flanks / number of total flanks. Empty flanks=flanks regions where mpileup is empty for all the flank positions.
And the condition is that the empty_flanks_ration > eft