mahulchak / quickmerge

A simple and fast metassembler and assembly gap filler designed for long molecule based assemblies.
GNU General Public License v3.0

Is there a history of merges not included in anchor_summary.txt? #65

Open mu-bio opened 3 years ago

mu-bio commented 3 years ago

Hello @mahulchak,

I checked anchor_summary.txt to see which contigs were merged, and found that it has no record of a merge for one contig, even though that contig's length increased after merging. Is it possible for anchor_summary.txt to omit some merged contigs?

The details are as follows:

The contig name is "S_5". The line containing S_5 in anchor_summary.txt is:

$ cat anchor_summary.txt | awk -F "\t" '$1=="S_5" || $2=="S_5" {print}'
S_5 C_4554  635863  9935    1   9935    1   9935

The length of S_5 should not increase, because C_4554 (9,935 bp) lies entirely within S_5. However, according to the output FASTA file, S_5 grew from 635,863 bp to 764,575 bp.

$ cat merged.fasta | awk '$1~">" {h=$1}; $1!~">" && h==">S_5" {print h": "length($1)}'
>S_5: 764575
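The one-liner above prints the length of each sequence line, so it reports the full contig length only when the FASTA stores every sequence on a single line. A minimal sketch that sums wrapped lines instead, assuming a standard FASTA layout (the function name `fasta_record_length` is hypothetical and not part of quickmerge):

```shell
# Hypothetical helper, not part of quickmerge: total length of one FASTA record.
# Sums every sequence line of the record, so wrapped FASTA files are handled too.
fasta_record_length() {
    # $1 = record name without the leading ">", $2 = FASTA file
    awk -v name="$1" '
        /^>/ { keep = (substr($1, 2) == name); next }  # header line: compare its first word to the name
        keep { total += length($0) }                   # sequence line belonging to the wanted record
        END  { print total + 0 }                       # prints 0 if the record was not found
    ' "$2"
}
```

It would be called as `fasta_record_length S_5 merged.fasta`.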

Why did S_5 length increase after merging?

Incidentally, the lines containing S_5 in aln_summary.tsv are:

$ cat aln_summary.tsv | awk -F "\t" '$1=="S_5" || $2=="S_5" {print}'
S_5 C_1 635863  753307  11297   481279  1   469983
S_5 C_1 635863  753307  481349  635863  470171  624685
S_5 C_13    635863  419977  9797    11437   350194  351834
S_5 C_4554  635863  9935    1   9935    1   9935
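To compare these rows more easily, the aligned span and the unaligned ends of the second contig can be computed per row. This is only a sketch: the column meanings (reference name, query name, reference length, query length, reference start, reference end, query start, query end) are inferred from the values shown above, not taken from the quickmerge documentation, and the function name `summarize_alignments` is hypothetical.

```shell
# Sketch only. Assumed columns (inferred from the values in this thread, not documented):
#   1 ref name, 2 query name, 3 ref length, 4 query length,
#   5 ref start, 6 ref end, 7 query start, 8 query end
summarize_alignments() {
    # $1 = contig name, $2 = summary file (fields split on tabs or spaces)
    awk -v name="$1" '
        $1 == name || $2 == name {
            ref_aligned = $6 - $5 + 1   # bases of the reference covered by this alignment
            qry_aligned = $8 - $7 + 1   # bases of the query covered by this alignment
            qry_left    = $7 - 1        # query bases before the alignment starts
            qry_right   = $4 - $8       # query bases after the alignment ends
            printf "%s\t%s\tref_aligned=%d\tqry_aligned=%d\tqry_left=%d\tqry_right=%d\n",
                   $1, $2, ref_aligned, qry_aligned, qry_left, qry_right
        }
    ' "$2"
}
```

It would be called as `summarize_alignments S_5 aln_summary.tsv`. Under the assumed column layout, the C_4554 row has no unaligned query bases on either side, while the second C_1 row has 128,622 query bases after the alignment ends (753307 - 624685).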

Thanks, Mimi

mahulchak commented 3 years ago

This could be a bug. I'll need to check what happened. Could you share your files (FASTA and delta)?
