kedduck closed 2 years ago
Hi,
This seems to be a known Perl issue with regular expressions on very long strings (https://www.nntp.perl.org/group/perl.perl5.porters/2014/10/msg221751.html).
Also, you cannot directly use seq.FINAL.fasta as input for ALLHiC, because it is a chromosome-scale assembly.
This command can directly output the 3DDNA corrected assembly:
seqtk cutN -n 100 seq.FINAL.fasta | seqkit replace -p "^(\\S+)\\s?" -r 'tig{nr}' --nr-width 7 > tig.HiCcorrected.fasta
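For anyone who wants to see what that one-liner does, here is a rough Python sketch of the same idea: cut each scaffold wherever there is a run of at least 100 N, then rename the pieces tig0000001, tig0000002, and so on. The function names `read_fasta` and `cut_and_rename` are made up for this illustration. It is only an approximation, since seqtk cutN also applies a penalty for non-N bases (`-p`), so the real tool may place cut points slightly differently.

```python
import re
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def cut_and_rename(lines: Iterable[str], min_gap: int = 100,
                   width: int = 7) -> Iterator[Tuple[str, str]]:
    """Split scaffolds at N runs of length >= min_gap and rename the pieces.

    Assumption: a simple run-length rule stands in for seqtk's scoring.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    counter = 0
    for _, seq in read_fasta(lines):
        for piece in gap.split(seq):
            if piece:  # skip empty pieces from leading/trailing gaps
                counter += 1
                yield f"tig{counter:0{width}d}", piece


if __name__ == "__main__":
    demo = [">chr1 demo", "ACGT" + "N" * 100 + "GG", "NNTT", ">chr2", "AAAA"]
    for name, seq in cut_and_rename(demo):
        print(f">{name}\n{seq}")
```

Note that the short NN run inside the second piece is kept, because only runs of 100 or more N are treated as gaps.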
Thanks for your kind reply!
Dear Developer:
Thanks for your answer. When I continued with ALLHiC, I found that in the partition step most contigs were placed in group 1. I think there may be a slight difference between release3DDNA.pl and your command.
The Perl script splits the sequence at every "N", whereas the command splits it only where the run of "N" is longer than 100.
Am I right? Could you please provide some advice or a new script (although this may be an unreasonable request)?
Hi,
By default, 3DDNA adds 100 Ns between contigs as a gap. Therefore, I think splitting the sequence at every "N" or only at runs of "N" longer than 100 probably won't make a great difference to the result.
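As a quick illustration of that point, here is a toy Python check. The sequences and the helper `count_pieces` are invented for this example. The two splitting rules give the same number of pieces when the only N runs are the 100 N gaps, and differ only if a contig also contains a shorter run of N.

```python
import re


def count_pieces(seq: str, min_run: int) -> int:
    """Count non-empty pieces after splitting at N runs of length >= min_run."""
    return sum(1 for piece in re.split(r"N{%d,}" % min_run, seq) if piece)


# Toy scaffold: three contigs joined only by 100 N gaps.
clean = "ACGT" * 5 + "N" * 100 + "TTGA" * 5 + "N" * 100 + "CCAG" * 5

# Same scaffold, but the middle contig carries an internal 10 N run.
with_short_run = ("ACGT" * 5 + "N" * 100 + "TTGA" * 2 + "N" * 10
                  + "TTGA" * 3 + "N" * 100 + "CCAG" * 5)

for label, seq in (("clean", clean), ("with_short_run", with_short_run)):
    every_n = count_pieces(seq, 1)      # split at every N
    long_only = count_pieces(seq, 100)  # split only at runs of 100 or more N
    print(label, every_n, long_only)
```

So the two approaches would only disagree on contigs that already had short N stretches before 3DDNA joined them.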
This problem is common when assembling a complex polyploid plant genome. There are many possible causes, such as a contig N50 that is too short, chimeric contigs, and so on.
Here is the pipeline I personally use to assemble complex genomes:
Thanks for your great answer! It’s very helpful for me!
Dear tangerzhang,
Thanks for your tools for polyploid genome assembly.
Recently I have been assembling an autotetraploid genome, following the workflow in #15. After running 3D-DNA, I ran the script release3DDNA.pl, and it reported "Substitution loop at release3DDNA.pl line 12, chunk 2."
Is this due to the long sequence?
I read the previous issue #8 and learned what this script does. Could I directly use seq.FINAL.fasta as the input for ALLHiC?
Looking forward to your reply.
Thanks!