Open YPGG1234 opened 5 years ago
Hi there,
I am happy to give a detailed explanation. First, can you tell me what the grouping confidence score is? That will be in the "groupings" folder.
Thanks
Hi, this contig's grouping confidence score is 0.9568822757353695.
Thanks for sharing this. I would say that the grouping and orientation scores look pretty good. The location score is low, though that is perhaps the least descriptive since it is based on alignment coordinates with respect to the reference.
In general, it is not optimal to use a reference which is missing chromosomes (in your case, Y). In that case, as long as a contig has a >10kbp alignment anywhere else in the genome, that is where it will get placed. Is it possible to use a reference with the Y chromosome?
If not, perhaps the next best thing to do would be to increase the specificity by requiring a minimum alignment length that is much longer than 10kbp. Though I would like to add this functionality at some point in the future, it is not currently available.
However, RaGOO will not try to remake alignment files if they are already present in the output directories. So you can filter those alignment files (for example, only include alignments > 50kbp) and place them in the output directories. If they have the same names as they do now, RaGOO will not recreate them. If that doesn't make sense, I can give a more detailed example.
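A minimal sketch of that filtering step, assuming the alignment file is standard PAF, where the 11th tab-separated column is the alignment block length. The function names and the 50 kbp default are illustrative and are not part of RaGOO, and whether RaGOO measures alignment length the same way is an assumption.

```python
def filter_paf(lines, min_aln_len=50_000):
    """Yield PAF records whose alignment block length exceeds min_aln_len."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        # PAF column 11 (index 10) is the alignment block length.
        if int(fields[10]) > min_aln_len:
            yield line


def filter_paf_file(src, dst, min_aln_len=50_000):
    """Write the filtered records from the PAF file src to dst (hypothetical helper)."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(filter_paf(fin, min_aln_len))
```

The filtered file would then need to be saved under the name RaGOO already uses in the output directory (for example contigs_against_ref.paf), so that RaGOO finds it and does not recreate it.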
Also, please see the preprint for a better description of the confidence scores.
Thanks for your help, I will try it. This contig is longer than 5 Mb, but it was broken at position 236K. The first part was placed on chr13 and the second part on chrX.
I have another question. You said that if a contig has a >10kbp alignment anywhere else in the genome, that is where it will get placed. Does this mean that a contig with a lot of repeat content (such as one from a sex chromosome) might be wrongly assigned to another chromosome (or to the other sex chromosome)? And could it then occur in many places in the final fasta file?
No, that is not what I mean. Allow me to clarify.
By default, each contig is placed exactly once, unaltered, in the final ragoo.fasta file. So the final file represents just an ordered and oriented version of the input contig set.
Beyond that, one can correct misassemblies as you have, but that just breaks contigs in certain places. So if a repetitive contig has many alignments, ragoo will pick the "best" alignment to use. However, that is exactly the sort of thing that would make the confidence scores go down.
Ok, I understand. Thanks for your answer !
No problem. I will respond again to this issue when I have made the alignment length a tunable parameter.
Hi malonge,
Recently I have run into some new problems. When I used the assembly's scaffolds and the reference genome to draw a CIRCOS plot, it looked clean, but when I used the RaGOO assembly and the reference to draw a CIRCOS plot, it looked messy.
Here is my RaGOO command:
ragoo.py -R raw.corrected.fasta -m /bin/minimap2 -gff stringtie.generated.gff3 -T corr -t 28 -i 0.8 -j Y.candidate.txt assembly.fa ref.fna
For the first plot, I used lastal to generate link.txt; for the second, I used minimap2. I am not sure whether the problem is in my RaGOO assembly or just in my alignment tools.
Can you help me? Thanks.
Hi there,
Can you tell me what exactly is in the link.txt file?
Personally, I think a dotplot would be the best visualization here. You can use mummerplot or assemblytics.
OK, link.txt is an input file required by CIRCOS to draw a collinearity graph. It records the collinearity relationship between the assembly and the reference, and the format is as follows:
QueryChr/ScaffoldName QueryChr/ScaffoldStart QueryChr/ScaffoldEnd RefChr RefChrStart RefChrEnd
Scaffold_1 0 100000 Chr2 50000 150000
It can be generated from lastal, minimap2, and similar alignment tools.
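As a rough illustration of how such a file could be derived from minimap2 output, here is a sketch that maps standard PAF columns onto the six link.txt fields described above. The function name is hypothetical, strand is ignored, and no filtering is applied, so this is not necessarily how the poster's link.txt was built.

```python
def paf_to_links(lines):
    """Yield link.txt rows (query name/start/end, ref name/start/end) from PAF records."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        f = line.rstrip("\n").split("\t")
        # PAF columns: 1 query name, 3 query start, 4 query end,
        #              6 target name, 8 target start, 9 target end.
        yield " ".join([f[0], f[2], f[3], f[5], f[7], f[8]])
```

Each yielded string is one row in the same space-separated layout as the example row above.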
It sounds like you used 2 different aligners to generate the plots. Can you show me what they look like if you use minimap2 for both of them? Also, what does your minimap2 command look like?
RaGOO scaffolds strictly based on minimap2 alignments, so it doesn't make sense that they would disagree that much.
My contigs_against_ref.paf.log contains this minimap2 command:
minimap2 -k19 -w19 -t24 ref.fa assembly.fa
And my own minimap2 command is:
minimap2 -k19 -w19 -t 24 --secondary=no -cx asm10 ref.fa assembly.fa
I think it is possible that I enabled the "assembly correction" option, which led to the scaffolds being broken. But when I ran RaGOO without any optional parameters, the plot still didn't change. My colleague told me lastal may be better than minimap2 in this case, so I will try it.
What organism is this? And what is the expected genome size/ploidy?
The organism is sheep, and the expected genome size is 2.6-2.7 Gb, just like the reference genome.
Well I am puzzled because those two minimap2 commands should give very similar results. And I don't see why minimap2 would not work just fine on this genome.
One thing you can do is replace the original contigs_against_ref.paf with the PAF file used to generate the circos plot. Let's say you have circos.paf. You can do the following.
cd ragoo_output
mv contigs_against_ref.paf contigs_against_ref.paf.old
cp /path/to/circos/circos.paf .
mv circos.paf contigs_against_ref.paf
Then, remove every other file/directory in ragoo_output except those paf files (you can keep the log file around too). Finally, rerun ragoo.
Ragoo will use your circos alignments for scaffolding instead of generating its own alignments.
Of course, you would have to rerun minimap2 on the original scaffolds rather than the ragoo pseudomolecules.
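The cleanup step can be sketched in Python as follows. This is a hypothetical helper, not part of RaGOO: it assumes the files to keep are the top-level files in ragoo_output whose names contain ".paf" or end in ".log", and it defaults to a dry run so you can check what would be deleted first.

```python
import shutil
from pathlib import Path


def clean_ragoo_output(out_dir, dry_run=True):
    """Remove everything in out_dir except top-level PAF and log files.

    Returns the names of the entries that were (or, in a dry run, would be) removed.
    """
    removed = []
    for entry in sorted(Path(out_dir).iterdir()):
        # Assumption: names containing ".paf" or ending in ".log" are kept.
        keep = entry.is_file() and (".paf" in entry.name or entry.name.endswith(".log"))
        if keep:
            continue
        removed.append(entry.name)
        if not dry_run:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # delete the directory and its contents
            else:
                entry.unlink()
    return removed
```

Calling clean_ragoo_output("ragoo_output") only lists what would go; pass dry_run=False once the list looks right, then rerun ragoo.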
Can I modify RaGOO's built-in minimap2 parameters, and if so, where should I change them? For example, I want to change the built-in "minimap2 -k19 -w19 -t24 ref.fa assembly.fa" to "minimap2 -k19 -w19 -t 24 --secondary=no -cx asm10 ref.fa assembly.fa".
Well you can fork the repo and change it in the source code by all means, but I was just suggesting how to run whatever minimap2 command you want, save it to a paf file, then just plug that paf file into ragoo. Ragoo won't make a new paf file if it already sees one there.
Ok, I will try it, thank you.
Hi there,
RagTag, the successor to RaGOO, is now available here:
https://github.com/malonge/RagTag
This feature is implemented in RagTag, and will likely not ever be implemented in RaGOO, which will eventually be deprecated.
Thanks
Hello, I see that you have confidence scores associated with the grouping, localization, and orientation of each contig, and I would like to know more about them. For example, I have a contig in the final fasta file whose location confidence score is 0.03104861142651071 and whose orientation confidence score is 0.9638021314266446. (I think this contig should belong to chrY, which the reference doesn't have, and should not be broken, but it was assigned to chrX and it was broken.) Are these scores good or bad? And how can I judge which scores are reliable? Here is my command:
ragoo.py -R raw.corrected.fasta -C -m /bin/minimap2 -gff stringtie.generated.gff3 -T corr -t 28 assembly.fa ref.fna
If you can help me, I will be very grateful.