Closed ShirelyI closed 8 months ago
Hi @ShirelyI . You can use fixchr
with either .paf files generated by minimap2 (using -F P
) or with .coords files generated by mummer/show-coords (using -F T
). Currently, the input file and the -F
parameter are inconsistent.
Hi @ShirelyI . You can use
fixchr
with either .paf files generated by minimap2 (using-F P
) or with .coords files generated by mummer/show-coords (using-F T
). Currently, the input file and the-F
parameter are inconsistent.
Hi Manish, I tried to use 'fixchr -c ${PRE}.filter.coords -d ${PRE}.filter.delta -r $ref -q $qry -F T', it failed and showed Then I tried 'fixchr -c ${PRE}.filter.delta -r $ref -q $qry -F T' ,it also failed and showed So I used another command 'fixchr -c ${PRE}.filter.coords -r $ref -q $qry -F T', it succeed and generated 8 files, I found that the '_inputalignments.txt' file is the same as '_homologousalignments.txt' file, both of them were the same as my '${PRE}.filter.coords' file except the header of the file.
What troubled me is that the chromosomal ordering of my two species' genomes is in accordance with collinearity(just different strands but not chromosomes ), but the dotplot produced is not consistent with the chromosomal order. Moreover, after I ran fixchr, the chromosomal order of my query genome has changed(same as dotplot). The chromosomal order of ref genome is chr01-chr24,and qry is When I use '_dotplot -c homologous_strand_correctedalignments.txt -r ref.filtered.fa -q qry.filtered.fa -F T' to plot, the error is shown as follows: Part of the homologous_strand_corrected_alignments.txt is as follows :
In breif, I'm wondering why the order of chromosomes of qry genome changed and how to plot without wrong messages. Best, Lily
As you wanted to run syri, you need to now align the newly generated ref.filtered.fa
and qry.filtered.fa
. Then use these fasta files and the newly generated alignments as input for syri. The order of chromosomes in the query genomes do not matter and will not affect the analysis.
Regarding dotplot, the homologous_strand_corrected.pdf
contains the final plot. If you want to re-plot it from homologous_strand_corrected_alignments.txt
then you will need to delete the first row containing the header and then use it with dotplot
.
As you wanted to run syri, you need to now align the newly generated
ref.filtered.fa
andqry.filtered.fa
. Then use these fasta files and the newly generated alignments as input for syri. The order of chromosomes in the query genomes do not matter and will not affect the analysis.Regarding dotplot, the
homologous_strand_corrected.pdf
contains the final plot. If you want to re-plot it fromhomologous_strand_corrected_alignments.txt
then you will need to delete the first row containing the header and then use it withdotplot
. Hi Manish, Thanks for your reply! I'll run syri after align the newly generated ref.filtered.fa and qry.filtered.fa files. The '_homologous_strandcorrected.pdf' file is as follows : As you said, 'The order of chromosomes in the query genomes does not matter and will not affect the analysis'.Do you mean that although the order of chromosomes in the genome has changed, the collinearity is still refchr01-qrychr01,refchr02-qrychr02...refchr24-qrychr24 as my origion collinear result? if so ,why the plot shows refchr02-qrychr24,refchr03-qrychr14 etc? Besides, it seems like that maybe some of the chromosomes are remain not right strand(chr01/7/8/9/15/18/19/20/24) according to the plot? In addition, I found that the chromosomes in the drawing are not sorted in order. Do you know how to show the chromosomes from chr01 to chr24 in the plot? Best, Lily
Hi Manish, I run the alignment by using ref.filtered.fa and qry.filtered.fa genomes , when run syri, it also failed for ''syri.chr23 and chr07 - ERROR - No syntenic region found for chromosome: chr23. This is potentially caused by the two assemblies having different strands for this chromosomes. Reverse complement the chromosome to ensure that the same strands are analysed. Exiting.' I've already use fixchr to reverse complement the chromosome to ensure that the same strands,but it's a mistake again. I've troubled by this for several days.Could you please help me? Best, Lily
I guess that this is happening because the two genomes have same chromosome IDs for non-homologous chromosomes. For example: Chr2 in reference is actually homologous to Chr24 in query, whereas the two Chr2-s are not homologs. Syri assumes that the chromosomes with the same ID are homologs but since that is not the case, it crashes. Please modify the IDs in qry.filtered.fasta to something like 'qry_Chr1', 'qry_Chr2' etc. before doing the alignment. I think that should solve this.
In case, there are more issues, could you please also share the dotplot generated with the ref.filtered.fa and the modified qry.filtered.fa.
Hi Manish, Actually,two genomes are non-homologous chromosomes one-to-one at first, I get the plot after I ran nucmer,the result shows as follows: I realized that the two genomes contain same chromosome IDs for non-homologous chromosomes,so I adjusted the chromosomes IDs for query genome(for example:oldC07-newC02,oldC23-newC03,oldC11-newC05 etc). The plot seems better and have same chromosome IDs for homologous chromosomes(refchr01-qrychr01,refchr02-qrychr02 etc) after adjusted the chromosomes IDs : Then I use the adjusted query genome as input to run nucmer,delta-filter,show-coords and fixchr. As you can see, the collinearity of the chromosomes has changed again(refchr02-qrychr24,refchr03-qrychr14 etc) after fixchr. That's troubled me a lot.
I'm not clear what do you mean by 'modify the IDs in qry.filtered.fasta to something like 'qry_Chr1', 'qry_Chr2' etc. before doing the alignment',for the order of qry.filtered.fa is chr01,chr24,chr14,chr04,chr17...,do you mean I need to modify the chr24 to qry_Chr2 or modify chr02 to qry_Chr2?
As you said ,I try to plot by using the ref.filtered.fa and the qry.filtered.fa(not the modified qry.filtered.fa as you mentioned above for I'm not sure how to modify), I delete the first row containing the header of homologous_strand_corrected_alignments.txt to plot,but it also showed'ValueError: Length mismatch: Expected axis has 11 elements, new values have 12 elements' .It's hard to explain. I could send my genomes to you via email if it's convenient for you. Thanks! Best, Lily
Hi Lily,
'modify the IDs in qry.filtered.fasta to something like 'qry_Chr1', 'qry_Chr2' etc. before doing the alignment'
I meant to only append some string in the chromosome IDs. You can do this using:
sed 's/>/>qry_/g' qry.filtered.fa > qry.filtered.new.fa
Then you can use the ref.filtered.fa and qry.filtered.new.fa with nucmer, delta-filter, show-coords and syri. I believe that this would work.
As you can see, the collinearity of the chromosomes has changed again(refchr02-qrychr24,refchr03-qrychr14 etc) after fixchr.
This new dotplot is same as the one you shared earlier, so I presume that your fixchr
run used old files and not the new ones. Please re-check that, or run it in a new folder.
Best
Hi Manish, I replace IDs in qry.filtered.fasta as you said. I use the ref.filtered.fa and qry.filtered.new.fa with nucmer, delta-filter, show-coords and syri. Then chr07 failed again When I check mapids.txt for mapping used I found that it was mapped to qry_chr18, Besides,many of the output files are empty Looking forward to your reply Best, Lily
Hi Lily, So there is some progress, that is great. Based on the dotplots, Chr7 is homologous to qry_Chr18, so the mapping is correct. These two chromosomes have large local inversions. In such scenarios, fixchr sometimes cannot determine which orientation is syntenic (in fact, even manually it is tricky to determine which strand is better match). Please retry by reverse complementing (using other strand) of qry_Chr18. Hopefully, that would work. NOTE: Similar problem could happen for Chr8, Chr9, Chr18, and Chr19. Unfortunately, this would need to be manually tested and fixed. Best Manish
Hi Manish, Thanks for your patience, and now l know that the cause is the presence of large local inversions. You said 'retry by reverse complementing (using other strand) of qry_Chr18 ', do you mean that I could rerun fixchr and use the new _qry.filtered.new2.fa(for instance) and ref.filtered.fa to rerun nucmer, delta-filter, show-coords and syri? 'retry by reverse complementing (using other strand) of qry_Chr18', can I rerun fixchr specify for qry_Chr18 or when I retry fixchr it may automatically identify the qry_Chr18 ? and how could I specify other strand? Looking forward to your reply. Best, Lily
Unfortunately, you will have to do this manually (fixchr cannot do that). I am not sure how experienced you are with scripting, but basically what you need is to create a new "qry.filtered.new_2.fa" where, compared to qry.filtered.new.fa, only the qry_Chr18 is reverse complemented. And then try rerun nucmer, delta-filter etc with it. You can probably find many solutions for doing this online. I have a small tool here which can do that, but it is not well documented or supported.
Hi Manish, when I run 'fixchr -c S.P.filter.coords -r S.genome.fa -q P.genome.fa -F P',it showed 'Reading BAM/SAM file - ERROR - CIGAR string is not present in PAF at line 2364 8479 37257382 37251209 6116 6174 95.73 1 -1 chr01 chr22. Exiting.' .The S.P.filter.coords file is the output of 'show-coords -THrd' . I'm wondering which file should be the input(alignment.paf) for fixchr and how to get it ? Best, Lily