schneebergerlab / fixchr

MIT License
13 stars 1 forks source link

Reading BAM/SAM file - ERROR - CIGAR string is not present in PAF at line 2364 8479 37257382 37251209 6116 6174 95.73 1 -1 chr01 chr22. Exiting #9

Closed ShirelyI closed 8 months ago

ShirelyI commented 11 months ago

Hi Manish, when I run 'fixchr -c S.P.filter.coords -r S.genome.fa -q P.genome.fa -F P',it showed 'Reading BAM/SAM file - ERROR - CIGAR string is not present in PAF at line 2364 8479 37257382 37251209 6116 6174 95.73 1 -1 chr01 chr22. Exiting.' .The S.P.filter.coords file is the output of 'show-coords -THrd' . I'm wondering which file should be the input(alignment.paf) for fixchr and how to get it ? Best, Lily

mnshgl0110 commented 10 months ago

Hi @ShirelyI . You can use fixchr with either .paf files generated by minimap2 (using -F P) or with .coords files generated by mummer/show-coords (using -F T). Currently, the input file and the -F parameter are inconsistent.

ShirelyI commented 10 months ago

Hi @ShirelyI . You can use fixchr with either .paf files generated by minimap2 (using -F P) or with .coords files generated by mummer/show-coords (using -F T). Currently, the input file and the -F parameter are inconsistent.

Hi Manish, I tried to use 'fixchr -c ${PRE}.filter.coords -d ${PRE}.filter.delta -r $ref -q $qry -F T', it failed and showed 1b587eb5cd0ee4ece3eacc7cdc99c35 Then I tried 'fixchr -c ${PRE}.filter.delta -r $ref -q $qry -F T' ,it also failed and showed 966a650dffcea6304834fa0dd7ea546 So I used another command 'fixchr -c ${PRE}.filter.coords -r $ref -q $qry -F T', it succeed and generated 8 files, 3a62b801329670e11f4b5c0a6961a7a I found that the '_inputalignments.txt' file is the same as '_homologousalignments.txt' file, both of them were the same as my '${PRE}.filter.coords' file except the header of the file.

What troubled me is that the chromosomal ordering of my two species' genomes is in accordance with collinearity(just different strands but not chromosomes ), but the dotplot produced is not consistent with the chromosomal order. Moreover, after I ran fixchr, the chromosomal order of my query genome has changed(same as dotplot). The chromosomal order of ref genome is chr01-chr24,and qry is 0fad1f49d91c8a296d7ccfc5b7f6588 When I use '_dotplot -c homologous_strand_correctedalignments.txt -r ref.filtered.fa -q qry.filtered.fa -F T' to plot, the error is shown as follows: image Part of the homologous_strand_corrected_alignments.txt is as follows : image

In breif, I'm wondering why the order of chromosomes of qry genome changed and how to plot without wrong messages. Best, Lily

mnshgl0110 commented 10 months ago

As you wanted to run syri, you need to now align the newly generated ref.filtered.fa and qry.filtered.fa. Then use these fasta files and the newly generated alignments as input for syri. The order of chromosomes in the query genomes do not matter and will not affect the analysis.

Regarding dotplot, the homologous_strand_corrected.pdf contains the final plot. If you want to re-plot it from homologous_strand_corrected_alignments.txt then you will need to delete the first row containing the header and then use it with dotplot.

ShirelyI commented 10 months ago

As you wanted to run syri, you need to now align the newly generated ref.filtered.fa and qry.filtered.fa. Then use these fasta files and the newly generated alignments as input for syri. The order of chromosomes in the query genomes do not matter and will not affect the analysis.

Regarding dotplot, the homologous_strand_corrected.pdf contains the final plot. If you want to re-plot it from homologous_strand_corrected_alignments.txt then you will need to delete the first row containing the header and then use it with dotplot. Hi Manish, Thanks for your reply! I'll run syri after align the newly generated ref.filtered.fa and qry.filtered.fa files. The '_homologous_strandcorrected.pdf' file is as follows : image As you said, 'The order of chromosomes in the query genomes does not matter and will not affect the analysis'.Do you mean that although the order of chromosomes in the genome has changed, the collinearity is still refchr01-qrychr01,refchr02-qrychr02...refchr24-qrychr24 as my origion collinear result? if so ,why the plot shows refchr02-qrychr24,refchr03-qrychr14 etc? Besides, it seems like that maybe some of the chromosomes are remain not right strand(chr01/7/8/9/15/18/19/20/24) according to the plot? In addition, I found that the chromosomes in the drawing are not sorted in order. Do you know how to show the chromosomes from chr01 to chr24 in the plot? Best, Lily

ShirelyI commented 10 months ago

Hi Manish, I run the alignment by using ref.filtered.fa and qry.filtered.fa genomes , when run syri, it also failed for ''syri.chr23 and chr07 - ERROR - No syntenic region found for chromosome: chr23. This is potentially caused by the two assemblies having different strands for this chromosomes. Reverse complement the chromosome to ensure that the same strands are analysed. Exiting.' image I've already use fixchr to reverse complement the chromosome to ensure that the same strands,but it's a mistake again. I've troubled by this for several days.Could you please help me? Best, Lily

mnshgl0110 commented 10 months ago

I guess that this is happening because the two genomes have same chromosome IDs for non-homologous chromosomes. For example: Chr2 in reference is actually homologous to Chr24 in query, whereas the two Chr2-s are not homologs. Syri assumes that the chromosomes with the same ID are homologs but since that is not the case, it crashes. Please modify the IDs in qry.filtered.fasta to something like 'qry_Chr1', 'qry_Chr2' etc. before doing the alignment. I think that should solve this.

In case, there are more issues, could you please also share the dotplot generated with the ref.filtered.fa and the modified qry.filtered.fa.

ShirelyI commented 10 months ago

Hi Manish, Actually,two genomes are non-homologous chromosomes one-to-one at first, I get the plot after I ran nucmer,the result shows as follows: image image I realized that the two genomes contain same chromosome IDs for non-homologous chromosomes,so I adjusted the chromosomes IDs for query genome(for example:oldC07-newC02,oldC23-newC03,oldC11-newC05 etc). The plot seems better and have same chromosome IDs for homologous chromosomes(refchr01-qrychr01,refchr02-qrychr02 etc) after adjusted the chromosomes IDs : image Then I use the adjusted query genome as input to run nucmer,delta-filter,show-coords and fixchr. As you can see, the collinearity of the chromosomes has changed again(refchr02-qrychr24,refchr03-qrychr14 etc) after fixchr. image That's troubled me a lot.

I'm not clear what do you mean by 'modify the IDs in qry.filtered.fasta to something like 'qry_Chr1', 'qry_Chr2' etc. before doing the alignment',for the order of qry.filtered.fa is chr01,chr24,chr14,chr04,chr17...,do you mean I need to modify the chr24 to qry_Chr2 or modify chr02 to qry_Chr2?

As you said ,I try to plot by using the ref.filtered.fa and the qry.filtered.fa(not the modified qry.filtered.fa as you mentioned above for I'm not sure how to modify), I delete the first row containing the header of homologous_strand_corrected_alignments.txt to plot,but it also showed'ValueError: Length mismatch: Expected axis has 11 elements, new values have 12 elements' .It's hard to explain. I could send my genomes to you via email if it's convenient for you. Thanks! Best, Lily

mnshgl0110 commented 10 months ago

Hi Lily,

'modify the IDs in qry.filtered.fasta to something like 'qry_Chr1', 'qry_Chr2' etc. before doing the alignment'

I meant to only append some string in the chromosome IDs. You can do this using:

sed 's/>/>qry_/g' qry.filtered.fa > qry.filtered.new.fa

Then you can use the ref.filtered.fa and qry.filtered.new.fa with nucmer, delta-filter, show-coords and syri. I believe that this would work.

As you can see, the collinearity of the chromosomes has changed again(refchr02-qrychr24,refchr03-qrychr14 etc) after fixchr.

This new dotplot is same as the one you shared earlier, so I presume that your fixchr run used old files and not the new ones. Please re-check that, or run it in a new folder.

Best

ShirelyI commented 10 months ago

Hi Manish, I replace IDs in qry.filtered.fasta as you said. image I use the ref.filtered.fa and qry.filtered.new.fa with nucmer, delta-filter, show-coords and syri. image Then chr07 failed again image When I check mapids.txt for mapping used I found that it was mapped to qry_chr18, image Besides,many of the output files are empty image Looking forward to your reply Best, Lily

mnshgl0110 commented 10 months ago

Hi Lily, So there is some progress, that is great. Based on the dotplots, Chr7 is homologous to qry_Chr18, so the mapping is correct. These two chromosomes have large local inversions. In such scenarios, fixchr sometimes cannot determine which orientation is syntenic (in fact, even manually it is tricky to determine which strand is better match). Please retry by reverse complementing (using other strand) of qry_Chr18. Hopefully, that would work. NOTE: Similar problem could happen for Chr8, Chr9, Chr18, and Chr19. Unfortunately, this would need to be manually tested and fixed. Best Manish

ShirelyI commented 10 months ago

Hi Manish, Thanks for your patience, and now l know that the cause is the presence of large local inversions. You said 'retry by reverse complementing (using other strand) of qry_Chr18 ', do you mean that I could rerun fixchr and use the new _qry.filtered.new2.fa(for instance) and ref.filtered.fa to rerun nucmer, delta-filter, show-coords and syri? 'retry by reverse complementing (using other strand) of qry_Chr18', can I rerun fixchr specify for qry_Chr18 or when I retry fixchr it may automatically identify the qry_Chr18 ? and how could I specify other strand? Looking forward to your reply. Best, Lily

mnshgl0110 commented 10 months ago

Unfortunately, you will have to do this manually (fixchr cannot do that). I am not sure how experienced you are with scripting, but basically what you need is to create a new "qry.filtered.new_2.fa" where, compared to qry.filtered.new.fa, only the qry_Chr18 is reverse complemented. And then try rerun nucmer, delta-filter etc with it. You can probably find many solutions for doing this online. I have a small tool here which can do that, but it is not well documented or supported.