schneebergerlab / plotsr

Tool to plot synteny and structural rearrangements between genomes
MIT License

Chromosome ID not present in genome fasta #57

Closed by adriandrr 1 year ago

adriandrr commented 1 year ago

Hey, I am currently trying to run your synteny plot pipeline and would be really glad if you could help me. I am trying to plot the synteny between 12 de novo assembled bacterial whole genomes. I aligned them pairwise with minimap2: A+B, B+C, C+D and so on. I used a bash script for that, if you want to have a look:

    #!/bin/bash

    fasta_files=(0.fa 1.fa 2.fa 3.fa 4.fa 5.fa 6.fa 7.fa 9.fa 10.fa 11.fa 12.fa 13.fa)

    # Align each genome against the next one in the list (0 vs 1, 1 vs 2, ...).
    for ((i = 0; i < ${#fasta_files[@]} - 1; i++)); do
        current_file="${fasta_files[$i]}"
        next_file="${fasta_files[$i + 1]}"
        # Strip the file extension to build the output name, e.g. 0_1.bam.
        current_prefix="${current_file%.*}"
        next_prefix="${next_file%.*}"
        output_bam="${current_prefix}_${next_prefix}.bam"

        minimap2 -ax asm5 -t 4 --eqx "$current_file" "$next_file" | samtools sort -O BAM - > "$output_bam"
        samtools index "$output_bam"
    done

After that I used a bash one-liner for-loop to produce the syri information:

    for i in $(ls *bam -1v); do prefix="${i%.*}"; IFS="_" read -r fnum snum <<< "$prefix"; syri -c $i -r $fnum.fa -q $snum.fa -F B --prefix $prefix; done

I am unsure whether there is a problem with the syri output, since I am not very familiar with it. The first 5 lines of the first syri output, "0_1syri.out", look like this:

    Chr0  1     4344     -  -  -     -    -        NOTAL1  -     NOTAL  -
    Chr0  4345  5189076  -  -  Chr0  1    5020588  SYN1    -     SYN    -
    Chr0  4345  4896     -  -  Chr0  1    553      SYNAL1  SYN1  SYNAL  -
    Chr0  4484  4484     G  T  Chr0  140  140      SNP543  SYN1  SNP    -
    Chr0  4485  4485     A  T  Chr0  141  141      SNP544  SYN1  SNP    -

I then tried to run plotsr with this command:

    plotsr --sr 0_1syri.out --sr 1_2syri.out --sr 2_3syri.out --sr 3_4syri.out \
           --sr 4_5syri.out --sr 5_6syri.out --sr 6_7syri.out --sr 7_9syri.out \
           --sr 9_10syri.out --sr 10_11syri.out --sr 11_12syri.out --sr 12_13syri.out \
           --genomes ../../genomes2.txt -o output_plot.png

At first I wanted to use the FASTA files themselves as input, and the genomes2.txt file looked like this:

    #file    name    tags
    0.fa     0       lw:1.5
    1.fa     1       lw:1.5
    10.fa    10      lw:1.5
    11.fa    11      lw:1.5
    12.fa    12      lw:1.5
    13.fa    13      lw:1.5
    2.fa     2       lw:1.5
    3.fa     3       lw:1.5
    4.fa     4       lw:1.5
    5.fa     5       lw:1.5
    6.fa     6       lw:1.5
    7.fa     7       lw:1.5
    9.fa     9       lw:1.5

and I ran into the error:

    ImportError: For chromosome ID: Chr0, length in genome fasta: genomes2.txt is less than the maximum coordinate in the structural annotation file: 1_2syri.out. Exiting.

I didn't understand the error. The first FASTA file is the reference and therefore the longest, so I don't really see a maximum-coordinate problem. Anyway, I saw that it is possible to use chromosome lengths as input instead. So I calculated the length of each FASTA file and produced a chrlen file. Of course, I also changed the file names in genomes2.txt from .fa to .chrlen. The chrlen files look like this:

    "0.chrlen": Chr0    5199559
    "1.chrlen": Chr0    5020588

and so on...

With that and the same plotsr command, I run into the error:

    ImportError: Chromosome ID: Chr0 in structural annotation file: 0_1syri.out not present in genome fasta: 0. Exiting

Could you explain to me what I am doing wrong? Thanks!

P.S.: Thanks for reading this far. I think I found an error in the example described in the README. The fonts chosen in the example files markers.bed and tracks.txt are Arial. I think this is not supported anymore (?). Anyway, I changed it to DejaVu Sans and it worked again. Thought you might want to know :)
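The chrlen files described in the comment above (one line per sequence: ID, then length) can be derived from a FASTA file with awk. The following is a minimal sketch and not taken from the thread: the file names `demo.fa` and `demo.chrlen` and the helper `fasta_to_chrlen` are hypothetical, and the tab-separated two-column layout is an assumption inferred from the `0.chrlen` example quoted above.

```shell
# Create a tiny demo FASTA (hypothetical data) so the sketch runs on its own.
printf '>Chr0 some description\nACGTAC\nGT\n>Chr1\nAAAA\n' > demo.fa

# Print "<sequence ID><TAB><length>" for every record in a FASTA file.
fasta_to_chrlen() {
    awk '
        /^>/ {
            if (id != "") print id "\t" len   # flush the previous record
            id = substr($1, 2)                # header up to first whitespace, without ">"
            len = 0
            next
        }
        { len += length($0) }                 # sequence lines may be wrapped
        END { if (id != "") print id "\t" len }
    ' "$1"
}

fasta_to_chrlen demo.fa > demo.chrlen
cat demo.chrlen
```

The sequence ID is taken as the header text up to the first whitespace, so it should match the chromosome ID that appears in the syri output (`Chr0` in this thread), provided the FASTA headers start with that ID.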

adriandrr commented 1 year ago

Update: I ordered the genomes.txt file numerically so that it is

    #file    name    tags
    0.fa     0       lw:1.5
    1.fa     1       lw:1.5
    2.fa     2       lw:1.5
    ...

With the chrlen files I still run into the same error as before, but with the FASTA files it actually worked!!
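The two orderings of the genomes file in this thread correspond to lexicographic and numeric sorting of the genome names. The sketch below only illustrates that difference. It assumes a `sort` that supports `-V` (version sort, available in GNU coreutils), and the variable names are hypothetical.

```shell
# The thirteen genome names from the thread, in the order of the first genomes2.txt.
names="0 1 10 11 12 13 2 3 4 5 6 7 9"

# Plain sort compares character by character, so 10 sorts before 2.
# $names is left unquoted on purpose so that it splits into one word per name.
lexicographic=$(printf '%s\n' $names | LC_ALL=C sort | paste -sd ' ' -)

# Version sort compares embedded numbers numerically.
numeric=$(printf '%s\n' $names | sort -V | paste -sd ' ' -)

echo "lexicographic: $lexicographic"
echo "numeric:       $numeric"
```

The numeric order is the one that matches the pairwise alignments (0 vs 1, 1 vs 2, and so on) produced by the script earlier in the thread.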

mnshgl0110 commented 1 year ago

For future reference: genomes.txt requires the genomes to be listed in the same order in which they are analysed. Also, to use chrlen files, use the ft:cl tag.
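Both points can be combined when writing the genomes file. The sketch below is an illustration under assumptions, not a confirmed recipe: `genomes_cl.txt` is a hypothetical file name, the columns are assumed to be tab-separated with a `#file name tags` header as in the examples above, and only the `ft:cl` tag is written because the thread does not show how it would be combined with `lw:1.5`.

```shell
# Genome names in the order in which they were aligned (0 vs 1, 1 vs 2, ...).
order="0 1 2 3 4 5 6 7 9 10 11 12 13"

# Write one row per genome, pointing at its chrlen file.
# $order is left unquoted on purpose so that it splits into one word per name.
{
    printf '#file\tname\ttags\n'
    for name in $order; do
        # ft:cl is the tag named in the reply above for chrlen input files.
        printf '%s.chrlen\t%s\tft:cl\n' "$name" "$name"
    done
} > genomes_cl.txt

cat genomes_cl.txt
```

Listing the names explicitly, rather than relying on a shell glob or `ls`, keeps the row order identical to the order of the `--sr` files passed to plotsr.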