adriandrr commented 1 year ago

Hey, I am currently trying to run your synteny plot pipeline and would be really glad if you could help me. I am trying to plot the synteny between 12 de novo generated whole bacterial genomes. I aligned them pairwise with minimap2 (A+B, B+C, C+D and so on). I used a bash script for that, if you want to have a look:

    #!/bin/bash

    fasta_files=(0.fa 1.fa 2.fa 3.fa 4.fa 5.fa 6.fa 7.fa 9.fa 10.fa 11.fa 12.fa 13.fa)
    # Align each genome against the next one in the list.
    for ((i = 0; i < ${#fasta_files[@]} - 1; i++)); do
        current_file="${fasta_files[$i]}"
        next_file="${fasta_files[$i + 1]}"
        current_prefix="${current_file%.*}"
        next_prefix="${next_file%.*}"
        output_bam="${current_prefix}_${next_prefix}.bam"
        minimap2 -ax asm5 -t 4 --eqx "$current_file" "$next_file" | samtools sort -O BAM - > "$output_bam"
        samtools index "$output_bam"
    done

After that I used a bash one-liner for-loop to produce the syri information:

    for i in $(ls *bam -1v); do prefix="${i%.*}"; IFS="_" read -r fnum snum <<< "$prefix"; syri -c $i -r $fnum.fa -q $snum.fa -F B --prefix $prefix; done

I am unsure whether there is a problem with the syri information, since I am not very familiar with it. The first 5 lines of the first syri output, "0_1syri.out", look like this:

    Chr0 1 4344 - - - - - NOTAL1 - NOTAL -
    Chr0 4345 5189076 - - Chr0 1 5020588 SYN1 - SYN -
    Chr0 4345 4896 - - Chr0 1 553 SYNAL1 SYN1 SYNAL -
    Chr0 4484 4484 G T Chr0 140 140 SNP543 SYN1 SNP -
    Chr0 4485 4485 A T Chr0 141 141 SNP544 SYN1 SNP -

What I tried next was to start plotsr with this command:

    plotsr --sr 0_1syri.out --sr 1_2syri.out --sr 2_3syri.out --sr 3_4syri.out \
        --sr 4_5syri.out --sr 5_6syri.out --sr 6_7syri.out --sr 7_9syri.out \
        --sr 9_10syri.out --sr 10_11syri.out --sr 11_12syri.out --sr 12_13syri.out \
        --genomes ../../genomes2.txt -o output_plot.png

At first I wanted to use the main fasta files as input, and the genomes2.txt file looked like this:

    #file name tags

    0.fa 0 lw:1.5
    1.fa 1 lw:1.5
    10.fa 10 lw:1.5
    11.fa 11 lw:1.5
    12.fa 12 lw:1.5
    13.fa 13 lw:1.5
    2.fa 2 lw:1.5
    3.fa 3 lw:1.5
    4.fa 4 lw:1.5
    5.fa 5 lw:1.5
    6.fa 6 lw:1.5
    7.fa 7 lw:1.5
    9.fa 9 lw:1.5

and I ran into the error:

    ImportError: For chromosome ID: Chr0, length in genome fasta: genomes2.txt is less than the maximum coordinate in the structural annotation file: 1_2syri.out. Exiting.

I didn't understand the error. The first fasta file is the reference and therefore the longest, so I don't really see where a maximum coordinate problem would come from. Anyway, I saw that it is possible to use the chromosome lengths as input instead. So I calculated the length of each fasta file and produced a chrlen file for each. Of course, I also renamed the input files in genomes2.txt from .fa to .chrlen. The chrlen files look like this:

    0.chrlen: Chr0 5199559
    1.chrlen: Chr0 5020588

and so on. With those files and the same plotsr command, I run into the error:

    ImportError: Chromosome ID: Chr0 in structural annotation file: 0_1syri.out not present in genome fasta: 0. Exiting

Could you explain to me what I am doing wrong? Thanks!

P.S.: Thanks for reading this far. I think I found an error in your example that goes with the current explanation in the README file. The fonts chosen in the example files markers.bed and tracks.txt are Arial. I think this is not supported anymore (?). Anyway, I changed it to DejaVu Sans and it worked again. Thought you might want to know :)
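A quick way to investigate the "less than the maximum coordinate" error is to compare the largest coordinate per chromosome in a syri.out file with the sequence lengths of the FASTA file it gets paired with. The following is only a minimal sketch: it assumes the 12-column, tab-separated syri.out layout shown above (columns 1-3 for the reference chromosome, start and end, columns 6-8 for the query), and the function names `fasta_lengths` and `syri_max_coords` are hypothetical helpers, not part of plotsr or syri.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of plotsr or syri.

# Print "<sequence id><TAB><length>" for every record in a FASTA file.
fasta_lengths() {
    awk '
        /^>/ { if (id != "") print id "\t" len
               id = substr($1, 2); len = 0; next }
        { len += length($0) }
        END  { if (id != "") print id "\t" len }
    ' "$1"
}

# Print "<ref|qry><TAB><chromosome><TAB><largest coordinate>" for a syri.out file.
# Assumption: columns 1-3 hold reference chr/start/end, columns 6-8 hold
# query chr/start/end, and "-" marks a missing value.
syri_max_coords() {
    awk -F'\t' '
        function upd(side, chr, a, b,   k) {
            if (chr == "-") return
            k = side "\t" chr
            if (a != "-" && a + 0 > max[k]) max[k] = a + 0
            if (b != "-" && b + 0 > max[k]) max[k] = b + 0
        }
        { upd("ref", $1, $2, $3); upd("qry", $6, $7, $8) }
        END { for (k in max) print k "\t" max[k] }
    ' "$1" | sort
}
```

For example, `syri_max_coords 1_2syri.out` could be compared against `fasta_lengths 1.fa` (reference side) and `fasta_lengths 2.fa` (query side). If a maximum coordinate is larger than the length of the genome listed at the corresponding position in genomes2.txt, the genomes file and the `--sr` files are probably not in the same order.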

adriandrr commented 1 year ago

Update: I ordered the genomes.txt file numerically, so that it is now:

    #file name tags

    0.fa 0 lw:1.5
    1.fa 1 lw:1.5
    2.fa 2 lw:1.5
    ...

With the chrlen files I still run into the same error as before, but with the fasta files it actually worked!!

mnshgl0110 commented 1 year ago

For future reference: genomes.txt requires the genomes to be listed in the same order in which they are analysed. Also, to use chrlen files, add the ft:cl tag.
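Both points can be sketched in a small script that writes one chrlen file per genome and a genomes.txt that lists them in analysis order with the ft:cl tag. This is only an illustration under stated assumptions: the columns are assumed to be tab-separated, multiple tags are assumed to be joined with a semicolon (as I understand the plotsr README), and `write_genomes_cl` is a hypothetical helper name, not something plotsr provides.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not part of plotsr.
# Usage: write_genomes_cl OUT_FILE GENOME1.fa GENOME2.fa ...
# Pass the FASTA files in the same order in which the --sr files chain them.
write_genomes_cl() {
    local out="$1"; shift
    local fa name
    printf '#file\tname\ttags\n' > "$out"
    for fa in "$@"; do
        name="$(basename "${fa%.*}")"
        # One "<chromosome><TAB><length>" line per FASTA record.
        awk '
            /^>/ { if (id != "") print id "\t" len
                   id = substr($1, 2); len = 0; next }
            { len += length($0) }
            END  { if (id != "") print id "\t" len }
        ' "$fa" > "${fa%.*}.chrlen"
        # Assumption: tags are joined with ";" and ft:cl marks a chrlen file.
        printf '%s\t%s\t%s\n' "${fa%.*}.chrlen" "$name" 'lw:1.5;ft:cl' >> "$out"
    done
}
```

Called as `write_genomes_cl genomes.txt 0.fa 1.fa 2.fa 3.fa`, this would list the genomes as 0, 1, 2, 3. The order comes from the arguments rather than from a directory listing, which avoids the alphabetical ordering (0, 1, 10, 11, ...) that caused the original error.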

schneebergerlab / plotsr

Chromosome ID not present in genome fasta #57
