thackl / minidot

Fast and pretty dotplots for whole genome assemblies using minimap and R/ggplot2
MIT License

what is the "per set sequence lengths" #11

Open smallfishcui opened 4 years ago

smallfishcui commented 4 years ago

Hi,

The manual does not make clear to me what "per set sequence lengths" means. Is it about the target genome or the query sequence? I am comparing two genome assemblies with minimap2, and I am using minidot.R to plot the paf output from minimap2, but I don't have a separate length file. Is it okay to take the header from the sam file produced by minimap2? Is there a file format for the length file? It seems the minidot executable in the bin folder only works with minimap version 1, and it is recommended to use minimap2 now.

Thanks, Cui

thackl commented 4 years ago

Hi Cui,

It is a 3-column file that gives the length of each contig of each genome:

  1. genome_id - name of genome fasta file without .fa
  2. contig_id
  3. contig_length

I usually create it by running samtools faidx on each genome and then adding the genome_id as the first column.
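
A minimal sketch of that step in Python, assuming each genome has already been indexed with samtools faidx so that a `.fai` file exists next to it. The function names, the tab-separated output, and the absence of a header line are assumptions made for illustration; the thread does not confirm exactly what minidot.R expects.

```python
from pathlib import Path


def genome_id_from_fai(fai_path):
    """Derive the genome_id from an index path, e.g. 'genomeA.fa.fai' -> 'genomeA'."""
    name = Path(fai_path).name
    for suffix in (".fai", ".fa"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def length_rows(fai_paths):
    """Yield (genome_id, contig_id, contig_length) for every contig in every index."""
    for fai_path in fai_paths:
        genome_id = genome_id_from_fai(fai_path)
        with open(fai_path) as handle:
            for line in handle:
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 2:
                    continue  # skip blank or malformed lines
                # .fai columns: name, length, offset, linebases, linewidth
                yield genome_id, fields[0], int(fields[1])


def write_length_file(fai_paths, out_path):
    """Write the 3-column length file (tab-separated and headerless: an assumption)."""
    with open(out_path, "w") as out:
        for genome_id, contig_id, contig_length in length_rows(fai_paths):
            out.write(f"{genome_id}\t{contig_id}\t{contig_length}\n")
```

Calling `write_length_file(["genomeA.fa.fai", "genomeB.fa.fai"], "lengths.tsv")` would then produce one row per contig, with both genomes in the same file.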

If you run bin/minidot, it will do that automatically. It should work with minimap2 too, but I never had the time to really test it. Hope that helps!

smallfishcui commented 4 years ago

Hi Thomas,

Thank you for your quick and clear explanation. I have one more question: should the target genome be placed in the first half of the length file and the query genome in the second half, or does the order not matter? I generated the paf file using minimap2; would this file be compatible? I couldn't find installation instructions on the minimap website, so I am not sure whether I can run it at all.

Thanks, Cui

thackl commented 4 years ago

The order shouldn't matter - I think it is just joined by genome_id. minimap2 paf files should work too. But let me know if they don't.
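
One way to check a minimap2 paf file against the length file before plotting is to compare contig names and lengths between the two. The sketch below is a hypothetical helper, not part of minidot. It relies on the standard PAF layout (query name and length in columns 1 and 2, target name and length in columns 6 and 7) and assumes the length file is tab-separated without a header.

```python
def paf_contig_lengths(paf_path):
    """Map each contig name in the PAF to the set of lengths reported for it."""
    lengths = {}
    with open(paf_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:
                continue  # PAF records have at least 12 mandatory columns
            lengths.setdefault(fields[0], set()).add(int(fields[1]))  # query
            lengths.setdefault(fields[5], set()).add(int(fields[6]))  # target
    return lengths


def find_mismatches(paf_path, length_path):
    """Return (contig_id, problem) pairs for contigs that disagree with the length file."""
    known = {}
    with open(length_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 3:
                continue  # expect genome_id, contig_id, contig_length
            known.setdefault(fields[1], set()).add(int(fields[2]))

    problems = []
    for contig, paf_lengths in sorted(paf_contig_lengths(paf_path).items()):
        if contig not in known:
            problems.append((contig, "missing from length file"))
        elif not paf_lengths <= known[contig]:
            problems.append((contig, "length differs"))
    return problems
```

An empty list from `find_mismatches("aln.paf", "lengths.tsv")` suggests the two files describe the same contigs. Because the check ignores row order, it is consistent with the order not mattering.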

smallfishcui commented 4 years ago

Hi,

I made a length file as instructed and a paf file with minimap2, then used the minidot.R script to draw the dotplot, but got this error message:

    Error in Math.factor(x$V2) : ‘cumsum’ not meaningful for factors
    Calls: cbind -> lapply -> FUN -> Math.factor
    Execution halted

Do you know what could be the reason? Or maybe there is something wrong with my steps?

thanks, Cui
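
In general, this R error means that `cumsum` was called on a factor, which happens when a column that was expected to be numeric was read as text (R versions before 4.0 convert text columns to factors by default). The thread does not establish which input file or column is responsible here, so the following is only a generic sketch: a hypothetical helper that lists the rows of a delimited file whose value in a given column is not a plain non-negative integer, for example a header line or a misplaced field.

```python
def non_numeric_rows(path, column, sep="\t"):
    """Return (line_number, value) for rows whose `column` (0-based) is not an integer."""
    bad = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.rstrip("\n").split(sep)
            # A missing column is reported as an empty value.
            value = fields[column] if column < len(fields) else ""
            if not value.isdigit():
                bad.append((line_number, value))
    return bad
```

Running it on each input with the index of the column that should hold lengths (for the 3-column length file described earlier, `column=2`) shows whether any row would force R to treat that column as text.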

jdamas13 commented 1 year ago

@smallfishcui were you able to fix this problem? I am in the same situation. Thanks!