thackl / minidot

Fast and pretty dotplots for whole-genome assemblies using minimap and R/ggplot2
MIT License

Examples not working anymore #8

Closed: ebioman closed this issue 7 years ago

ebioman commented 7 years ago

Hello, the examples you provide in the Makefile are not working anymore. I think this is related to Ensembl no longer offering the entire genome as a single file (or I could not find it).
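If the genome really is only offered as separate per-chromosome files, as suspected above, those files can be merged into the single FASTA that the sample target expects. Here is a minimal Python sketch. It assumes each part is a complete gzipped FASTA and that the list is already in the desired order. `merge_gzipped_fastas` is a hypothetical helper, not part of minidot:

```python
import gzip


def merge_gzipped_fastas(parts, out_path, chunk_size=1 << 20):
    """Decompress each gzipped FASTA in `parts` and append it to `out_path`.

    Hypothetical helper: assumes every part is a complete gzip file with plain
    FASTA records and that `parts` is already in the desired order.
    """
    with open(out_path, "wb") as out:
        for part in parts:
            last = b""
            with gzip.open(part, "rb") as src:
                # Stream in chunks so large chromosomes are not held in memory.
                for chunk in iter(lambda: src.read(chunk_size), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            # Ensure the next record's ">" header starts on a new line.
            if last and last != b"\n":
                out.write(b"\n")
    return out_path
```

For example, `merge_gzipped_fastas(sorted(pathlib.Path("downloads").glob("*.fa.gz")), "samples/arabidopsis/A.thaliana.fa")`, where the `downloads` directory and the glob pattern are placeholders for wherever the per-chromosome files were saved.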



minidot make sample-arabidopsis
hash minimap || make util/minimap
mkdir -p samples/arabidopsis
curl -# ftp://ftp.ensemblgenomes.org/pub/plants/current/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.31.dna.genome.fa.gz | gunzip > samples/arabidopsis/A.thaliana.fa

curl: (78) RETR response: 550

gzip: stdin: unexpected end of file
make: *** [samples/arabidopsis/A.thaliana.fa] Error 1

minidot make sample-prochlorococcus
hash minimap || make util/minimap
mkdir -p samples/prochlorococcus
curl -# ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000011465.1_ASM1146v1/GCF_000011465.1_ASM1146v1_genomic.fna.gz | gunzip > samples/prochlorococcus/MED4.fa

curl: (9) Server denied you to change to the given directory

gzip: stdin: unexpected end of file
make: *** [samples/prochlorococcus/MED4.fa] Error 1
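Both logs follow the same pattern. `curl` fails (exit codes 78 and 9), `gunzip` then receives an empty stream, and the recipe ends on gzip's "unexpected end of file" rather than on the missing remote file. A check that rejects empty or truncated downloads gives a clearer message. Here is a minimal Python sketch. It assumes the compressed file has already been downloaded into memory. `gunzip_checked` is a hypothetical name:

```python
import gzip
import zlib


def gunzip_checked(blob):
    """Return the decompressed bytes of a downloaded .gz payload.

    Raises ValueError for an empty download (e.g. the remote path no longer
    exists) or for a truncated/corrupt gzip stream, instead of silently
    producing an empty or partial FASTA.
    """
    if not blob:
        raise ValueError("download is empty; check that the remote path still exists")
    try:
        return gzip.decompress(blob)
    except (EOFError, OSError, zlib.error) as exc:
        raise ValueError(f"download is not a complete gzip file: {exc}") from exc
```

In the Makefile itself, a similar effect could come from downloading to a temporary file and verifying it with `gzip -t` before decompressing.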

This is a bit troublesome, because I hit another error when testing minidot on my own data and could not tell whether my local installation or my data caused it.


 bin/minidot -o myGenome.pdf myGenome.fasta myGenome2.fasta 

Loading required package: proto
[15:57:44] minimap .. ok
[15:57:44] samtools .. ok
[15:57:44] prepping .. done
[16:10:16] mode: fast (4787412572 bp) "-L 500 -c 15"
[16:10:16] mapping ..bin/minidot: line 186: 40375 Segmentation fault      $MBIN $MOPT $TFA $TFA >> $PAF 2>> $TML
 failed
thackl commented 7 years ago

Hi,

You are right: NCBI recently restructured its FTP sites, and the download links in the examples aren't working anymore. I will try to fix that, but unfortunately I don't have the time right now.

As for the segfault, that is an issue with the size of your genome and the way I currently run minimap. Minimap cannot handle sequences longer than about 2 Gbp, and internally I concatenate all contigs of a genome into a single sequence, in your case some 4 Gbp. I have plans to change that behaviour, but again, I don't have the time to implement them at the moment. Sorry.
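The segfault comes down to simple arithmetic. The log reports 4,787,412,572 bp, well above the roughly 2 Gbp single-sequence limit mentioned here. The sketch below assumes the limit is a signed 32-bit offset (2^31 - 1 bp). This is a hypothesis and is not confirmed in this thread. The sketch totals contig lengths from FASTA lines and greedily groups contigs into batches that each stay under that limit, which is one way a wrapper could avoid building a single oversized sequence. It is not how minidot works. `MAX_SEQ_LEN`, `contig_lengths` and `batch_contigs` are hypothetical names:

```python
# Assumed limit: a signed 32-bit offset, i.e. 2**31 - 1 bp. The thread only
# says "about 2 Gbp", so treat this value as a hypothesis.
MAX_SEQ_LEN = 2**31 - 1


def contig_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA text lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            fields = line[1:].split()
            name = fields[0] if fields else ""
            length = 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length


def batch_contigs(lengths, limit=MAX_SEQ_LEN):
    """Greedily group (name, length) pairs so that each batch sums to <= limit.

    Hypothetical alternative to concatenating every contig into one sequence;
    each batch could then be written out and mapped on its own.
    """
    batches, current, total = [], [], 0
    for name, length in lengths:
        if length > limit:
            raise ValueError(f"contig {name!r} alone is longer than {limit} bp")
        if current and total + length > limit:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += length
    if current:
        batches.append(current)
    return batches
```

With a limit of 2,147,483,647 bp, the 4,787,412,572 bp in the log needs at least three batches, because two batches can hold at most 4,294,967,294 bp.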

ebioman commented 7 years ago

Hi, thanks for the quick reply. I already suspected something related to the size of the genome. This is indeed a bummer, as your approach looked very interesting and could have replaced MUMmer for many quick analyses. Cheers

thackl commented 7 years ago

Have a look at this repo https://github.com/zeeev/minimap, and the extension described in the example https://github.com/zeeev/minimap#running-example-gorilla-vs-grch38. This might also work for large genomes.

ebioman commented 7 years ago

Hi, thanks for pointing me to that alternative method. It does work in general, but oddly it reports (calculates) genome sizes that are a multiple of the real ones. Cheers