pcingola / SnpEff

mm39 database download issue #536

Open arundurvasula opened 2 months ago

arundurvasula commented 2 months ago

Describe the bug Unable to download mm39

To Reproduce

  1. SnpEff version: SnpEff 5.2c (build 2024-04-09 12:24)
  2. Genome version: mm39
  3. SnpEff full command line: java -Xmx60g -jar ~/software/snpEff/snpEff.jar mm39 data/vcf/$ID.vcf.gz
  4. Output / Error message: FATAL ERROR: Failed to download database from [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_mm39.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_mm39.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_mm39.zip]

Expected behavior SnpEff should download the database.

Data N/A (also reproducible with java -Xmx60g -jar ~/software/snpEff/snpEff.jar download mm39):

FATAL ERROR: Failed to download database from [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_mm39.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_mm39.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_mm39.zip]

Additional context Issue is potentially related to #374, but not sure there is an applicable solution.

carsonbroeker commented 2 months ago

I had the same issue. I ended up having to redo part of my pipeline and realign to mm10, which I was able to download with snpEff. It looks like the latest database files on SourceForge are from 2018, before mm39 was released: https://sourceforge.net/projects/snpeff/files/databases/. It doesn't look like you can manually get mm39 from there and install it yourself.

arundurvasula commented 1 month ago

Thanks for the comment. It would be great to have the database for mm39 given how much newer the genome is. Hopefully this is just a matter of reuploading existing files.

maxmilianr commented 2 weeks ago

Hi, I still get the same error today. Did anyone find a good solution or workaround, or manage to build the db manually?

arundurvasula commented 2 weeks ago

I haven’t found a solution or workaround yet. Hoping this is just a database update upstream that will fix the issue as soon as the developer can get to it.

lukedow commented 1 week ago

I had the same issue with mm39, but was eventually able to build the db manually. I tried a few of the options in the snpEff documentation, but the one that ultimately worked was the .gtf approach.

Most of what you need to know is in the documentation, but it is not always super clear. Here were the key points for me:

  1. Make sure you retrieve all of the required FASTA and GTF (and/or GFF) files from the same genome build, e.g. from Ensembl (https://useast.ensembl.org/Mus_musculus/Info/Index). Unzip them into a specific directory (snpEff/data/mm39/) so the script can find them, and rename the files:

     Mus_musculus.GRCm39.dna.primary_assembly.fa.gz: sequences.fa
     Mus_musculus.GRCm39.cds.all.fa.gz: cds.fa
     Mus_musculus.GRCm39.pep.all.fa.gz: protein.fa
     Mus_musculus.GRCm39.112.gtf.gz: genes.gtf

  2. Modify the snpEff.config file to include the line: mm39.genome : Mouse

  3. Build the database: java -Xmx4g -jar snpEff.jar build -gtf22 -v mm39

This should create and save the .bin files required for annotation into /snpEff/data/mm39/
Should be good to go!
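
In case it helps anyone scripting this, here is a rough Python sketch of the same steps. It is not part of SnpEff: the function names (stage_genome_files, register_genome, build_command) are made up for illustration, and it assumes the four Ensembl files listed above have already been downloaded into one directory under exactly those names (release 112 for the GTF). Adjust the mapping if your file names differ.

```python
import gzip
import shutil
from pathlib import Path

# Downloaded Ensembl file name -> file name SnpEff looks for in data/<genome>/.
# The names on the left are the ones quoted in this thread; other releases differ.
ENSEMBL_TO_SNPEFF = {
    "Mus_musculus.GRCm39.dna.primary_assembly.fa.gz": "sequences.fa",
    "Mus_musculus.GRCm39.cds.all.fa.gz": "cds.fa",
    "Mus_musculus.GRCm39.pep.all.fa.gz": "protein.fa",
    "Mus_musculus.GRCm39.112.gtf.gz": "genes.gtf",
}


def stage_genome_files(download_dir, snpeff_dir, genome="mm39"):
    """Decompress each download into <snpeff_dir>/data/<genome>/ under its SnpEff name."""
    target = Path(snpeff_dir) / "data" / genome
    target.mkdir(parents=True, exist_ok=True)
    for src_name, dest_name in ENSEMBL_TO_SNPEFF.items():
        src = Path(download_dir) / src_name
        if not src.is_file():
            raise FileNotFoundError(f"missing download: {src}")
        # Stream the gzip contents so large FASTA files are not held in memory.
        with gzip.open(src, "rb") as fin, open(target / dest_name, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    return target


def register_genome(config_path, genome="mm39", label="Mouse"):
    """Append '<genome>.genome : <label>' to snpEff.config unless the key is already there."""
    config = Path(config_path)
    text = config.read_text() if config.exists() else ""
    key = f"{genome}.genome"
    if any(line.split(":")[0].strip() == key for line in text.splitlines()):
        return False  # entry already present, leave the config untouched
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with config.open("a") as fh:
        fh.write(f"{separator}{key} : {label}\n")
    return True


def build_command(genome="mm39", jar="snpEff.jar", heap="4g"):
    """Return the build command quoted above as an argument list for subprocess.run."""
    return ["java", f"-Xmx{heap}", "-jar", jar, "build", "-gtf22", "-v", genome]
```

To use it, call stage_genome_files(downloads, snpeff_dir) and register_genome(snpeff_dir + "/snpEff.config"), then run subprocess.run(build_command(), check=True) from inside the snpEff directory. The config check is there because some snpEff.config files may already list mm39, in which case a second entry is not needed.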