nf-core / epitopeprediction

A bioinformatics best-practice analysis pipeline for epitope prediction and annotation
https://nf-co.re/epitopeprediction
MIT License

Ensembl archive GRCh38 is out of date (apr2018) #208

Closed tlitfin closed 5 months ago

tlitfin commented 1 year ago

Description of the bug

It seems that Ensembl archives are maintained for 5 years, which means that the apr2018 archive for GRCh38 is no longer available.

Command used and terminal output

Run: 
nextflow run nf-core/epitopeprediction --input samplesheet.csv -profile docker --genome_version GRCh38 --outdir test/ --tools netmhcpan-4.1,mhcflurry --netmhcpan_path /path/to/netMHCpan-4.1b.Linux.tar.gz --fasta_output

Traceback:
line 825, in get_protein_ids_for_transcripts
urllib.request.urlopen(biomart_url + urllib.parse.quote(rq_n)).read().decode("utf-8").splitlines()
...
urllib.error.HTTPError: HTTP Error 502: Bad Gateway

Relevant Line:
references = {"GRCh37": "http://feb2014.archive.ensembl.org", "GRCh38": "http://apr2018.archive.ensembl.org"}
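Because the archive hosts in this mapping can be retired, one option is to probe a list of candidate hosts and use the first one that answers. The following is a minimal sketch, not the pipeline's code: `is_reachable` and `first_reachable` are hypothetical helper names, and the candidate list is supplied by the caller.

```python
import urllib.request


def is_reachable(url, timeout=10):
    """Return True if the host answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError, HTTPError (e.g. 502 Bad Gateway) and socket timeouts
        # all derive from OSError.
        return False


def first_reachable(candidates, probe=is_reachable):
    """Return the first candidate URL accepted by `probe` (hypothetical helper)."""
    for url in candidates:
        if probe(url):
            return url
    raise RuntimeError("No archive host reachable: " + ", ".join(candidates))
```

The `probe` argument is injectable so the selection logic can be exercised without network access.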

Relevant files

No response

System information

No response

christopher-mohr commented 1 year ago

Hi @tlitfin, thanks for reporting!

Yes, this is an annoying issue, and we are currently working on making the Ensembl (BioMart) version configurable via a parameter.

Not sure how they decide when to retire an archive since the GRCh37 one is still available.
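As an illustration of what such a parameter could look like, here is a rough sketch using `argparse`. The option name `--archive_url` and the function names are assumptions made for this example and are not the pipeline's actual interface; the defaults are the two hosts quoted in the report above.

```python
import argparse

# Defaults copied from the mapping quoted in the bug report.
DEFAULT_ARCHIVES = {
    "GRCh37": "http://feb2014.archive.ensembl.org",
    "GRCh38": "http://apr2018.archive.ensembl.org",
}


def build_parser():
    parser = argparse.ArgumentParser(description="Archive selection sketch")
    parser.add_argument("--reference", choices=sorted(DEFAULT_ARCHIVES), default="GRCh38")
    # Hypothetical option: lets the user override the built-in archive host.
    parser.add_argument("--archive_url", default=None)
    return parser


def resolve_archive(args):
    """Prefer the user-supplied host, otherwise fall back to the default."""
    url = args.archive_url or DEFAULT_ARCHIVES[args.reference]
    return url.rstrip("/")
```

With this layout a retired default can be bypassed at run time without editing the script.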

jonasscheid commented 1 year ago

This may be related to this issue: the tests for GRCh38 run into

 INFO:__main__:Running epaa for variants...
  Traceback (most recent call last):
    File "/home/runner/work/epitopeprediction/epitopeprediction/bin/epaa.py", line 1579, in <module>
      __main__()
    File "/home/runner/work/epitopeprediction/epitopeprediction/bin/epaa.py", line 1334, in __main__
      ID_SYSTEM_USED, transcripts, references[args.reference], args.reference
    File "/home/runner/work/epitopeprediction/epitopeprediction/bin/epaa.py", line 837, in get_protein_ids_for_transcripts
      if dic[key] in result:
  KeyError: 'Transcript stable ID'
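
A `KeyError` like this one is hard to diagnose because it does not say what the server actually returned. One possible defensive pattern, sketched here under the assumption that the BioMart response is tab-separated text with a header row, is to check the expected columns before using them. `parse_biomart_tsv` is a hypothetical helper and the column names in the usage example are illustrative.

```python
import csv
import io


def parse_biomart_tsv(text, required_columns):
    """Parse tab-separated text and fail early if expected columns are absent."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []  # None when the response is empty
    missing = [col for col in required_columns if col not in header]
    if missing:
        raise ValueError(
            "Response lacks columns {}; header was {}".format(missing, header)
        )
    return list(reader)


# Usage with made-up identifiers:
rows = parse_biomart_tsv(
    "Transcript stable ID\tProtein stable ID\nENST_EXAMPLE\tENSP_EXAMPLE\n",
    ["Transcript stable ID", "Protein stable ID"],
)
```

If the server returns an error page or a header with different column names, the `ValueError` reports the header that was received instead of a bare `KeyError`.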

christopher-mohr commented 11 months ago

Hi @tlitfin, sorry for the long wait. This issue should be fixed by PR #213. You can test it with the current dev version.

chagas98 commented 8 months ago

> Not sure how they decide when to retire an archive since the GRCh37 one is still available.

I ran the example below and had to switch the GRCh38 archive to http://jul2023.archive.ensembl.org. I also got several errors when trying to get GRCh37 data from https://feb2014.archive.ensembl.org.

 nextflow run nf-core/epitopeprediction -r 2.2.1 -profile test --outdir FirstRun

christopher-mohr commented 8 months ago

Hi @chagas98, thanks for reporting. Would you need the February 2014 archive specifically? If so, please create a separate issue and provide the corresponding error messages. I am not sure if we can still support this version but we can have a look if you specifically need this one.

chagas98 commented 8 months ago

No need, I was just reporting. I think the GRCh37 version is now hosted at https://grch37.ensembl.org/index.html. As I was only testing parts of the pipeline, I did not dig into it. Thanks @christopher-mohr!

christopher-mohr commented 5 months ago

Fixed / workaround in #213.