ashishdamania opened this issue 9 years ago (status: Open)
Is it possible to retrieve the old files from EBI?
I tried looking at their FTP site, and the best I could find was ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/release/std/fasta/ which I assume is what we need. Correction: maybe this is the correct file: ftp://ftp.ebi.ac.uk/pub/databases/fastafiles/emblcds/emblcds.gz
I tried retrieving sequences from ftp://ftp.ebi.ac.uk/pub/databases/fastafiles/emblcds/emblcds.gz and their headers are formatted as shown below:
>EMBLCDS:BAJ49870 BAJ49870.1 Candidatus Caldiarchaeum subterraneum archaeal cell division control protein 6
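To check that the new file really has the layout the script expects, here is a minimal sketch that splits a header like the one above into its parts. It assumes every header follows the same pattern ("DB:ACCESSION", then a versioned accession, then a free-text description); the function names `parse_emblcds_header` and `iter_headers` are made up for illustration and are not part of the repository.

```python
import gzip
from typing import Dict, Iterator


def parse_emblcds_header(line: str) -> Dict[str, str]:
    """Split an emblcds FASTA header into database, accession, version and description.

    Assumes (hypothetically) the layout seen in emblcds.gz:
    '>DB:ACCESSION VERSIONED_ACCESSION free text description'.
    """
    # Drop the leading '>' and line ending, then split into at most 3 fields.
    fields = line.lstrip(">").rstrip("\r\n").split(" ", 2)
    # 'EMBLCDS:BAJ49870' -> ('EMBLCDS', ':', 'BAJ49870'); no ':' leaves accession empty.
    db, _, accession = fields[0].partition(":")
    return {
        "db": db,
        "accession": accession,
        "versioned_accession": fields[1] if len(fields) > 1 else "",
        "description": fields[2] if len(fields) > 2 else "",
    }


def iter_headers(path: str) -> Iterator[Dict[str, str]]:
    """Stream parsed headers from a gzip-compressed FASTA file without unpacking it to disk."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                yield parse_emblcds_header(line)


if __name__ == "__main__":
    header = ">EMBLCDS:BAJ49870 BAJ49870.1 Candidatus Caldiarchaeum subterraneum archaeal cell division control protein 6"
    print(parse_emblcds_header(header)["accession"])  # BAJ49870
```

Running `iter_headers("emblcds.gz")` over the downloaded file and looking at the first few results is a quick way to confirm the format before rerunning the pipeline.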
So the issue is only the FTP address; the rest of your script still works as is.
So line 43 of Build-Input-Files-for-Gene-Ontology/Main.sh should be changed from ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/cds.fasta.gz to ftp://ftp.ebi.ac.uk/pub/databases/fastafiles/emblcds/emblcds.gz
Also, line 20 of Main.sh should be changed from wget ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gene_association.goa_uniprot.gz to wget ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/goa_uniprot_all.gaf.gz
and, correspondingly, gunzip gene_association.goa_uniprot.gz should become gunzip goa_uniprot_all.gaf.gz
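The three substitutions above can be applied mechanically. This is a minimal sketch, assuming Main.sh contains those exact old strings; `URL_REPLACEMENTS`, `patch_main_sh` and `patch_file` are hypothetical names, not part of the repository. If Main.sh refers to the downloaded CDS file by its old name later on (e.g. cds.fasta.gz), that reference would presumably need the same treatment; this thread doesn't say.

```python
from pathlib import Path

# Old -> new strings, taken from the changes proposed in this issue.
# Assumption: Main.sh contains these exact strings.
URL_REPLACEMENTS = {
    "ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/cds.fasta.gz":
        "ftp://ftp.ebi.ac.uk/pub/databases/fastafiles/emblcds/emblcds.gz",
    "ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gene_association.goa_uniprot.gz":
        "ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/goa_uniprot_all.gaf.gz",
    "gunzip gene_association.goa_uniprot.gz":
        "gunzip goa_uniprot_all.gaf.gz",
}


def patch_main_sh(text: str) -> str:
    """Return the script text with every old EBI location replaced by its new one."""
    for old, new in URL_REPLACEMENTS.items():
        text = text.replace(old, new)
    return text


def patch_file(path: str) -> None:
    """Rewrite a script file in place (hypothetical helper; keep a backup first)."""
    script = Path(path)
    script.write_text(patch_main_sh(script.read_text()))
```

Calling `patch_file("Main.sh")` from the repository folder would apply all three changes at once; editing the two lines by hand is just as good.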
EMBL seems to have changed their file structure (http://www.ebi.ac.uk/about/news/service-news/change-cds-ftp-products), so Main.sh no longer works as intended. I am not entirely sure about this one: Rebuild-Fasta.py raises an out-of-range error, probably because the script expects ":" in the sequence header lines but they now contain "|".
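If that guess is right, the error is what you get from indexing the second element of a split that never happened. The code of Rebuild-Fasta.py isn't shown here, so `accession_strict` below only mimics the suspected behaviour, and `accession_tolerant` is one possible workaround that accepts either ":" (as in the emblcds.gz header above) or "|". Both function names and the "|" header used in the usage line are made up for illustration.

```python
import re
from typing import Optional


def accession_strict(header: str) -> str:
    """Hypothetical stand-in for a parser that assumes ':' is always present.

    For a header whose ID field has no ':', split(':') returns a single
    element, so [1] raises IndexError ("list index out of range").
    """
    return header.lstrip(">").split()[0].split(":")[1]


def accession_tolerant(header: str) -> Optional[str]:
    """Return the token after the first ':' or '|' in the ID field, or None if there is none."""
    tokens = header.lstrip(">").split()
    if not tokens:
        return None
    # Inside a character class '|' is a literal, so this splits on either separator.
    parts = re.split(r"[:|]", tokens[0])
    return parts[1] if len(parts) > 1 and parts[1] else None


if __name__ == "__main__":
    print(accession_tolerant(">EMBLCDS:BAJ49870 BAJ49870.1 archaeal cell division control protein 6"))  # BAJ49870
    print(accession_tolerant(">DB|ACC1|ACC1.1 made-up example"))  # ACC1
```

Returning None instead of raising lets the caller decide whether to skip or log a malformed header, rather than aborting the whole rebuild on the first one.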