Open rnaseq opened 8 years ago
To progress with the mapping, we need to upload links to the genome assemblies. Hopefully we only need one set of links per assembly, from GRCh38 back through NCBI34.
The relevant links to the genome assemblies for NCBI34/35/36, GRCh37/38 are available in a text file under datasets/dna.
I suggest changing the links as listed below.
Also, we will need to filter the GTF files so that they only include genes/transcripts located on sequences present in our genome assembly files (particularly relevant for GRCh37 and GRCh38).
NCBI34 (release 22 - release 25 incl.) ftp://ftp.ensembl.org/pub/release-25/human-25.34e/data/fasta/dna/Homo_sapiens.NCBI34.sep.dna.chromosome.*.fa.gz + ftp://ftp.ensembl.org/pub/release-25/human-25.34e/data/fasta/dna/Homo_sapiens.NCBI34.sep.dna.contig.fa.gz
NCBI35 (release 26 - release 37 incl.) ftp://ftp.ensembl.org/pub/release-37/homo_sapiens_37_35j/data/fasta/dna/Homo_sapiens.NCBI35.feb.dna.chromosome.*.fa.gz + ftp://ftp.ensembl.org/pub/release-37/homo_sapiens_37_35j/data/fasta/dna/Homo_sapiens.0.NCBI35.feb.dna.contig.fa.gz
NCBI36 (release 38 - release 54 incl.) ftp://ftp.ensembl.org/pub/release-54/fasta/homo_sapiens/dna/Homo_sapiens.NCBI36.54.dna.toplevel.fa.gz
GRCh37 (release 55 - release 75 incl.) ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
GRCh38 (release 76 - release 83 incl.) ftp://ftp.ensembl.org/pub/release-83/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
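For picking the right assembly for a given annotation release, the ranges listed above could be captured in a small lookup. This is only a sketch: the table copies the release ranges from this post, and the names `ASSEMBLY_BY_RELEASE` and `assembly_for_release` are made up for illustration.

```python
# Release ranges copied from the list above (both ends inclusive).
# Names in this block are hypothetical, not part of any existing script.
ASSEMBLY_BY_RELEASE = [
    (22, 25, "NCBI34"),
    (26, 37, "NCBI35"),
    (38, 54, "NCBI36"),
    (55, 75, "GRCh37"),
    (76, 83, "GRCh38"),
]


def assembly_for_release(release):
    """Return the assembly name for an Ensembl release, or None if not covered."""
    for first, last, name in ASSEMBLY_BY_RELEASE:
        if first <= release <= last:
            return name
    return None
```

Releases outside 22 to 83 return `None`, so a caller can decide whether to skip them or raise an error.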
For the later genome builds, I suggest discarding the alt. contigs.
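The GTF filtering mentioned above could look roughly like the sketch below. It assumes that the first whitespace-delimited token of each FASTA header is the sequence name, and that the GTF uses the same names in its first column. That should be checked per release, since the older files may be named differently. The function names are hypothetical.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_sequence_names(fasta_path):
    """Collect sequence names: the first token after '>' on each header line."""
    names = set()
    with open_text(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def filter_gtf(gtf_path, fasta_path, out_path):
    """Keep GTF records whose seqname (column 1) is in the FASTA.

    Comment lines are copied through unchanged. Returns (kept, dropped).
    """
    keep = fasta_sequence_names(fasta_path)
    kept = dropped = 0
    with open_text(gtf_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            if line.startswith("#"):
                dst.write(line)
                continue
            seqname = line.split("\t", 1)[0]
            if seqname in keep:
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

When run against a FASTA without alt. contigs, annotations on those contigs end up in the dropped count, which gives a quick sanity check on how much is being discarded.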