rnaseq / degraTrans

RNA-seq performance in the context of degraded transcriptome annotations.

Genome builds: filter. #2

Open rnaseq opened 8 years ago

rnaseq commented 8 years ago

For the later genome builds, it is suggested to discard the alternate (alt) contigs.

mw55309 commented 8 years ago

To progress with the mapping, we need to upload links to the genome assemblies. We (hopefully) only need 4 links, covering GRCh38 back through NCBI34.

rnaseq commented 8 years ago

The relevant links to the genome assemblies for NCBI34/35/36 and GRCh37/38 are available in a text file under datasets/dna.

mw55309 commented 8 years ago

Suggest changes to the links as below.

Also, we will need to filter the GTF files so that they only include genes/transcripts located on sequences present in our genome assembly files (particularly relevant for GRCh37 and GRCh38).

NCBI34 (release 22 - release 25 incl.) ftp://ftp.ensembl.org/pub/release-25/human-25.34e/data/fasta/dna/Homo_sapiens.NCBI34.sep.dna.chromosome.*.fa.gz + ftp://ftp.ensembl.org/pub/release-25/human-25.34e/data/fasta/dna/Homo_sapiens.NCBI34.sep.dna.contig.fa.gz

NCBI35 (release 26 - release 37 incl.) ftp://ftp.ensembl.org/pub/release-37/homo_sapiens_37_35j/data/fasta/dna/Homo_sapiens.NCBI35.feb.dna.chromosome.*.fa.gz + ftp://ftp.ensembl.org/pub/release-37/homo_sapiens_37_35j/data/fasta/dna/Homo_sapiens.0.NCBI35.feb.dna.contig.fa.gz

NCBI36 (release 38 - release 54 incl.) ftp://ftp.ensembl.org/pub/release-54/fasta/homo_sapiens/dna/Homo_sapiens.NCBI36.54.dna.toplevel.fa.gz

GRCh37 (release 55 - release 75 incl.) ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

GRCh38 (release 76 - release 83 incl.) ftp://ftp.ensembl.org/pub/release-83/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
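
For the GTF filtering mentioned above, a minimal sketch in Python could look like the following. It is only an illustration, not something taken from this repository: the function names (`fasta_sequence_names`, `filter_gtf`, `filter_gtf_file`) are hypothetical, and it assumes that the first whitespace-delimited token of each FASTA header is the same sequence name used in column 1 of the matching GTF file. That assumption should be checked for each release before relying on the output.

```python
import gzip
from typing import Iterable, Iterator, Set


def open_text(path: str, mode: str = "rt"):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def fasta_sequence_names(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names: the first token after '>' on each header line.

    Assumption: this token matches the seqname column of the GTF file.
    """
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def filter_gtf(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield comment lines and the records whose seqname (column 1) is in `keep`."""
    for line in lines:
        if line.startswith("#") or line.split("\t", 1)[0] in keep:
            yield line


def filter_gtf_file(fasta_path: str, gtf_path: str, out_path: str) -> None:
    """Write to `out_path` only the GTF records located on sequences in the FASTA."""
    with open_text(fasta_path) as fasta:
        keep = fasta_sequence_names(fasta)
    with open_text(gtf_path) as gtf, open_text(out_path, "wt") as out:
        out.writelines(filter_gtf(gtf, keep))
```

Calling `filter_gtf_file` with a primary assembly FASTA, the matching annotation GTF, and an output path would drop every record annotated on a sequence that is absent from the FASTA, such as patches or alternate loci. Because the filter works record by record, genes and their transcripts are kept or dropped together as long as they sit on the same sequence.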