jonassibbesen / hprc-rnaseq-analyses-scripts

Scripts used for the RNA-seq analyses in the HPRC draft pangenome paper.
MIT License

hprc-rnaseq-analyses-scripts

Scripts used for the RNA-seq analyses presented in the HPRC draft pangenome paper: A Draft Human Pangenome Reference, bioRxiv (2022).

The directories contain the scripts used in the analyses and, in many cases, the corresponding log files. Note that the scripts include hard-coded paths and file names, and therefore need to be edited before being used in other projects. Some of the scripts have been slightly updated to reduce the number of hard-coded paths. Some of the log files include a short header that specifies the Docker image that was used. The Docker containers are available at this repository: s3script-dockerfiles. Some of the log files have been edited to remove excessive warnings in order to reduce file size. These have the suffix less_warnings.
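Since the scripts need editing before reuse, it can help to list the path-like strings they contain first. The following is a minimal sketch and not part of this repository: the function names, the regular expression and the default file suffixes are all assumptions made for illustration. The pattern is a heuristic that will also flag harmless paths such as `/bin/bash`.

```python
import re
from pathlib import Path

# Heuristic (assumption): an absolute path or s3:// URI with at least two
# components, not preceded by a character that would make it a relative
# path or part of a longer URL.
PATH_PATTERN = re.compile(r"(?<![\w./])(?:s3://|/)[\w.\-]+(?:/[\w.\-]+)+")


def find_hardcoded_paths(text):
    """Return (line_number, match) pairs for path-like strings in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in PATH_PATTERN.finditer(line):
            hits.append((number, match.group(0)))
    return hits


def scan_directory(root, suffixes=(".sh", ".R", ".py")):
    """Map each script under root to its path-like strings.

    The suffix list is a guess at which file types are worth checking.
    """
    report = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            hits = find_hardcoded_paths(path.read_text(errors="replace"))
            if hits:
                report[str(path)] = hits
    return report


if __name__ == "__main__":
    # Scan the current directory and print one line per candidate path.
    for script, hits in scan_directory(".").items():
        for number, value in hits:
            print(f"{script}:{number}: {value}")
```

Run from the top of a local clone, this prints each candidate as `file:line: path`, which gives a checklist of values to review before adapting a script to another project.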

In addition to scripts, the evaluation folder also contains the calculated mapping benchmark data and estimated expression values that were used in the benchmark.

Data

Here you can find links to the data used in the paper. This includes both raw data and data constructed as part of the analyses in the paper. The constructed data included here are either not guaranteed to be reproducible (simulated reads) or deemed potentially useful in other projects (graphs and indexes).

Graphs, pantranscriptomes and indexes

The spliced pangenome graphs, pantranscriptomes and indexes:

Reference transcripts

The processed GENCODE (v38) transcript annotations:

Reads

The simulated RNA-seq reads:

The real RNA-seq reads:

The Iso-Seq alignments: