zavolanlab / MIRZAG

MIRZA-G - Pipeline and model for miRNA target prediction

Generation of .nh file with custom annotation #7

Open fgypas opened 6 years ago

fgypas commented 6 years ago

How can one generate a custom annotation file like

MIRZAG/data/hg19_tree.nh

When I use the AlignmentExtraction (https://git.scicore.unibas.ch/SmallRNAs/AlignmentExtraction) pipeline (develop branch with snakemake) I am able to generate all other required input files except this one.

fgypas commented 6 years ago

Actually, it's available via UCSC, for example: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/hg38.100way.nh. The question now is how to trim it to keep only the required organisms.

fgypas commented 6 years ago

Temporary solution:

Download the tree:

wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/hg38.100way.nh

Install phytools in R and run:

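library("phytools")  # attaches ape, which provides read.tree, drop.tip and write.tree
tree <- read.tree(file = "./hg38.100way.nh")
# Species to keep; names must match the tip labels in the tree file
# species <- c("hg38", "rheMac8", "mm10", "bosTau8", "felCat9", "galGal5", "rn6")
species <- c("hg38", "rheMac3", "mm10", "bosTau8", "felCat8", "galGal4", "rn6")
# Drop every tip that is not in the species list
pruned.tree <- drop.tip(tree, setdiff(tree$tip.label, species))
# Print the pruned tree in Newick format
write.tree(pruned.tree)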
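
One thing to watch with the pruning step is a species name that is not a tip label in the downloaded tree. Below is a minimal sketch of how one might check for that before pruning. It assumes only the ape package; `prune_to_species` is a hypothetical helper name, and the small Newick string is a made-up toy tree standing in for hg38.100way.nh, so it runs without the download.

```r
library(ape)  # read.tree, drop.tip and write.tree are ape functions

# Toy tree standing in for hg38.100way.nh (topology and branch lengths are made up)
newick <- "(((hg38:0.1,rheMac3:0.2):0.3,(mm10:0.4,rn6:0.5):0.6):0.7,galGal4:0.8);"
tree <- read.tree(text = newick)

# Hypothetical helper: keep only `species`, stopping if a name is not in the tree
prune_to_species <- function(tree, species) {
  missing <- setdiff(species, tree$tip.label)
  if (length(missing) > 0) {
    stop("Species not found in tree: ", paste(missing, collapse = ", "))
  }
  drop.tip(tree, setdiff(tree$tip.label, species))
}

pruned <- prune_to_species(tree, c("hg38", "mm10", "galGal4"))
# Print the pruned tree in Newick format; pass file = "..." to write it to disk
write.tree(pruned)
```

With the real file, `tree` would instead come from `read.tree(file = "./hg38.100way.nh")`, and the helper would stop with an error for any requested assembly name that is absent from the tree.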

Session info:

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /scicore/soft/apps/OpenBLAS/0.2.13-GCC-4.8.4-LAPACK-3.5.0/lib/libopenblas_prescottp-r0.2.13.so

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] phytools_0.6-60 maps_3.3.0      ape_5.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18            quadprog_1.5-5          lattice_0.20-35
 [4] MASS_7.3-50             grid_3.5.0              nlme_3.1-137
 [7] magrittr_1.5            coda_0.19-1             scatterplot3d_0.3-41
[10] phangorn_2.4.0          combinat_0.0-8          Matrix_1.2-14
[13] fastmatch_1.1-0         tools_3.5.0             igraph_1.2.1
[16] plotrix_3.7-2           numDeriv_2016.8-1       parallel_3.5.0
[19] compiler_3.5.0          pkgconfig_2.0.1         mnormt_1.5-5
[22] clusterGeneration_1.3.4 animation_2.5           expm_0.999-2

ToDo: