wenbostar / PGA

PGA: a tool for ProteoGenomics Analysis
http://wenbostar.github.io/PGA/

Download reference data from UCSC for RefSeq #8

Open wenbostar opened 5 years ago

wenbostar commented 5 years ago

The CDS and protein data were both downloaded from UCSC on the same day. Running the following code on them completed, but produced the warning message shown at the end of the output:

library(PGA)
annotation_path <- tempdir()
pepfasta <- "~/Downloads/hg19_refGenePro.fa"
CDSfasta <- "~/Downloads/hg19_refGeneCDS.fa"
PrepareAnnotationRefseq2(genome='hg19', CDSfasta, pepfasta, annotation_path,
                         dbsnp=NULL, splice_matrix=FALSE, COSMIC=FALSE)
Build TranscriptDB object (txdb.sqlite) ... 
Download the refGene table ... OK
Download the hgFixed.refLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
 done
Prepare gene/transcript/protein id mapping information (ids.RData) ...  done
Prepare exon annotation information (exon_anno.RData) ...  done
Prepare protein sequence (proseq.RData) ...  done
Prepare protein coding sequence (procodingseq.RData)...  done
Warning message:
In .extractCdsLocsFromUCSCTxTable(ucsc_txtable) :
  UCSC data anomaly in 433 transcript(s): the cds cumulative length is not a multiple of 3
  for transcripts ‘NM_033425’ ‘NM_006510’ ‘NM_001146344’ ‘NM_001010890’ ‘NM_001300891’
  ‘NM_001300891’ ‘NM_017940’ ‘NM_002537’ ‘NM_003954’ ‘NM_006510’ ‘NM_001278563’
  ‘NM_001291815’ ‘NM_001359231’ ‘NM_001354658’ ‘NM_001350198’ ‘NM_001243042’
  ‘NM_001243042’ ‘NM_002570’ ‘NM_001128590’ ‘NM_001271870’ ‘NM_001271872’ ‘NM_001329984’
  ‘NM_001037501’ ‘NM_001037675’ ‘NM_001277444’ ‘NM_001351365’ ‘NM_001297654’
  ‘NM_001288952’ ‘NM_001134939’ ‘NM_001301371’ ‘NM_153334’ ‘NM_001348286’ ‘NM_001348208’
  ‘NM_001348208’ ‘NM_001348208’ ‘NM_001348208’ ‘NM_001348208’ ‘NM_001289152’ ‘NM_199349’
  ‘NM_138324’ ‘NM_138323’ ‘NM_138322’ ‘NM_138319’ ‘NM_005671’ ‘NM_001143962’ ‘NM_000500’
  ‘NM_145171’ ‘NM_001318833’ ‘NM_006904’ [... truncated]
sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Amazon Linux AMI 2018.03

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] PGA_1.13.3           rTANDEM_1.22.1       Rcpp_1.0.1
 [4] XML_3.98-1.20        data.table_1.12.2    Biostrings_2.50.2
 [7] XVector_0.22.0       GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
[10] IRanges_2.16.0       S4Vectors_0.20.1     BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] Biobase_2.42.0              httr_1.4.0
 [3] bit64_0.9-7                 assertthat_0.2.1
 [5] BiocManager_1.30.4          blob_1.1.1
 [7] BSgenome_1.50.0             GenomeInfoDbData_1.2.0
 [9] Rsamtools_1.34.1            remotes_2.0.4
[11] progress_1.2.2              pillar_1.4.1
[13] RSQLite_2.1.1               lattice_0.20-38
[15] glue_1.3.1                  digest_0.6.19
[17] RColorBrewer_1.1-2          colorspace_1.4-1
[19] Matrix_1.2-17               plyr_1.8.4
[21] pkgconfig_2.0.2             pheatmap_1.0.12
[23] customProDB_1.22.1          biomaRt_2.38.0
[25] zlibbioc_1.28.0             purrr_0.3.2
[27] scales_1.0.0                processx_3.3.1
[29] BiocParallel_1.16.6         tibble_2.1.3
[31] ggplot2_3.2.0               AhoCorasickTrie_0.1.0
[33] SummarizedExperiment_1.12.0 GenomicFeatures_1.34.8
[35] lazyeval_0.2.2              magrittr_1.5
[37] crayon_1.3.4                memoise_1.1.0
[39] ps_1.3.0                    MASS_7.3-51.4
[41] RMariaDB_1.0.6.9000         tools_3.5.3
[43] prettyunits_1.0.2           hms_0.4.2
[45] matrixStats_0.54.0          stringr_1.4.0
[47] munsell_0.5.0               DelayedArray_0.8.0
[49] AnnotationDbi_1.44.0        ade4_1.7-13
[51] compiler_3.5.3              rlang_0.3.4
[53] grid_3.5.3                  RCurl_1.95-4.12
[55] VariantAnnotation_1.28.13   bitops_1.0-6
[57] gtable_0.3.0                curl_3.3
[59] DBI_1.0.0.9001              R6_2.4.0
[61] GenomicAlignments_1.18.1    Nozzle.R1_1.1-1
[63] dplyr_0.8.1                 rtracklayer_1.42.2
[65] seqinr_3.4-5                bit_1.1-14
[67] readr_1.3.1                 stringi_1.4.3
[69] tidyselect_0.2.5
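
To see whether the downloaded CDS FASTA file shows the same kind of anomaly as the warning above, one option is to count the records whose sequence length is not a multiple of 3. The following is only a minimal sketch in base R: the function name `cds_not_multiple_of_3` is hypothetical and not part of PGA, and it assumes each record ID is the first whitespace-delimited token of the header line. Note that the warning is raised while processing the UCSC transcript table (`ucsc_txtable`), so a check on the FASTA file is a complementary test and may not report the same set of transcripts.

```r
# Hypothetical helper (not part of PGA): return the IDs of FASTA records
# whose total sequence length is not a multiple of 3.
cds_not_multiple_of_3 <- function(fasta_path) {
  lines <- trimws(readLines(fasta_path, warn = FALSE))
  lines <- lines[nzchar(lines)]                 # drop empty lines
  is_header <- startsWith(lines, ">")
  if (!any(is_header)) return(character(0))

  # Record ID = first whitespace-delimited token after ">" (assumption).
  ids <- sub("^>(\\S+).*$", "\\1", lines[is_header])

  # Assign every sequence line to the most recent header above it.
  record <- cumsum(is_header)
  seq_lines <- !is_header
  lengths <- tapply(nchar(lines[seq_lines]),
                    factor(record[seq_lines], levels = seq_along(ids)),
                    sum)
  lengths[is.na(lengths)] <- 0                  # headers with no sequence

  ids[as.vector(lengths) %% 3 != 0]
}

# Small self-contained demonstration on a temporary file.
demo <- tempfile(fileext = ".fa")
writeLines(c(">tx_ok", "ATGAAATAA",            # 9 nt, multiple of 3
             ">tx_bad", "ATGAAATA", "A", "A"), # 10 nt, not a multiple of 3
           demo)
cds_not_multiple_of_3(demo)
```

On the real data the call would be `cds_not_multiple_of_3(CDSfasta)`, using the `CDSfasta` path defined earlier. The returned IDs keep whatever prefix the FASTA headers carry, so they may need trimming before being compared with the `NM_` accessions listed in the warning.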