thelovelab / tximeta

Transcript quantification import with automatic metadata detection
https://thelovelab.github.io/tximeta/

lexical error returned when running makeLinkedTxome #21

Closed: wzmin closed this issue 4 years ago

wzmin commented 4 years ago

Hi Mike,

CSHL 2019 participant here! I'm trying to import my quants from Salmon 1.0.0, using whole-genome decoys generated on my machine, but tximeta couldn't recognize the mouse Gencode M23 transcriptome I'm using. I tried to generate my own linkedTxome but got the following error:

makeLinkedTxome(indexDir="/Users/wangz2/Documents/Lab/Computational_resources/salmon_index", source="Gencode", organism="Mus musculus", release="M23", genome="GRCm38", fasta=fastaFTP, gtf=gtfFTP, write=TRUE)

Error: lexical error: invalid char in json text. /Users/wangz2/Documents/Lab/Comp (right here) ------^

This code used to work for me back in August. Any insight into why this is happening?

Thank you, Zhongmin
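The error text shows jsonlite parsing a string that begins with the index path as if it were JSON, rather than reading a file's contents. A minimal defensive sketch of how a JSON-reading step can fail more readably: `read_json_file()` is a hypothetical helper (not part of tximeta or jsonlite) that checks the path is an existing regular file before handing it to `jsonlite::fromJSON()`. It does not reproduce what tximeta does internally.

```r
# Hypothetical helper, not part of tximeta or jsonlite:
# read a JSON file, failing with a clear message if the path is not a file.
read_json_file <- function(path) {
  # file.exists() is also TRUE for directories, so rule those out too
  if (!file.exists(path) || dir.exists(path)) {
    stop("expected a JSON file, but this is not an existing file: ", path)
  }
  # fromJSON() accepts a file path and parses the file's contents
  jsonlite::fromJSON(path)
}

# Usage: a missing path gives a readable error instead of a lexical error
msg <- tryCatch(
  read_json_file(file.path(tempdir(), "no_such_dir", "missing.json")),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The check runs before any parsing, so the message names the offending path directly instead of pointing a caret into it.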

mikelove commented 4 years ago

Hmm @rob-p, I thought that we solved the issue with the decoys breaking the hash? How does the hash get stored in v1.0.0?

The other issue (linkedTxome) is resolved by upgrading jsonlite, I believe.

wzmin commented 4 years ago

I just tried upgrading jsonlite and the issue persists. Are we talking about jsonlite 1.6? The same error was reproduced on a different machine.

mikelove commented 4 years ago

Oh also, are you using the current tximeta release, 1.4? Can you post sessionInfo()?

wzmin commented 4 years ago

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tximeta_1.2.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2 lattice_0.20-38 prettyunits_1.0.2 Rsamtools_2.0.3
 [5] Biostrings_2.52.0 assertthat_0.2.1 zeallot_0.1.0 digest_0.6.22
 [9] BiocFileCache_1.8.0 R6_2.4.0 GenomeInfoDb_1.20.0 backports_1.1.5
[13] stats4_3.6.1 RSQLite_2.1.2 httr_1.4.1 pillar_1.4.2
[17] zlibbioc_1.30.0 rlang_0.4.1 GenomicFeatures_1.36.4 progress_1.2.2
[21] lazyeval_0.2.2 curl_4.2 rstudioapi_0.10 blob_1.2.0
[25] S4Vectors_0.22.1 Matrix_1.2-17 BiocParallel_1.18.1 stringr_1.4.0
[29] ProtGenerics_1.16.0 RCurl_1.95-4.12 bit_1.1-14 biomaRt_2.40.5
[33] DelayedArray_0.10.0 compiler_3.6.1 rtracklayer_1.44.4 pkgconfig_2.0.3
[37] BiocGenerics_0.30.0 tximport_1.12.3 tidyselect_0.2.5 SummarizedExperiment_1.14.1
[41] tibble_2.1.3 GenomeInfoDbData_1.2.1 IRanges_2.18.3 matrixStats_0.55.0
[45] XML_3.98-1.20 crayon_1.3.4 dplyr_0.8.3 dbplyr_1.4.2
[49] GenomicAlignments_1.20.1 bitops_1.0-6 rappdirs_0.3.1 grid_3.6.1
[53] jsonlite_1.6 DBI_1.0.0 AnnotationFilter_1.8.0 magrittr_1.5
[57] stringi_1.4.3 XVector_0.24.0 vctrs_0.2.0 ensembldb_2.8.1
[61] tools_3.6.1 bit64_0.9-7 Biobase_2.44.0 glue_1.3.1
[65] purrr_0.3.3 hms_0.5.2 parallel_3.6.1 yaml_2.2.0
[69] AnnotationDbi_1.46.1 BiocManager_1.30.9 GenomicRanges_1.36.1 memoise_1.1.0

mikelove commented 4 years ago

Can you upgrade to the release first?

Also, some jsonlite debugging:

https://github.com/jeroen/jsonlite/issues/230

rob-p commented 4 years ago

@mikelove yes, all of the info should be there. @wzmin , can you please post the contents of aux_info/meta_info.json from one of your quant directories?

wzmin commented 4 years ago

@rob-p
{
  "salmon_version": "1.0.0",
  "samp_type": "none",
  "opt_type": "vb",
  "quant_errors": [],
  "num_libraries": 1,
  "library_types": [ "IU" ],
  "frag_dist_length": 1001,
  "seq_bias_correct": false,
  "gc_bias_correct": false,
  "num_bias_bins": 4096,
  "mapping_type": "mapping",
  "num_valid_targets": 140748,
  "num_decoy_targets": 66,
  "num_eq_classes": 555909,
  "serialized_eq_classes": false,
  "eq_class_properties": [ "range_factorized" ],
  "length_classes": [ 485, 755, 1421, 2773, 101674 ],
  "index_seq_hash": "6f92253eb7397009ce667653d94538bc1f0bd85fad71e5c45a7395f6cfe07ffe",
  "index_name_hash": "3de11815d63c55b9242945c8f5ae3500f5b452804c2ffcb5611b83a079ddfb25",
  "index_seq_hash512": "",
  "index_name_hash512": "",
  "index_decoy_seq_hash": "44c0a33f20575470b02707e1ff7f85c8e361b4361b0171a31b130acf1de2c375",
  "index_decoy_name_hash": "400b996fcf8292decf9f2d7f7f7c2fd08bf6ca188ba0568de00ca4aeb207747b",
  "num_bootstraps": 0,
  "num_processed": 33991528,
  "num_mapped": 30149906,
  "num_decoy_fragments": 516520,
  "num_dovetail_fragments": 174621,
  "num_fragments_filtered_vm": 682108,
  "num_alignments_below_threshold_for_mapped_fragments_vm": 17732333,
  "percent_mapped": 88.6982956459033,
  "call": "quant",
  "start_time": "Fri Nov 8 21:41:43 2019",
  "end_time": "Fri Nov 8 21:44:43 2019"
}
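To pull out just the index hash fields from a quant directory's `aux_info/meta_info.json`, a short sketch follows. The helper name `index_hashes()` and its `quant_dir` argument are illustrative, not tximeta API; it assumes jsonlite is installed and that the hash fields are the ones whose names start with `index_`, as in the file posted above. Empty strings are turned into `NA` so unwritten hashes stand out.

```r
# Illustrative helper, not part of tximeta: collect the index hash fields
# that Salmon writes to <quant_dir>/aux_info/meta_info.json.
index_hashes <- function(quant_dir) {
  meta <- jsonlite::fromJSON(file.path(quant_dir, "aux_info", "meta_info.json"))
  # keep only the index_* entries (sequence, name and decoy hashes)
  fields <- grep("^index_", names(meta), value = TRUE)
  hashes <- vapply(fields, function(f) as.character(meta[[f]]), character(1))
  # an empty string means that hash was not written to the file
  hashes[hashes == ""] <- NA_character_
  hashes
}

# Usage (the path is a placeholder for one of your quant directories):
# h <- index_hashes("path/to/quant_dir")
# names(h)[is.na(h)]   # which hash fields are empty
```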

wzmin commented 4 years ago

@mikelove interestingly, when I try to upgrade with either BiocManager::install or install.packages, the 1.2.2 release is installed:

> BiocManager::install("tximeta")
Bioconductor version 3.9 (BiocManager 1.30.9), R 3.6.1 (2019-07-05)
Installing package(s) 'tximeta'
trying URL 'https://bioconductor.org/packages/3.9/bioc/bin/macosx/el-capitan/contrib/3.6/tximeta_1.2.2.tgz'
Content type 'application/x-gzip' length 278913 bytes (272 KB)
downloaded 272 KB

The downloaded binary packages are in
/var/folders/0z/k7rhty7s5v32zhtbxlvsh71w3y8z23/T//RtmpFOZQDw/downloaded_packages

rob-p commented 4 years ago

@mikelove looks like the hash entries for 512 are missing, right? That's odd.

mikelove commented 4 years ago

@rob-p my bad, everything is fine: 6f9... is in the hash table as it should be. False alarm.

@wzmin everything should work out of the box, but you need to update R first to get the latest Bioconductor release (this is often the case, since R and Bioconductor versions are linked). See the Bioconductor install instructions for details.
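Since the tximeta binary that gets installed follows the Bioconductor release in use, a quick check is to compare the installed tximeta version against the 1.4 release mentioned earlier in the thread. A minimal base-R sketch; `at_least()` is a hypothetical helper and the `"1.4"` minimum is taken from this discussion, not from tximeta itself.

```r
# Hypothetical helper: TRUE if an installed version string meets a minimum.
at_least <- function(installed, required) {
  # package_version() compares component-wise, so "1.10.0" counts as newer than "1.4"
  package_version(installed) >= package_version(required)
}

# In a live session (requires BiocManager and tximeta to be installed):
# BiocManager::version()                                    # Bioconductor release in use
# at_least(as.character(packageVersion("tximeta")), "1.4")

at_least("1.2.2", "1.4")  # FALSE: the version in the sessionInfo() above is too old
```

Comparing with `package_version()` rather than plain strings avoids the trap where `"1.10.0" < "1.4"` lexically.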

wzmin commented 4 years ago

@mikelove I had to uninstall everything related to R on my computer to get the latest versions to install properly, and now tximeta is recognizing the transcriptome. Sorry for the unnecessary confusion.

rob-p commented 4 years ago

@mikelove, what I mean is that the 512-bit index hash seems to be missing from meta_info.json. That information is present in the index, but it looks like it is not properly propagated to meta_info.json. I just pushed a fix in develop. Are we currently using the SHA512 hash or the SHA256?

rob-p commented 4 years ago

I think we use the 256 right now (so this should not be the cause of the issue), but the 512 will again be properly propagated upstream as of the next release; its omission was a regression.
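The 256 versus 512 distinction can be read off the field widths: a SHA-256 digest written in hexadecimal is 64 characters long and a SHA-512 digest is 128. The posted `index_seq_hash` is 64 characters, consistent with the SHA-256 hash that is in use, while `index_seq_hash512` is empty. A small sketch; `hash_kind()` is a hypothetical name, and it only guesses the variant from the length of a hex string.

```r
# Hypothetical helper: guess the SHA-2 variant of a hex digest from its width.
hash_kind <- function(h) {
  if (is.na(h) || h == "") return("missing")
  if (!grepl("^[0-9a-fA-F]+$", h)) return("not hex")
  # 4 bits per hex character: 64 chars = 256 bits, 128 chars = 512 bits
  switch(as.character(nchar(h)),
         "64"  = "sha256",
         "128" = "sha512",
         "unknown")
}

hash_kind("6f92253eb7397009ce667653d94538bc1f0bd85fad71e5c45a7395f6cfe07ffe")  # "sha256"
hash_kind("")                                                                  # "missing"
```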

mikelove commented 4 years ago

Got it. Yes, we use 256 now. GA4GH will likely require a new one, but we haven't settled on a spec yet. Thanks all.