Closed wanghao1991217 closed 3 years ago
Hi,
For the first error, I suggest you disable the dbsnp and COSMIC options (just set them with FALSE). And the second error may be caused by the incomplete annotations of PrepareAnnotationEnsembl2.
Hi,
For the first error, I suggest you disable the dbsnp and COSMIC options (just set them with FALSE). And the second error may be caused by the incomplete annotations of PrepareAnnotationEnsembl2.
ensembl <- useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl",
host="aug2020.archive.ensembl.org", path="/biomart/martservice",
archive=FALSE)
PrepareAnnotationEnsembl2(mart = ensembl, "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/ensembl_annotation",
splice_matrix = T,
dbsnp = NULL, COSMIC = FALSE)
Prepare gene/transcript/protein id mapping information (ids.RData) ... done
Build TranscriptDB object (txdb.sqlite) ...
Ensembl site unresponsive, trying uswest mirror
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... FAILED! (=> skipped)
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
done
Prepare exon annotation information (exon_anno.RData) ... done
Prepare protein coding sequence (procodingseq.RData)... done
Prepare protein sequence (proseq.RData) ... done
Prepare exon splice information (splicemax.RData) ... done
Warning messages:
1: In .infer_chrominfo_from_transcripts_and_splicings(transcripts$tx_chrom, :
chromosome lengths and circularity flags are not available for this TxDb object
2: closing unused connection 3 ()
dbCreator(gtfFile = gtf, vcfFile = vcf, tabFile = tab,
annotation_path = "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/ensembl_annotation",
outfile_name = "dbcreator_ensembl", outdir = "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/out",
genome=Hsapiens)
Output abberant protein FASTA file caused by short INDEL... done
Output variation table and variant protein sequence caused bySNVs... Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'unique': error in evaluating the argument 'x' in selecting a method for function 'substr': error in evaluating the argument 'x' in selecting a method for function 'translate': input must be a single non-NA string
In addition: There were 19 warnings (use warnings() to see them)
traceback()
19: h(simpleError(msg, call))
18: .handleSimpleError(function (cond)
.Internal(C_tryCatchHelper(addr, 1L, cond)), "error in evaluating the argument 'x' in selecting a method for function 'substr': error in evaluating the argument 'x' in selecting a method for function 'translate': input must be a single non-NA string",
base::quote(h(simpleError(msg, call))))
17: h(simpleError(msg, call))
16: .handleSimpleError(function (cond)
.Internal(C_tryCatchHelper(addr, 1L, cond)), "error in evaluating the argument 'x' in selecting a method for function 'translate': input must be a single non-NA string",
base::quote(h(simpleError(msg, call))))
15: h(simpleError(msg, call))
14: .handleSimpleError(function (cond)
.Internal(C_tryCatchHelper(addr, 1L, cond)), "input must be a single non-NA string",
base::quote(.charToXString(seqtype, x, start, end, width)))
13: stop(wmsg("input must be a single non-NA string"))
12: .charToXString(seqtype, x, start, end, width)
11: XString("DNA", x, start = start, width = nchar)
10: XString("DNA", x, start = start, width = nchar)
9: DNAString(paste0("ATG", x))
8: translate(DNAString(paste0("ATG", x)))
7: substr(as.character(translate(DNAString(paste0("ATG", x)))),
2, 2)
6: unique(substr(as.character(translate(DNAString(paste0("ATG",
x)))), 2, 2))
5: FUN(X[[i]], ...)
4: lapply(vcodesDNAS, function(x) unique(substr(as.character(translate(DNAString(paste0("ATG",
x)))), 2, 2)))
3: lapply(vcodesDNAS, function(x) unique(substr(as.character(translate(DNAString(paste0("ATG",
x)))), 2, 2)))
2: aaVariation(postable_snv, codingseq)
1: dbCreator(gtfFile = gtf, vcfFile = vcf, tabFile = tab, annotation_path = "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/ensembl_annotation",
outfile_name = "dbcreator_ensembl", outdir = "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/out",
genome = Hsapiens)
The error of "FAILED to get the chrominfo" was caused by some bugs of dependent package. We have resolved this by adding some modified functions. Please install the latest PGA packakge from github and try once again.
ensembl <- useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl",
+ host="aug2020.archive.ensembl.org", path="/biomart/martservice",
+ archive=FALSE)
> PrepareAnnotationEnsembl2(mart = ensembl, "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/ensembl_annotation",
+ splice_matrix = T,
+ dbsnp = NULL, COSMIC = FALSE)
Prepare gene/transcript/protein id mapping information (ids.RData) ... done
Build TranscriptDB object (txdb.sqlite) ...
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
done
Prepare exon annotation information (exon_anno.RData) ... done
Prepare protein coding sequence (procodingseq.RData)... done
Prepare protein sequence (proseq.RData) ... done
Prepare exon splice information (splicemax.RData) ... done
> dbCreator(gtfFile = gtf, vcfFile = vcf, tabFile = tab,
+ annotation_path = "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/ensembl_annotation",
+ outfile_name = "dbcreator_ensembl", outdir = "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/out",
+ genome=Hsapiens)
Output abberant protein FASTA file caused by short INDEL... done
Output variation table and variant protein sequence caused bySNVs... Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'unique': error in evaluating the argument 'x' in selecting a method for function 'substr': error in evaluating the argument 'x' in selecting a method for function 'translate': input must be a single non-NA string
In addition: There were 19 warnings (use warnings() to see them)
> traceback()
19: h(simpleError(msg, call))
18: .handleSimpleError(function (cond)
.Internal(C_tryCatchHelper(addr, 1L, cond)), "error in evaluating the argument 'x' in selecting a method for function 'substr': error in evaluating the argument 'x' in selecting a method for function 'translate': input must be a single non-NA string",
base::quote(h(simpleError(msg, call))))
17: h(simpleError(msg, call))
16: .handleSimpleError(function (cond)
.Internal(C_tryCatchHelper(addr, 1L, cond)), "error in evaluating the argument 'x' in selecting a method for function 'translate': input must be a single non-NA string",
base::quote(h(simpleError(msg, call))))
15: h(simpleError(msg, call))
14: .handleSimpleError(function (cond)
.Internal(C_tryCatchHelper(addr, 1L, cond)), "input must be a single non-NA string",
base::quote(.charToXString(seqtype, x, start, end, width)))
13: stop(wmsg("input must be a single non-NA string"))
12: .charToXString(seqtype, x, start, end, width)
11: XString("DNA", x, start = start, width = nchar)
10: XString("DNA", x, start = start, width = nchar)
9: DNAString(paste0("ATG", x))
8: translate(DNAString(paste0("ATG", x)))
7: substr(as.character(translate(DNAString(paste0("ATG", x)))),
2, 2)
6: unique(substr(as.character(translate(DNAString(paste0("ATG",
x)))), 2, 2))
5: FUN(X[[i]], ...)
4: lapply(vcodesDNAS, function(x) unique(substr(as.character(translate(DNAString(paste0("ATG",
x)))), 2, 2)))
3: lapply(vcodesDNAS, function(x) unique(substr(as.character(translate(DNAString(paste0("ATG",
x)))), 2, 2)))
2: aaVariation(postable_snv, codingseq)
1: dbCreator(gtfFile = gtf, vcfFile = vcf, tabFile = tab, annotation_path = "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/ensembl_annotation",
outfile_name = "dbcreator_ensembl", outdir = "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/out",
genome = Hsapiens)
Could you please provide a demo vcf file and all the annotations in the ensembl_annotation/ for me to debug? You can send it to me by xsh.skye@gmail.com.
I have updated the PGA to 1.15.3, re-prepare the ensembl annotation and run dbCreator.
ensembl <- useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl", + host="aug2020.archive.ensembl.org", path="/biomart/martservice", + archive=FALSE) > PrepareAnnotationEnsembl2(mart = ensembl, "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/ensembl_annotation", + splice_matrix = T, + dbsnp = NULL, COSMIC = FALSE) Prepare gene/transcript/protein id mapping information (ids.RData) ... done Build TranscriptDB object (txdb.sqlite) ... Download and preprocess the 'transcripts' data frame ... OK Download and preprocess the 'chrominfo' data frame ... OK Download and preprocess the 'splicings' data frame ... OK Download and preprocess the 'genes' data frame ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK done Prepare exon annotation information (exon_anno.RData) ... done Prepare protein coding sequence (procodingseq.RData)... done Prepare protein sequence (proseq.RData) ... done Prepare exon splice information (splicemax.RData) ... done > dbCreator(gtfFile = gtf, vcfFile = vcf, tabFile = tab, + annotation_path = "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/ensembl_annotation", + outfile_name = "dbcreator_ensembl", outdir = "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/out", + genome=Hsapiens) Output abberant protein FASTA file caused by short INDEL... done Output variation table and variant protein sequence caused bySNVs... Error in h(simpleError(msg, call)) : error in evaluating the argument 'x' in selecting a method for function 'unique': error in evaluating the argument 'x' in selecting a method for function 'substr': error in evaluating the argument 'x' in selecting a method for function 'translate': input must be a single non-NA string In addition: There were 19 warnings (use warnings() to see them) > traceback() 19: h(simpleError(msg, call)) 18: .handleSimpleError(function (cond) .Internal(C_tryCatchHelper(addr, 1L, cond)), "error in evaluating the argument 'x' in selecting a method for function 'substr': error in evaluating the argument 'x' in selecting a method for function 'translate': input must be a single non-NA string", base::quote(h(simpleError(msg, call)))) 17: h(simpleError(msg, call)) 16: .handleSimpleError(function (cond) .Internal(C_tryCatchHelper(addr, 1L, cond)), "error in evaluating the argument 'x' in selecting a method for function 'translate': input must be a single non-NA string", base::quote(h(simpleError(msg, call)))) 15: h(simpleError(msg, call)) 14: .handleSimpleError(function (cond) .Internal(C_tryCatchHelper(addr, 1L, cond)), "input must be a single non-NA string", base::quote(.charToXString(seqtype, x, start, end, width))) 13: stop(wmsg("input must be a single non-NA string")) 12: .charToXString(seqtype, x, start, end, width) 11: XString("DNA", x, start = start, width = nchar) 10: XString("DNA", x, start = start, width = nchar) 9: DNAString(paste0("ATG", x)) 8: translate(DNAString(paste0("ATG", x))) 7: substr(as.character(translate(DNAString(paste0("ATG", x)))), 2, 2) 6: unique(substr(as.character(translate(DNAString(paste0("ATG", x)))), 2, 2)) 5: FUN(X[[i]], ...) 4: lapply(vcodesDNAS, function(x) unique(substr(as.character(translate(DNAString(paste0("ATG", x)))), 2, 2))) 3: lapply(vcodesDNAS, function(x) unique(substr(as.character(translate(DNAString(paste0("ATG", x)))), 2, 2))) 2: aaVariation(postable_snv, codingseq) 1: dbCreator(gtfFile = gtf, vcfFile = vcf, tabFile = tab, annotation_path = "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/ensembl_annotation", outfile_name = "dbcreator_ensembl", outdir = "/media/rchenubuntu/5618DC8B18DC6B8D/KIRC_R/PGA/out", genome = Hsapiens)
Hi,
This error was caused by the multiple alternative alleles in ALT column.
We suggest to divide it to two lines by modifying vcf file in advance.
Please convert this format from
to
The perl command line for your information:
cat old.vcf |perl -F'\t' -lane 'if(/^#/){print;next};if( $F[4] =~/,/){my @arr=split /,/,$F[4]; map{$F[4]=$_; print join("\t",@F)}@arr}else{print }'>new.vcf
And we have fixed another minor bug. Please install the latest PGA packakge (version 1.15.4) from github and try once again.
@Shawn-Xu The problem is solved. Thank you! However, I found the txFinder.fasta file which has been just generated from one sample is very large to more than 400 mb. And I checked the file manually and found many duplicate entries with same header and same amino acid sequence.
@Shawn-Xu The problem is solved. Thank you! However, I found the txFinder.fasta file which has been just generated from one sample is very large to more than 400 mb. And I checked the file manually and found many duplicate entries with same header and same amino acid sequence.
I have resolved this problem in the latest PGA packakge (version 1.15.4). Alternatively, you could remove these redundant entries by yourself.
@Shawn-Xu Thank you for help.
Please reopen this issue if you still have questions about it.
I use the PrepareAnnotationEnsembl2 function to Preparing annotation files with Ensembl release 101. There's error with enable the snp.
Then, I skip this error and run dbCreator.
I wonder if this error is related to the failed construction of SNP database with the PrepareAnnotationEnsembl2 function. Here is my workflow of analysis of vcf, transcript.gtf, and junction.tab.