Closed DrHeisenberg89 closed 7 years ago
Hi @DrHeisenberg89,
Your sleuth_prep
step completed successfully, as only warnings were listed. The important part is that sleuth detected that there was no direct overlap between the target_id
of your kallisto results, and the target_id
of your target mapping. This frequently happens because the annotations from Ensembl and Gencode include the "version number" with the transcript ID.
For example, the transcript ID you gave, ENSMUST00000148678
, is listed as ENSMUST00000148678.2
on the Ensembl website. ENSMUST00000148678
=/= ENSMUST00000148678.2
, so merging the target mapping annotations would result in no overlap, and that's what causes the warning. Sleuth automatically tries to correct this common problem, and reports it to you as a warning in case this leads to unexpected behavior downstream.
To avoid the message on subsequent runs, I would recommend you include the version number like this:
mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "mmusculus_gene_ensembl", host = 'ensembl.org')
t2g <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id",
"external_gene_name", "transcript_version"), mart = mart)
t2g <- dplyr::mutate(t2g, target_id = paste(ensembl_transcript_id, transcript_version, sep = "."))
t2g <- dplyr::select(t2g, target_id, ens_gene = ensembl_gene_id, ext_gene = external_gene_name)
so <- sleuth_prep(s2c, target_mapping = t2g)
if you run head(t2g)
after running the above, you should see the following:
target_id ens_gene ext_gene
1 ENSMUST00000082423.1 ENSMUSG00000064372 mt-Tp
2 ENSMUST00000082422.1 ENSMUSG00000064371 mt-Tt
3 ENSMUST00000082421.1 ENSMUSG00000064370 mt-Cytb
4 ENSMUST00000082420.1 ENSMUSG00000064369 mt-Te
5 ENSMUST00000082419.1 ENSMUSG00000064368 mt-Nd6
6 ENSMUST00000082418.1 ENSMUSG00000064367 mt-Nd5
To run the gene-level aggregation and following the tutorial, you can run the following line:
so_gene <- sleuth_prep(s2c, target_mapping = t2g, aggregation_column = "ens_gene")
As this is expected behavior for sleuth_prep
, this is not an issue or bug. Thus, I'm going to close this. If you have further problems with this, feel free to reopen this issue or submit a question to the kallisto-sleuth Google Group (link)
Hi everyone, I am a bit confused about a painful step of my pipeline using Sleuth. I am trying to move from the transcript level to the gene level and that's why I am using biomaRt (?).
I can't understand the problem because I am pretty sure I have already created a table with the specific gene names and they match!
Help?(!)
PS: please pardon my ignorance in term of bioinformatic jargon