Open nmorf opened 3 years ago
Good morning,
This is my first time using sleuth after pseudoalignment with kallisto, and I am quite new to this. Everything runs well except when I try to collapse transcripts to genes with target_mapping. I get exactly the same error as nmorf above, and I was wondering whether it has been solved somewhere else. I can't find an answer, and I have tried building the mapping file in several different ways. Here is the code I am using, which is basically what the walkthroughs and everybody else use. To generate the t2g file:
mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                         dataset = "hsapiens_gene_ensembl",
                         host = 'ensembl.org')
t2g <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_transcript_id_version",
                                     "ensembl_gene_id", "ensembl_gene_id_version",
                                     "external_gene_name", "description",
                                     "chromosome_name", "start_position",
                                     "end_position", "strand", "entrezgene_id"),
                      mart = mart)
t2g <- dplyr::rename(t2g, target_id = ensembl_transcript_id,
                     ens_gene = ensembl_gene_id, ext_gene = external_gene_name)
t2g <- dplyr::select(t2g, c('target_id', 'ens_gene', 'ext_gene'))
To run the sleuth_prep function:
so122 <- sleuth_prep(metadata122,
                     target_mapping = t2g,
                     aggregation_column = 'ens_gene',
                     read_bootstrap_tpm = TRUE,
                     extra_bootstrap_summary = TRUE,
                     transformation_function = function(x) log2(x + 0.5),
                     num_cores = 2)
The error I get all the time (no matter how I construct the t2g data.frame):
Warning: It appears that you are running Sleuth from within Rstudio.
Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
If you wish to take advantage of multiple cores, please consider running sleuth from the command line.
reading in kallisto results
dropping unused factor levels
Error in check_target_mapping(tmp_names, target_mapping, !is.null(aggregation_column)) :
  couldn't solve nonzero intersection
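The error message suggests that the kallisto target IDs and the target_id column of the mapping have no IDs in common. One way to check this is to compare the two ID sets directly. The sketch below is only an illustration in base R: `kallisto_ids` and `t2g_toy` are hypothetical stand-ins, and it assumes the kallisto IDs are full pipe-delimited GENCODE-style headers while the mapping holds unversioned Ensembl transcript IDs.

```r
# Hypothetical stand-ins: target IDs as kallisto might report them when the
# index was built from a FASTA with pipe-delimited headers (an assumption).
kallisto_ids <- c(
  "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|",
  "ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene|",
  "ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene|"
)

# Hypothetical mapping keyed on unversioned transcript IDs.
t2g_toy <- data.frame(
  target_id = c("ENST00000456328", "ENST00000450305", "ENST00000488147"),
  ens_gene  = c("ENSG00000223972", "ENSG00000223972", "ENSG00000227232"),
  stringsAsFactors = FALSE
)

# 1. Exact matches between the two ID sets.
n_exact <- length(intersect(kallisto_ids, t2g_toy$target_id))

# 2. Matches after removing a trailing ".<version>" from the kallisto IDs.
#    The pipe-delimited strings end in "|", so nothing is removed here.
version_stripped <- sub("\\.[0-9]+$", "", kallisto_ids)
n_version <- length(intersect(version_stripped, t2g_toy$target_id))

# 3. Matches after keeping only the first pipe-delimited field and then
#    removing the version suffix.
first_field <- sub("\\|.*$", "", kallisto_ids)
first_field <- sub("\\.[0-9]+$", "", first_field)
n_first_field <- length(intersect(first_field, t2g_toy$target_id))

print(c(exact = n_exact, version_stripped = n_version, first_field = n_first_field))
```

If only the last count is nonzero on real data, the two ID sets differ in format rather than in content, so the mapping would need a target_id column spelled the same way as the kallisto IDs.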
And here are the first rows of our .tsv abundance file from kallisto (I use the .h5 file for sleuth_prep):
target_id | | | | | | | | length | eff_length | est_counts | tpm
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
ENST00000456328.2 | ENSG00000223972.5 | OTTHUMG00000000961.2 | OTTHUMT00000362751.1 | DDX11L1-202 | DDX11L1 | 1657 | processed_transcript | 1657 | 1453.07 | 0 | 0
ENST00000450305.2 | ENSG00000223972.5 | OTTHUMG00000000961.2 | OTTHUMT00000002844.2 | DDX11L1-201 | DDX11L1 | 632 | transcribed_unprocessed_pseudogene | 632 | 428.3 | 0 | 0
ENST00000488147.1 | ENSG00000227232.5 | OTTHUMG00000000958.1 | OTTHUMT00000002839.1 | WASH7P-201 | WASH7P | 1351 | unprocessed_pseudogene | 1351 | 1147.07 | 0 | 0
ENST00000619216.1 | ENSG00000278267.1 | - | - | MIR6859-1-201 | MIR6859-1 | 68 | miRNA | 68 | 34.625 | 0 | 0
ENST00000473358.1 | ENSG00000243485.5 | OTTHUMG00000000959.2 | OTTHUMT00000002840.1 | MIR1302-2HG-202 | MIR1302-2HG | 712 | lncRNA | 712 | 508.07 | 0 | 0
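The extra columns in this excerpt suggest that each target_id is really one pipe-delimited string (transcript, gene, Havana IDs, transcript name, gene name, length, biotype) that was split when pasted. If so, one possible route that avoids editing the FASTA is to build the mapping from the target IDs themselves, so that target_id matches the kallisto output exactly. This is a minimal sketch in base R under that assumption: `ids` and `t2g_from_ids` are hypothetical names, and the field positions (2 for the gene ID, 6 for the gene name) are inferred from the rows above.

```r
# Hypothetical input: full target IDs as they would appear in the target_id
# column of abundance.tsv (for example, read with read.delim()).
ids <- c(
  "ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|",
  "ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene|"
)

# Split each ID on the literal "|" character.
fields <- strsplit(ids, "|", fixed = TRUE)

# Helper (hypothetical): pull the i-th field out of every split ID.
nth_field <- function(i) vapply(fields, function(f) f[i], character(1))

# Keep the full string as target_id so it matches the kallisto output exactly;
# take the gene ID (without its version suffix) and the gene name from
# fields 2 and 6.
t2g_from_ids <- data.frame(
  target_id = ids,
  ens_gene  = sub("\\.[0-9]+$", "", nth_field(2)),
  ext_gene  = nth_field(6),
  stringsAsFactors = FALSE
)

print(t2g_from_ids[, c("ens_gene", "ext_gene")])
```

A data frame built this way has the same three columns as the t2g used in the sleuth_prep call above, so it could be passed as target_mapping in its place. Whether that resolves the error depends on the IDs really having this layout.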
Hello,
I'm trying to use biomaRt to retrieve the gene names for Apis mellifera from Ensembl, so that I can analyze data generated by kallisto using sleuth.
I am running into the error reported in 2017 (link below) and haven't been able to fix it myself. Could someone point me to a possible solution that does not require editing the FASTA files?
https://github.com/pachterlab/sleuth/issues/111
Here is the error message that I get.
Thank you, nm