pachterlab / sleuth

Differential analysis of RNA-Seq
http://pachterlab.github.io/sleuth
GNU General Public License v3.0

Questions about vignette #55

Closed olgabot closed 7 years ago

olgabot commented 8 years ago

Finally able to install sleuth (had to reinstall Rcpp...), I'm trying out the vignette but getting stuck. I'm reading it from this Rmd file because sleuth is installed on our shared computing resource, which does not have $DISPLAY defined, so if I use vignette('intro', package = 'sleuth') (which I think is supposed to pop up a PDF or something?), I can't see it.

> install.packages('dplyr')
Installing package into ‘/home/obotvinnik/R’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository https://cran.rstudio.com/src/contrib
Warning message:
package ‘dplyr’ is not available (for R version 3.2.1)

blahah commented 8 years ago

The vignette command starts a webserver. Here's a PDF of the current version: https://www.dropbox.com/s/gurc0l5ikjnx5gp/Introduction%20to%20sleuth.pdf?dl=0

olgabot commented 8 years ago

ok I can make a tunnel to view the webserver

On Tue, Dec 22, 2015 at 2:13 PM Richard Smith-Unna notifications@github.com wrote:

The vignette command starts a webserver. Here's a PDF of the current version: https://www.dropbox.com/s/gurc0l5ikjnx5gp/Introduction%20to%20sleuth.pdf?dl=0

— Reply to this email directly or view it on GitHub https://github.com/pachterlab/sleuth/issues/55#issuecomment-166745404.

blahah commented 8 years ago

@olgabot can you share the command you used to do that?

pimentel commented 8 years ago

@olgabot:

@Blahah I'm not sure what @olgabot had in mind, but I usually do something like this: http://www.linuxjournal.com/content/use-ssh-create-http-proxy

Thanks, Harold

olgabot commented 8 years ago

  1. Thank you for the link! That answers a lot of my questions.

@Blahah I was going to use the same command that I use for tunneling to IPython/Jupyter notebooks on our server so that I can view them in my laptop browser, but that didn't send any data over. The -D flag suggested by @pimentel seems to be for when you know the IP/DNS address of the place you want to send it to, and since I want to view it on my laptop, I don't have a static IP. Anyway, here's the command I tried:

ssh -NL 8000:localhost:8000 obotvinnik@tscc-login2.sdsc.edu &

For now, I'll use the rawgit link provided.

  1. The version of kallisto is 0.42.1:
$ /projects/ps-yeolab/software/kallisto/kallisto --version
Error: invalid command --version
kallisto 0.42.1

Usage: kallisto <CMD> [arguments] ..

Where <CMD> can be one of:

    index         Builds a kallisto index 
    quant         Runs the quantification algorithm 
    h5dump        Converts HDF5-formatted results to plaintext
    version       Prints version information

Running kallisto <CMD> without arguments prints usage information for <CMD>
  1. Yes, there's abundance.txt. Here's an example directory:
$ ll Sample_10A_Full_Set_All_Runs_Concatenated/
total 13M
-rw-r--r-- 1 obotvinnik yeo-group 15M Dec 11 17:59 abundance.h5
-rw-r--r-- 1 obotvinnik yeo-group 12M Dec 11 17:59 abundance.txt
-rw-r--r-- 1 obotvinnik yeo-group 624 Dec 11 17:59 run_info.json

And here's the contents of run_info.json:

{
        "n_targets": 81814,
        "n_bootstraps": 0,
        "kallisto_version": "0.42.1",
        "index_version": 9,
        "start_time": "Fri Dec 11 17:57:04 2015",
        "call": "/projects/ps-yeolab/software/kallisto/kallisto quant -i /projects/ps-yeolab/genomes/hg19/kallisto/gencode.v19.pc_transcripts.only_protein_coding_transcripts.fa.k31 --threads=4 -o /home/obotvinnik/projects/autism_brain_rnaseq/analysis/kallisto_concatenated/Sample_10A_Full_Set_All_Runs_Concatenated -l 320 --single /projects/ps-yeolab/seqdata/20150422_heather_all_runs_autism_brain_postmortem_data/final_sym_links_to_files/Sample_10A_Full_Set_All_Runs_Concatenated.fastq.gz"
}
  1. Nope, loading library(sleuth) didn't import dplyr, since I didn't even have it installed. But it's okay - now that I can see what the output is supposed to be like, I'll just make the table in Python/Pandas :stuck_out_tongue_winking_eye:
  2. Are the column names required to be exactly sample and condition, like path? Or is that a choice you made for the vignette? E.g. if I want to run several different analyses like ~ Diagnosis or ~ TissueType, do I need to rename them to condition?

EDIT: didn't close fenced code blocks properly

olgabot commented 8 years ago

Okay I have a few more questions.

  1. How can you find out what is the latest version of sleuth and are we updated with it? e.g. is there a CRAN or Bioconductor badge you can add to your repo?
  2. How can you run the vignettes or sleuth_live commands headless, so the ports can be tunneled?

olgabot commented 8 years ago

for No. 2 above I'm thinking the equivalent command I use in jupyter notebook to run headless on a particular port is jupyter notebook --no-browser --port 7700. Or else we get this error on our server:

Listening on http://127.0.0.1:42427
/usr/bin/xdg-open: line 402: htmlview: command not found
/usr/bin/xdg-open: line 402: firefox: command not found
/usr/bin/xdg-open: line 402: mozilla: command not found
/usr/bin/xdg-open: line 402: netscape: command not found
/usr/bin/xdg-open: line 402: links: command not found
/usr/bin/xdg-open: line 402: lynx: command not found
xdg-open: no method available for opening 'http://127.0.0.1:42427'

blahah commented 8 years ago

@olgabot you can have any column names you want for different variables describing the datasets, and you can then use those in the linear model formula you specify.

For example, see this blog post by @vals where he adds tissue, development and day.
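Since the table was going to be made in Python anyway, here is a minimal sketch of one way to write it with arbitrary covariate columns. The sample names, covariate values, and the /path/to/kallisto_output root are hypothetical placeholders, and the sketch assumes that sample and path are the only columns sleuth looks up by name, with everything else free to use in the formula.

```python
import csv
import io

# Hypothetical samples and covariate values, for illustration only.
samples = [
    {"sample": "Sample_10A", "Diagnosis": "autism", "TissueType": "cortex"},
    {"sample": "Sample_20", "Diagnosis": "control", "TissueType": "cortex"},
]


def build_table(samples, kallisto_root):
    """Return TSV text with sample, the covariate columns, and path."""
    # Every key other than "sample" is treated as a covariate column.
    covariates = [key for key in samples[0] if key != "sample"]
    fields = ["sample"] + covariates + ["path"]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in samples:
        # Assumed layout: one kallisto output directory per sample.
        writer.writerow({**row, "path": f"{kallisto_root}/{row['sample']}"})
    return out.getvalue()


table = build_table(samples, "/path/to/kallisto_output")
print(table)
```

The resulting file could then be loaded in R with read.table(..., header = TRUE, stringsAsFactors = FALSE) and passed to sleuth_prep with a formula such as ~ Diagnosis or ~ TissueType, without renaming anything to condition.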

pimentel commented 8 years ago

@olgabot sorry for the delayed response:

How can you find out what is the latest version of sleuth and are we updated with it? e.g. is there a CRAN or Bioconductor badge you can add to your repo?

You can check the installed version using packageVersion('sleuth') and then check whether it matches the version in the DESCRIPTION file on the master branch (every proper R package has a version number there). It is not currently on CRAN or Bioconductor (I'm thinking we will probably put it on CRAN when it is totally stable), but I will look into adding a badge and will also add a check upon loading the package.

for No. 2 above I'm thinking the equivalent command I use in jupyter notebook to run headless on a particular port is jupyter notebook --no-browser --port 7700. Or else we get this error on our server:

You can pass additional arguments to sleuth_live that will get passed to shiny::runApp. Anyway, it technically runs 'headless' but launches a browser. You can suppress this by passing launch.browser = FALSE. You can also change the port by passing port = someNumber.

hjeanc commented 8 years ago

Hi, I am attempting to analyze the kallisto results generated by Olga using sleuth. I'm sure that this is some obvious error on my part, but maybe you can spot it more quickly. Attempting to construct the sleuth object results in the following error:

Error in sleuth_prep(s2c, ~condition, target_mapping = t2g) :
  You must generate bootstraps on all of your samples. Here are the ones that don't contain any:
~/projects/kallisto/Sample_217A_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_214_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_213_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_164_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_258_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_64_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_66_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_178_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_68_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_176_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_251_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_257_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_20_Full_Set_All_Runs_Concatenated
~/projects/kallisto/Sample_28A_Full_Set_All_Runs_Concatenated

However there are kallisto files in these directories.

lamo <- read.table("~/projects/kallisto/Sample_217A_Full_Set_All_Runs_Concatenated/abundance.txt", header=TRUE, stringsAsFactors=FALSE)
lamo[1:5,]
  target_id
1 ENST00000335137.3|ENSG00000186092.4|OTTHUMG00000001094.1|OTTHUMT00000003223.1|OR4F5-001|OR4F5|918|CDS:1-918|
2 ENST00000423372.3|ENSG00000237683.5|-|-|AL627309.1-201|AL627309.1|2661|UTR5:1-70|CDS:71-850|UTR3:851-2661|
3 ENST00000426406.1|ENSG00000235249.1|OTTHUMG00000002860.1|OTTHUMT00000007999.1|OR4F29-001|OR4F29|995|UTR5:1-19|CDS:20-958|UTR3:959-995|
4 ENST00000332831.2|ENSG00000185097.2|OTTHUMG00000002581.1|OTTHUMT00000007334.1|OR4F16-001|OR4F16|995|UTR5:1-19|CDS:20-958|UTR3:959-995|
5 ENST00000599533.1|ENSG00000269831.1|-|-|AL669831.1-201|AL669831.1|129|CDS:1-129|
  length eff_length est_counts       tpm
1    918        599       0.00   0.00000
2   2661       2342   13976.00 281.08400
3    995        676      48.75   3.39678
4    995        676      48.75   3.39678
5    129        129       0.00   0.00000

Thanks!

pimentel commented 8 years ago

@hjeanc this means that kallisto was not run with bootstraps. It should be run with a nonzero number of bootstraps. Please rerun it with the argument -b X where X is some number greater than or equal to 30.
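A quick way to see which samples are affected before rerunning is to read the n_bootstraps field that kallisto writes to each run_info.json (the example earlier in this thread shows "n_bootstraps": 0). The sketch below is only illustrative: the function name is made up, and it assumes one kallisto output directory per sample directly under a common root.

```python
import json
from pathlib import Path


def samples_without_bootstraps(kallisto_root):
    """Return names of sample directories whose run_info.json reports zero bootstraps."""
    missing = []
    # Assumed layout: <kallisto_root>/<sample>/run_info.json
    for info_file in sorted(Path(kallisto_root).glob("*/run_info.json")):
        with info_file.open() as handle:
            run_info = json.load(handle)
        # A missing key is treated the same as zero bootstraps.
        if run_info.get("n_bootstraps", 0) == 0:
            missing.append(info_file.parent.name)
    return missing
```

Calling samples_without_bootstraps("~/projects/kallisto") would need the ~ expanded first (for example with os.path.expanduser); any sample it returns has to be requantified with -b.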

Also, if you have files labeled abundance.txt, your version of kallisto is quite out of date. I strongly suggest updating it since several bugs have been fixed in the past 6 months. In particular, the effective length calculation has improved quite a bit and changes the results when transcripts are short.

olgabot commented 8 years ago

Due to compatibility issues, we're only able to install sleuth via bioconda. Last I checked we had the latest version of sleuth from there but I'll double check

pimentel commented 8 years ago

@olgabot kallisto is the program that is out of date. Recently we have been able to supply statically linked binaries which work on most systems.

olgabot commented 8 years ago

Oh okay. I'm in the airport now so I'll check when I get to a computer

telia22 commented 8 years ago

Dear all,

First of all, thank you for your attention. I am new to sleuth. I am working on RNA-Seq and have a problem with sleuth. I have abundance.tsv results from kallisto, and I get the error below:

Error in sleuth_prep(s2c, ~condition) :
  You must generate bootstraps on all of your samples. Here are the ones that don't contain any:
~/Desktop/cuffdiff2_data_kallisto_results/results/tube10kallisto/abundance.tsv

Any help is much appreciated.

Regards

raynamharris commented 8 years ago

Hi, I am also getting the error "You must generate bootstraps on all of your samples". I did use '-b 100' when I ran kallisto, and the version of kallisto that I'm using gives abundance.tsv files. Any thoughts on what could still be producing the error? Thanks, Rayna

telia22 commented 8 years ago

Hi, you must use abundance.h5, not the tsv file. Don't define any output format in kallisto. After that, you should use abundance.h5. Good luck.
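To catch this kind of path problem up front, something like the following could be run over the path column before calling sleuth_prep. It is a rough sketch with a made-up function name, and it assumes each path is meant to be a kallisto output directory that holds abundance.h5, so a path pointing at abundance.tsv itself gets flagged.

```python
from pathlib import Path


def paths_missing_h5(paths):
    """Return the paths that are not directories containing abundance.h5."""
    flagged = []
    for path in paths:
        # expanduser() resolves a leading "~" as used in the errors above.
        candidate = Path(path).expanduser() / "abundance.h5"
        if not candidate.is_file():
            flagged.append(str(path))
    return flagged
```

An empty result means every listed directory has an abundance.h5; anything returned is worth checking by hand.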

raynamharris commented 8 years ago

Figured it out! I wasn't pointing to the right directory in all the right places. Thanks!