kaizhang / Taiji

This project has been moved to:
https://github.com/Taiji-pipeline/Taiji

RNAseq output folder empty, did it not use my RNAseq data? #3

Closed npklein closed 7 years ago

npklein commented 7 years ago

Hi @kaizhang, after fixing the BAM chromosome names the rest of the pipeline ran without an error; however, the Network and Rank files are empty. The RNA_Seq output dir is also empty, so I'm thinking the RNAseq part of input.yml might not be written correctly, but it looks like the example input file: https://gist.github.com/npklein/dd3acaad067fbf96ba03e46dd7d97c9a.

Also, the logs do say that it is doing something with the RNAseq data:

[LOG][07-04 10:59] Initialization: Finished.
[LOG][07-04 10:59] Get_ATAC_data: running...
[LOG][07-04 10:59] Get_RNA_data: running...
[LOG][07-04 10:59] ATAC_alignment_prepare: running...
[LOG][07-04 10:59] Get_ATAC_data: Finished.
[LOG][07-04 10:59] ATAC_alignment_prepare: Finished.
[LOG][07-04 10:59] ATAC_alignment: running...
[LOG][07-04 10:59] ATAC_alignment: Finished.
[LOG][07-04 10:59] ATAC_bam_filt_prepare: running...
[LOG][07-04 10:59] Get_RNA_data: Finished.
[LOG][07-04 10:59] RNA_alignment_prepare: running...
[LOG][07-04 10:59] ATAC_bam_filt: running...
[LOG][07-04 10:59] ATAC_bam_filt_prepare: Finished.
[LOG][07-04 10:59] RNA_alignment_prepare: Finished.
[LOG][07-04 10:59] RNA_alignment: running...
[LOG][07-04 10:59] RNA_alignment: Finished.
[LOG][07-04 10:59] RNA_quantification: running...
[LOG][07-04 10:59] RNA_quantification: Finished.
[LOG][07-04 10:59] RNA_convert_ID_to_name: running...
[LOG][07-04 10:59] RNA_convert_ID_to_name: Finished.
[LOG][07-04 10:59] RNA_average_prepare: running...
[LOG][07-04 10:59] RNA_average_prepare: Finished.
[LOG][07-04 10:59] RNA_average: running...
[LOG][07-04 10:59] RNA_average: Finished.
[LOG][07-04 10:59] Output_expression: running...
[LOG][07-04 10:59] Output_expression: Finished.

[LOG][07-04 12:53] ATAC_bam_filt: Finished.
[LOG][07-04 12:53] ATAC_remove_dups: running...

Also, is there a way to rerun only part of the pipeline? I can remove sciflow.db, but then it also reruns ATAC-seq MarkDuplicates and peak calling, which takes quite long. I would like to test only the RNAseq part of the pipeline (if that is the problem).

kaizhang commented 7 years ago

@npklein It is absolutely possible to rerun only certain steps. But this feature hasn't been documented. To rerun selected steps:

  1. Look at the workflow and decide which steps you want to rerun: taiji view. The output is DOT code. You can use the dot program to convert it to a graph, or just open it in a text editor. This is an example for the latest version of taiji: https://github.com/kaizhang/Taiji/blob/master/Taiji.png.
  2. Delete cache from the database using: taiji rm [STEP_NAME]. For example, taiji rm RNA_alignment_prepare. In your case, you need to delete all steps related to RNA-seq.
  3. Rerun the program. If you want to focus on particular steps, you can use: taiji run --config config.yml --select RNA_alignment_prepare,RNA_alignment. This will only execute the specified steps and their dependencies.
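Step 2 above has to be repeated for every RNA-seq step, so it may help to generate the rm commands from a list instead of typing each one. The following is a minimal sketch, not part of Taiji: the helper name rm_commands is hypothetical, and the step names are copied from the first log in this thread, so they are an assumption for any other version (check the output of taiji view first). It only builds the command lines and does not run them.

```python
import shlex

# Step names copied from the first log in this thread (assumption: other
# Taiji versions may name their steps differently; check `taiji view`).
RNA_STEPS = [
    "Get_RNA_data",
    "RNA_alignment_prepare",
    "RNA_alignment",
    "RNA_quantification",
    "RNA_convert_ID_to_name",
    "RNA_average_prepare",
    "RNA_average",
    "Output_expression",
]


def rm_commands(steps, binary="taiji"):
    """Return one `<binary> rm STEP` command line per step, without running it.

    `rm_commands` is a hypothetical helper for illustration only.
    shlex.join quotes any name containing spaces, which makes an
    unexpected step name easy to spot in the printed output.
    """
    return [shlex.join([binary, "rm", step]) for step in steps]


if __name__ == "__main__":
    # Print the commands for review; paste them into a shell to execute.
    for line in rm_commands(RNA_STEPS):
        print(line)
```

Printing rather than executing keeps the sketch safe to run anywhere; the binary argument can be set to a path such as ./taiji-Linux-x86_64-static.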

Back to your problem: the program did not analyze the RNA-seq data because BAM files are currently not supported as input for RNA-seq analysis (see this table). You have two options:

  1. Use FASTQ as input.
  2. Analyze the BAM file by yourself and use the gene quantification as input.

If you decide to change input, you need to rerun "Initialization": taiji rm Initialization && taiji run --config config.yml --select Initialization.

npklein commented 7 years ago

@kaizhang That's great, thanks

npklein commented 7 years ago

@kaizhang I updated my input.yml to use RNAseq instead, and uncommented the STAR config lines in config.yml. I removed the RNA steps with

./taiji-Linux-x86_64-static rm Initialization
./taiji-Linux-x86_64-static rm Output_network
./taiji-Linux-x86_64-static rm PageRank
./taiji-Linux-x86_64-static rm RNA_average
./taiji-Linux-x86_64-static rm RNA_average_prepare
./taiji-Linux-x86_64-static rm RNA_convert_ID_to_name
./taiji-Linux-x86_64-static rm RNA_quantification
./taiji-Linux-x86_64-static rm RNA_alignment
./taiji-Linux-x86_64-static rm RNA_alignment_prepare
./taiji-Linux-x86_64-static rm "Get RNA-seq data"

Still, the RNA steps only take a few seconds to run (see below, log of steps is all in same minute), and the RNA_seq folder in output/ is empty. Is there a --debug type of option that shows which commands it is running at each step?

Also, when I try to run single steps, e.g. Initialization, I get an error

[umcg-ndeklein@calculon 18:23:15 TCC_clones_DHSseq]$ ./taiji-Linux-x86_64-static run --config config.yml --select Initialization
Invalid option `--select'

Usage: taiji-Linux-x86_64-static COMMAND

step log

[LOG][07-05 18:20] Initialization: running...
Sequence index exists. Skipped.
BWA index exists. Skipped.
STAR index directory exists. Skipped.
RSEM index directory exists. Skipped.
[LOG][07-05 18:20] Initialization: Finished.
[LOG][07-05 18:20] RNA_alignment_prepare: running...
[LOG][07-05 18:20] RNA_alignment_prepare: Finished.
[LOG][07-05 18:20] RNA_alignment: running...
[LOG][07-05 18:20] RNA_alignment: Finished.
[LOG][07-05 18:20] RNA_quantification: running...
[LOG][07-05 18:20] RNA_quantification: Finished.
[LOG][07-05 18:20] RNA_convert_ID_to_name: running...
[LOG][07-05 18:20] RNA_convert_ID_to_name: Finished.
[LOG][07-05 18:20] RNA_average_prepare: running...
[LOG][07-05 18:20] RNA_average_prepare: Finished.
[LOG][07-05 18:20] RNA_average: running...
[LOG][07-05 18:20] RNA_average: Finished.
[LOG][07-05 18:20] Output_network: running...
[LOG][07-05 18:20] Output_network: Finished.
[LOG][07-05 18:20] PageRank: running...
Running PageRank...
[LOG][07-05 18:20] PageRank: Finished.

kaizhang commented 7 years ago

@npklein Sorry, I forgot that the version you are using probably does not have the --select option.

You mistyped the step name in your last command: ./taiji-Linux-x86_64-static rm "Get RNA-seq data". It should be "Get_RNA_data". If you look at the log, you will see that the step "Get_RNA_data" was not executed, so the data wasn't updated. Sorry, the program currently doesn't warn you when there is no such entry in the database.

There is another command that lets you see the cache in the database. If you type taiji cat Get_RNA_data, you can see the input it captured. In your case, it should be empty.
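Because rm does not warn about unknown step names, one way to catch a typo like the one above is to compare the names you plan to remove against the step names that appear in a previous log. The sketch below is an illustration only, not a Taiji feature: the helper names are hypothetical, and the regular expression assumes the "[LOG][date time] Step: running..." line format shown in this thread. A step that has never run will not appear in the log, so a reported name is a hint to double-check rather than proof of a typo.

```python
import re

# Matches log entries in the format shown in this thread, e.g.
# "[LOG][07-05 18:20] RNA_alignment: running..."
# (assumption: other versions may format their logs differently).
LOG_STEP = re.compile(
    r"\[LOG\]\[[^\]]*\]\s+(\w+):\s+(?:running\.\.\.|Finished\.)"
)


def steps_in_log(log_text):
    """Return the set of step names that appear in the log text."""
    return set(LOG_STEP.findall(log_text))


def unknown_steps(requested, log_text):
    """Return the requested names that never appear in the log, in order."""
    known = steps_in_log(log_text)
    return [name for name in requested if name not in known]


if __name__ == "__main__":
    sample_log = (
        "[LOG][07-04 10:59] Get_RNA_data: running...\n"
        "[LOG][07-04 10:59] Get_RNA_data: Finished.\n"
        "[LOG][07-04 10:59] RNA_alignment: running...\n"
    )
    # Names that do not match any logged step are printed for review.
    for name in unknown_steps(["Get RNA-seq data", "RNA_alignment"], sample_log):
        print(f"not found in log: {name!r}")
```

The pattern works whether the log has one entry per line or several entries run together on one line, since it keys on the "[LOG][...]" prefix rather than on line boundaries.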

kaizhang commented 7 years ago

I assume the problem was solved.

npklein commented 7 years ago

@kaizhang Yes, I got it to recognize my RNA samples, thanks. I have another problem now, but I will open a new issue for it.