lpantano / seqcluster

small RNA analysis from NGS data
http://seqcluster.readthedocs.io
MIT License

running from bcbio pipeline #29

Closed ahsen1402 closed 6 months ago

ahsen1402 commented 6 years ago

Hi, I like the mirQC report you show here: https://github.com/lpantano/mypubs/blob/master/srnaseq/mirqc/ready_report.md

Mostly, I want to create a barplot showing the contribution of each type of RNA. As suggested, I ran seqcluster from bcbio to achieve this. However, in the upload folder I only see an HTML report from fastq and nothing similar to what is shown on the page above. I modified the template at https://github.com/lpantano/seqcluster/blob/master/data/pipeline_example/mirqc/template.yaml to do this task. Am I missing something?

My second question is about this message in the log file:

[2018-03-14T14:28Z] multiprocessing: seqcluster_prepare
[2018-03-14T14:28Z] You didn't specify any other expression caller tool.You can add to the YAML file:expression_caller:[trna, seqcluster, mirdeep2]

Is there a way to use all the capabilities without manually specifying them?

Thanks

lpantano commented 6 years ago

Hi,

So sorry about that. I need to change the config files so that seqcluster always runs.

You can get that information by adding this to the YAML file:

expression_caller: [seqcluster]

However, to get exactly that figure you will need to use the Rmd or the bcbioSmallRNA package.

Cheers

ahsen1402 commented 6 years ago

Thanks a lot. Following your comment, I checked the Rmd file (I did not have time to run it), and it contains the following code:

# Excerpt from the Rmd: clus_ma and ann are defined earlier in that file.
# Needs ggplot2 and a melt() implementation (reshape2 is assumed here).
# Sum counts per sample for each RNA class; the classes do not overlap.
rRNA <- colSums(clus_ma[grepl("rRNA", ann) & !grepl("miRNA", ann), ])
miRNA <- colSums(clus_ma[grepl("miRNA", ann), ])
tRNA <- colSums(clus_ma[grepl("tRNA", ann) & !grepl("rRNA", ann) & !grepl("ncRNA", ann) & !grepl("miRNA", ann), ])
rmsk <- colSums(clus_ma[grepl("ncRNA", ann) & !grepl("rRNA", ann) & !grepl("miRNA", ann), ])
total <- colSums(clus_ma)

# Raw counts per class and sample.
dd <- data.frame(samples = names(rRNA), rRNA = rRNA, miRNA = miRNA, tRNA = tRNA, ncRNA = rmsk, total = total)
ggplot(melt(dd)) +
  geom_bar(aes(x = samples, y = value, fill = variable), stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Set1") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

# Same plot, with each class divided by the sample total.
dd_norm <- dd
dd_norm[, 2:5] <- sweep(dd[, 2:5], 1, dd[, 6], "/")
ggplot(melt(dd_norm[, 1:5])) +
  geom_bar(aes(x = samples, y = value, fill = variable), stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Set1") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
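For anyone who wants to check the classification logic of that excerpt without running R, here is a minimal Python sketch of the same idea. The function names, the dictionary layout of the counts, and the example annotation strings are all hypothetical and only for illustration; the real `clus_ma` and `ann` objects come from the Rmd and may be structured differently.

```python
from collections import defaultdict

# Class names used in the Rmd excerpt.
TYPES = ("rRNA", "miRNA", "tRNA", "ncRNA")


def rna_type(annotation):
    """Assign one class per cluster with the same precedence as the R filters:
    miRNA wins over everything, then rRNA, then ncRNA, then tRNA."""
    if "miRNA" in annotation:
        return "miRNA"
    if "rRNA" in annotation:
        return "rRNA"
    if "ncRNA" in annotation:
        return "ncRNA"
    if "tRNA" in annotation:
        return "tRNA"
    return None  # counted in the total only


def type_fractions(counts, annotations):
    """counts: {cluster_id: {sample: count}} (hypothetical layout).
    annotations: {cluster_id: annotation string} (hypothetical layout).
    Returns {sample: {class: fraction of that sample's total counts}}."""
    totals = defaultdict(float)
    by_type = defaultdict(lambda: defaultdict(float))
    for cluster_id, per_sample in counts.items():
        kind = rna_type(annotations.get(cluster_id, ""))
        for sample, n in per_sample.items():
            totals[sample] += n
            if kind is not None:
                by_type[sample][kind] += n
    return {
        sample: {k: by_type[sample][k] / totals[sample] for k in TYPES}
        for sample in totals
        if totals[sample] > 0
    }
```

As in the R code, the denominator is the sum over all clusters, so clusters with none of the four labels lower every fraction without appearing as their own bar.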

Do you think seqcluster was run on my data, given that I have commands related to it? I am trying to avoid rerunning everything, as the first run took about 10 hours.

Related to that, I tried to work with the Rmd file, but all the file definitions, such as

files = list.files(file.path(root_path), pattern = "trimming_stats", recursive = T)
files = list.files(file.path(root_path), pattern = "mirbase-ready", recursive = T, full.names = T)
fn_json = list.files(file.path(root_path), pattern = "seqcluster.json", recursive = T, full.names = T)

return nothing. When I render the markdown, no images appear in the HTML. Does this mean that I have no results? I am a little confused.

Thanks

lpantano commented 6 years ago

Hi,

Yes, that is the part of the code to use.

And no, I don’t think seqcluster was run. But you can restart from there and it will do only that part (although you’ll need to empty the checkpoint_parallel folder first). If it had run, the output would be inside the seqcluster/cluster folder. Do you have anything there?
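A quick way to check this from the command line is a small Python sketch like the one below. It assumes that seqcluster/cluster sits directly under the bcbio work directory (the thread does not say where it is relative to), and the function names are made up for illustration. The file-name patterns are the ones from the `list.files()` calls quoted above.

```python
from pathlib import Path

# Patterns taken from the list.files() calls in the Rmd.
PATTERNS = ("trimming_stats", "mirbase-ready", "seqcluster.json")


def seqcluster_ran(work_dir):
    """True if <work_dir>/seqcluster/cluster exists and contains something.
    The location relative to work_dir is an assumption."""
    cluster_dir = Path(work_dir) / "seqcluster" / "cluster"
    return cluster_dir.is_dir() and any(cluster_dir.iterdir())


def find_outputs(root, patterns=PATTERNS):
    """Recursively list files under root whose name contains each pattern."""
    found = {p: [] for p in patterns}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            for p in patterns:
                if p in path.name:
                    found[p].append(path)
    return found
```

Calling `seqcluster_ran("/path/to/work")` and `find_outputs("/path/to/final")` with your own directories shows whether the clustering step left any output and whether the files the Rmd looks for exist under the `root_path` you gave it. An empty list for every pattern usually means `root_path` points at the wrong folder or those steps did not run.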

Cheers

ahsen1402 commented 6 years ago

Thanks. I have 4 count matrices in the main analysis folder: counts_mirna.csv, counts_novel.csv, counts.csv, and counts_mirna_novel.csv. Is there a description of how these are generated? Also, what do the symbols after each colon mean in hsa-let-7a-3p:0:0:AA:c?

lpantano commented 6 years ago

Hi,

The description of those files is here: https://bcbio-nextgen.readthedocs.io/en/latest/contents/outputs.html#small-rna-seq

Hope this helps. Let me know if you have more questions.

Cheers

ahsen1402 commented 6 years ago

Great, thanks. I have some new data with paired-end reads. How should I input them to the algorithm?

lpantano commented 6 years ago

Hi,

Currently the pipeline doesn’t support paired-end data, because the majority of the tools for small RNA work with single-end reads. In principle, you can use only the first read file and that would work.

Cheers
