lpantano / seqcluster

small RNA analysis from NGS data
http://seqcluster.readthedocs.io
MIT License
37 stars 17 forks source link

Contribution by class #21

Closed ramaniak closed 7 years ago

ramaniak commented 7 years ago

Hello, I "successfully" ran the smallRNA-seq pipeline on my microRNA data and am now going through the report generation R script to check for quality and such.

I am seeing a few issues and was wondering if I could bug you about it:

1. Contribution by class I read in counts.tsv as in the module below

clus <- read.table(file.path(seq_dir, "counts.tsv"),header=T,sep="\t",row.names=1, check.names = FALSE)
ann <- clus[,2]
toomany <- clus[,1]
clus_ma <- clus[,3:ncol(clus)]
clus_ma = clus_ma[,row.names(design)]

The annotation (column 3) appears to be all "|" and therefore when I ask it to count the number of rRNA, tRNA and miRNA seen in the samples, I only get zero's returned. I checked the counts.tsv file (in final/project*/seqcluster/counts.tsv) and the "ann" column has no annotations. Is there a flag or step I need to incorporate in my pipeline for this to work?

2. pheatmap

### Clustering
{r mirna-clustering, eval=ismirbase}
counts = counts(obj)
dds = DESeqDataSetFromMatrix(counts[rowSums(counts>0)>3,], colData=design, design=~1)
vst = rlog(dds)

pheatmap(assay(vst), annotation_col = design, show_rownames = F, 
         clustering_distance_cols = "correlation",
         clustering_method = "ward.D")

when I try to run the pheatmap I get the following error : Error in cut.default(a, breaks = 100) : 'x' must be numeric

the vst =rlog(dds) executes fine

summary(vst)
        Length          Class           Mode 
          1198 DESeqTransform             S4 
  1. MDS plot
    {r pca, eval=ncol(counts) > 1, eval=ismirbase}
    mds(assay(vst), condition = design[,condition])

    this gives me Error: could not find function "mds" should I be loading any specific library or can I run the mds plot through ggplot?

thanks Arun

lpantano commented 7 years ago

Hi,

thank you for the feedback and sorry for the issues.

For point 1: did you run this with bcbio-nextgen or our your own? Normally you need a GTF file to annotate the cluster. What is the command line you used in this case?

For point 2: Can you try this command and see if it works?

pheatmap(assay(vst), show_rownames = F, clustering_distance_cols = "correlation", clustering_method = "ward.D")

If yes, can you paste here the output of: design?

For point 3: you need to do this to get that function: devtools::install_github("hbc/CHBUtils") and then library(CHBUtils)

Let me know if you get something solved. Thanks!

ramaniak commented 7 years ago

hello Lorena,

I just got point2 working. Also, I figured out that I did not have CHBUtils and once I got that loaded point3 was also fine.

As for point1: I ran the bcbio-nextgen pipeline. Does it have a built in GTF file that it would use or should I provide one? Do I have to provide the path for the GTF file in the sample yaml file?

thanks Arun

lpantano commented 7 years ago

What genome did you used?

ramaniak commented 7 years ago

GRCh37

lpantano commented 7 years ago

Can you send me the bcbio-nextgen-command.log inside logs folder? Then I will check the GTF file installed by default, probably there is an issue there. Thanks

ramaniak commented 7 years ago

Here is the log. Thanks for looking into this.

Arun bcbio-nextgen-commands.log.gz

lpantano commented 7 years ago

Thanks, I will try to fix this as soon as possible. The GTF files has the wrong chromosome names. You can modify this one:

hpf/largeprojects/ccmbio/naumenko/tools/bcbio/genomes/Hsapiens/GRCh37/srnaseq/srna-transcripts.gtf

and remove the chr word from there if you want a quick solution. Then you will need to remove the folder seqcluster/cluster and restart to get it done again.

thanks!

ramaniak commented 7 years ago

Great, will replace the "chr" and give that a try.

thanks Arun

ramaniak commented 7 years ago

Hello Lorena, That seems to have done the trick.

One other question:

mirdeep2_files = list.files(file.path(root_path),pattern = "novel-ready",recursive = T,full.names = T)
ismirdeep2 = length(mirdeep2_files) > 0

I don't seem to have any novel-ready files. Is this normal?

Thanks

ramaniak commented 7 years ago

I see this in my log

Cannot detect miraligner version, asumming latest.
[2017-01-18T17:20Z] Looking for mirdeep2 database for DIPG100T
[2017-01-18T17:20Z] miraligner version null,
lpantano commented 7 years ago

Hi,

The warning in miraligner is ok, just ignoring it.

Do you have a folder called mirdeep2? if yes, do you have something inside? can you send me the output of ls for that folder? and the same for one folder inside mirbase?

if no, you need to add mirdeep2 to the tools to be ran together with seqcluster...

ramaniak commented 7 years ago

I do have a folder called mirdeep2 and here are the contents

cd mirdeep2/
[arun@qlogin3 mirdeep2]$ ls
align.bam  bcbiotx  file_reads.fa

and for mirbase

cd ../mirbase/
[arun@qlogin3 mirbase]$ ls
counts_mirna.tsv  DIPG100T  DIPG14N  DIPG18N  DIPG25T  DIPG26T  DIPG27T  DIPG29T  DIPG57T  DIPG60T  DIPG61T  DIPG67T  DIPG89T  DIPG92T  DIPG96T
counts.tsv    DIPG105T  DIPG14T  DIPG18T  DIPG26N  DIPG27N  DIPG28T  DIPG52T  DIPG60N  DIPG61N  DIPG62T  DIPG71N  DIPG8T   DIPG93T

In mirbase, I have a folder for each of my samples while I do not have this for mirdeep2

lpantano commented 7 years ago

thanks, I have an idea what is going on. Can you tell send the ls output of one of those sample inside mirbase? any of them will do.

ramaniak commented 7 years ago

Here it is:

ls
DIPG100T.mirna  DIPG100T.mirna.back  DIPG100T.mirna.back_summary
lpantano commented 7 years ago

Last question, can you let me know what you have in hpf/largeprojects/ccmbio/naumenko/tools/bcbio/genomes/Hsapiens/GRCh37/srnaseq/

lpantano commented 7 years ago

As well, I don't see mirdeep2 run in you command log files.Can you send me the debug.com file and your config file for this run?

Thanks!

ramaniak commented 7 years ago
ls /hpf/largeprojects/ccmbio/naumenko/tools/bcbio/genomes/Hsapiens/GRCh37/srnaseq/

hairpin.fa  miR_Family_Info.txt  miRNA.str        srna-transcripts.gtf        trna_mature_pre.fa
mature.fa   mirna_mature.txt.gz  Rfam_for_miRDeep.fa  Summary_Counts.all_predictions.txt

I don't see a debug.com, did you mean debub.log, if so that's attached along with the config

bcbio-nextgen-debug.log.gz allsamples.yaml.gz

ramaniak commented 7 years ago

Hi Lorena, Sorry to but: did you have any luck with the files?

thanks Arun

lpantano commented 7 years ago

Hi,

I am trying to reproduce the error, as soon as I get some idea what is going on I will ping here. Everything should be there to run mirdeep2, so not sure why it's doing it. I am trying to add some info message that can help us debugging.

thanks!

lpantano commented 7 years ago

Can you check you have miRDeep2.pl executable file in the same path than bcbio is?

ramaniak commented 7 years ago

Thanks. Here is the path to miRDeep2.pl

/home/naumenko/work/tools/bcbio/anaconda/bin/miRDeep2.pl

Is that the right path?

Thanks

lpantano commented 7 years ago

Hi,

I am still finding problems to reproduce this. Can you tell me the bcbio version you have?

I add some more messages to the last development, so you can try to update to the development version and send me again the *debug.log file again.

Thanks!

ramaniak commented 7 years ago

I will try the update. Quick question: does this step write tmp files? If so, what's the location of the tmp folder? Maybe it's a permission issue?

On Sunday, January 22, 2017, Lorena Pantano notifications@github.com wrote:

Hi,

I am still finding problems to reproduce this. Can you tell me the bcbio version you have?

I add some more messages to the last development, so you can try to update to the development version and send me again the *debug.log file again.

Thanks!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/lpantano/seqcluster/issues/21#issuecomment-274336542, or mute the thread https://github.com/notifications/unsubscribe-auth/AFzoBqgGtfeMPaiJCKwaIIoUbRjVPp4uks5rU3G6gaJpZM4Lkslj .

lpantano commented 7 years ago

it should be inside mirdeep2. if that's a problem i can try to modify that internally to be modified to other location.

sent not from my computer

On Jan 21, 2017, at 13:46, Arun Ramani notifications@github.com wrote:

Thanks. Here is the path to miRDeep2.pl

/home/naumenko/work/tools/bcbio/anaconda/bin/miRDeep2.pl

Is that the right path?

Thanks

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

ramaniak commented 7 years ago

you mean the mirdeep2 folder in the output directory? that won't have any permission issues. Trying the update, will keep you posted.

Arun

On Sun, Jan 22, 2017 at 12:26 PM, Lorena Pantano notifications@github.com wrote:

it should be inside mirdeep2. if that's a problem i can try to modify that internally to be modified to other location.

sent not from my computer

On Jan 21, 2017, at 13:46, Arun Ramani notifications@github.com wrote:

Thanks. Here is the path to miRDeep2.pl

/home/naumenko/work/tools/bcbio/anaconda/bin/miRDeep2.pl

Is that the right path?

Thanks

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub, or mute the thread.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/lpantano/seqcluster/issues/21#issuecomment-274344809, or mute the thread https://github.com/notifications/unsubscribe-auth/AFzoBgRNXFBBa_EMUMbqKVqRg41GRxZkks5rU5FRgaJpZM4Lkslj .

ramaniak commented 7 years ago

I am seeing the following in the log (the run is still going on) and not sure if that's the problem?

Preparing for mirdeep2 analysis. [2017-01-23T02:03Z] mirdeep2 Rfam file not instaled. Skipping...

On Sun, Jan 22, 2017 at 12:45 PM, Arun Ramani ramaniak@gmail.com wrote:

you mean the mirdeep2 folder in the output directory? that won't have any permission issues. Trying the update, will keep you posted.

Arun

On Sun, Jan 22, 2017 at 12:26 PM, Lorena Pantano <notifications@github.com

wrote:

it should be inside mirdeep2. if that's a problem i can try to modify that internally to be modified to other location.

sent not from my computer

On Jan 21, 2017, at 13:46, Arun Ramani notifications@github.com wrote:

Thanks. Here is the path to miRDeep2.pl

/home/naumenko/work/tools/bcbio/anaconda/bin/miRDeep2.pl

Is that the right path?

Thanks

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub, or mute the thread.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/lpantano/seqcluster/issues/21#issuecomment-274344809, or mute the thread https://github.com/notifications/unsubscribe-auth/AFzoBgRNXFBBa_EMUMbqKVqRg41GRxZkks5rU5FRgaJpZM4Lkslj .

lpantano commented 7 years ago

Hi,

I see the problem now.

This genome version wasn't updated for the srna resources file. Can you update data and make sure you have this version inside your genomes folder? https://github.com/chapmanb/bcbio-nextgen/blob/master/config/genomes/GRCh37-resources.yaml

It should be inside GRCh37/seq. Let me know if that solves the problem. Normally I used hg19 for srna-seq, so I didn't about this as a problem.

Thanks

naumenko-sa commented 7 years ago

Hi Lorena! May I ask you why did you use hg19? I did many exomes and rna-seq with bcbio, and I started with hg19, but then switched to GRCh37. I don't remember why, but there were some problems with hg19 in these pipelines. So for users it is natural to run miRNA pipeline on GRCh37 as well. Sergey

ramaniak commented 7 years ago

Just re-started the run and looks like mirdeep2 analysis is being carried out. Thanks for the update @naumenko-sa!

lpantano commented 7 years ago

The main reason to use hg19 is because I use a lot the annotation form ucsc and just to avoid the problem about chr naming annotation, directly I use that. So no further reason just than avoiding chromosome conversion to ensembl genome.