lpantano / seqcluster

small RNA analysis from NGS data
http://seqcluster.readthedocs.io
MIT License

report problem #16

Closed lpantano closed 9 months ago

lpantano commented 8 years ago

Hi @naumenko-sa,

We can discuss the problem with the report here.

Sadly, that report is a template that won't work for every analysis. Can you tell me what you would like to get from your data? Since you only have 2 samples, you are probably interested in just a couple of figures; we cannot do much with that number of samples.

Some questions:

Does root_path point to the final folder?

And what do you get when you run list.files(file.path(root_path),pattern = "trimming_stats",recursive = T) inside R?

As I said, you will not get much from this report. The most important figure is the size distribution, which you can also see by opening the HTML file in the multiqc folder. I plan to migrate almost all QC figures there during the summer, so this will get better.

If you give me more information about what you would like to have, I may be able to help.

cheers

lpantano commented 8 years ago

Hi,

You can also try the new version of the report, which you can download from: https://github.com/lpantano/seqcluster/blob/master/seqcluster/templates/report.rmd

and modify root_path to try it.

naumenko-sa commented 8 years ago

Yes, I modified root_path, so the script picks up the file. Thanks, this new report looks nice to me. Actually, I have 12 samples and I need to do 7 pairwise comparisons (differential expression). I ran 2 samples just to familiarize myself with the pipeline. I will run the remaining 10 samples and use the exploratory analysis you provided. After that I will return to the problem of report generation, if you don't mind. Thank you so much for your time and support with this analysis!

lpantano commented 8 years ago

Nice. OK, let me know when you have more samples; I am interested in getting this working in your case, because it should work nicely with that number of samples. Thanks for the help.

I think the new version I updated today should work, at least for that point.

If you add the metadata to the YAML file, you will have all the information in this report.

naumenko-sa commented 8 years ago

Hi again! Now I have all the samples and I'm working with the report https://github.com/lpantano/seqcluster/blob/master/seqcluster/templates/report.rmd

First, I've changed

#metadata = read.csv(metadata_fn, row.names="sample_id")
metadata = read.csv(metadata_fn)
metadata = metadata[,1] #sample_ids

to get proper sample ids.

Exploratory analysis, size distribution: there is no trimming_stats file; the trimming stats are in bcbio-nextgen-debug.log. The correct files are:

files = list.files(file.path(root_path),pattern = "trimming.fastq_size_stats",recursive = T)

lpantano commented 8 years ago

Hi,

I think you had an older version when you ran that. You can change that part of the code, or you can remove the final folder and restart, so that all the files are created again with the names that are expected now. In the final folder, the trimming stats should look like this:

ls repos/bcbio-nextgen/tests/srna_test/upload/miRQCa/
miRQCa-mirbase-ready.counts miRQCa-ready.trimming_stats qc                          tdrmapper
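For anyone checking which files the report will pick up, here is a minimal Python sketch (Python being the language bcbio itself is written in) of what list.files(root_path, pattern = ..., recursive = T) does: walk every subfolder and keep the files whose base name matches a regular expression. The function name find_files is hypothetical; the file names come from the listing above.

```python
import re
from pathlib import Path


def find_files(root, pattern):
    """Return paths relative to root, sorted, for every file whose base
    name matches the regular expression `pattern`, searching all
    subfolders (roughly R's list.files(root, pattern, recursive = TRUE))."""
    root = Path(root)
    regex = re.compile(pattern)
    matches = [
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and regex.search(path.name)
    ]
    return sorted(matches)
```

For example, find_files("final", r"trimming_stats$") would list the per-sample trimming_stats files, assuming "final" is the upload directory; an empty result means the pattern no longer matches the file names on disk.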

About the first part: metadata should be a data.frame whose row.names are the sample_id column and whose columns are the rest of the columns. I don't think what you have now will work for the whole report.
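A minimal stdlib Python sketch of the structure described here: summary.csv read into a table keyed by sample_id, with the remaining columns as per-sample attributes, roughly what read.csv(metadata_fn, row.names = "sample_id") produces. It assumes the CSV has a header row that includes a sample_id column; read_metadata is a hypothetical helper, not part of seqcluster or bcbio.

```python
import csv


def read_metadata(handle, index_col="sample_id"):
    """Read CSV text into {sample_id: {column: value}}.

    The index column becomes the row key (like row.names in R) and the
    remaining columns, in file order, become the per-sample attributes.
    """
    reader = csv.DictReader(handle)
    if index_col not in (reader.fieldnames or []):
        raise ValueError(f"missing {index_col!r} column in header")
    table = {}
    for row in reader:
        key = row.pop(index_col)
        if key in table:
            # R's read.csv also refuses duplicate row names.
            raise ValueError(f"duplicate {index_col}: {key}")
        table[key] = row
    return table
```

The first remaining column name of any row then plays the role of condition = names(metadata)[1].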

Can you paste here what you get when you run the original code:

metadata = read.csv(metadata_fn, row.names="sample_id")
condition = names(metadata)[1]
metadata

It should be a data.frame. Thanks for the help.

naumenko-sa commented 8 years ago

Ok, I'll update and restart. About the dataframe

                           group
HI_3550_004_RPI1_R4215_R1   fake
HI_3550_004_RPI3_R4217_R1   fake
HI_3550_004_RPI5_R4219_R1   fake
HI_3550_005_RPI41_R4228_R1  fake
HI_3550_005_RPI43_R4230_R1  fake
HI_3550_005_RPI7_R4226_R1   fake
HI_3550_004_RPI2_R4216_R1   fake
HI_3550_004_RPI4_R4218_R1   fake
HI_3550_004_RPI6_R4220_R1   fake
HI_3550_005_RPI42_R4229_R1  fake
HI_3550_005_RPI44_R4231_R1  fake
HI_3550_005_RPI8_R4227_R1   fake

It is a data.frame, but without the "sample_id" header over the first column (the sample names); that causes problems below.

lpantano commented 8 years ago

Can you tell me the exact line where you hit the problem?

If it is in the adapter plots, it could be because no files match that pattern. I am not using sample_id anywhere else in the code, so I am curious which line fails because of it.

thanks

naumenko-sa commented 8 years ago

Line 38; it just reads summary.csv, the sample list:

metadata_fn =  list.files(file.path(root_path), pattern = "summary.csv$",recursive = T, full.names = T)
metadata = read.csv(metadata_fn, row.names="sample_id")
condition = names(metadata)[1]
design = metadata
formula = ~ condition # modify this to get your own formula, it should be a column in your metadata
isde=FALSE # set this to TRUE to run the DE analysis

lpantano commented 8 years ago

Yes, line 38 is when you load the summary file.

I want to know where the code fails if you load that file as it is in the report, because you mentioned that it produces an error further down unless you change it to:

#metadata = read.csv(metadata_fn, row.names="sample_id")
metadata = read.csv(metadata_fn)
metadata = metadata[,1] #sample_ids

So I want to know where "below" is (where you hit the issue) if you don't change that part of the code.

naumenko-sa commented 8 years ago

Yes, you are right: the errors are caused by missing files, and these lines are OK. So I'm updating

bcbio-nextgen: 0.9.8a0-py27_5 --> 0.9.8a0-py27_7

to generate files. Thanks!

lpantano commented 8 years ago

let me know what happens! good luck!

naumenko-sa commented 8 years ago

It seems I have some problems with metadata

[2016-05-19T20:55Z] summarize variants
[2016-05-19T20:55Z] Timing: report
Traceback (most recent call last):
  File "/home/naumenko/work/tools/bin/bcbio_nextgen.py", line 226, in <module>
    main(**kwargs)
  File "/home/naumenko/work/tools/bin/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/home/naumenko/work/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 43, in run_main
    fc_dir, run_info_yaml)
  File "/home/naumenko/work/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 87, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/naumenko/work/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 330, in smallrnaseqpipeline
    srna_report(samples)
  File "/home/naumenko/work/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/srna/group.py", line 136, in report
    group = _guess_group(info)
  File "/home/naumenko/work/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/srna/group.py", line 156, in _guess_group
    return ",".join(info["metadata"].values())
TypeError: sequence item 1: expected string, NoneType found

Could you explain metadata usage in microRNA analysis? In a cancer pipeline I would use

#sample1
metadata:
    batch: batch1
    phenotype: tumor
#sample2
metadata:
    batch: batch1
    phenotype: normal

Would the same approach work for miRNA?

#sample1
metadata:
    batch: batch1
    phenotype: experiment
#sample2
metadata:
    batch: batch1
    phenotype: control
#sample3
metadata:
    batch: batch2
    phenotype: experiment
#sample2
metadata:
    batch: batch2
    phenotype: control

to compare sample1 vs sample2, sample3 vs sample2?

lpantano commented 8 years ago

Hi,

It is weird; it's supposed to work like that. I will try to solve it tomorrow. Could it be that some of the samples in your YAML file have an empty value for some of the metadata keys?

I will try to reproduce.

thanks

lpantano commented 8 years ago

Hi,

I tried to reproduce it, and the only way I could was with lines like these in the metadata:

metadata:
  batch:
  phenotype: something

That is, an empty value. Can you check that? I will fix it anyway, just in case, but it is weird to get a None value for any of the metadata.
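The failing line in the traceback, ",".join(info["metadata"].values()), needs every value to be a string, and an empty YAML value such as batch: is loaded as None, which is enough to raise the TypeError. A minimal Python sketch reproducing that, plus a hypothetical defensive variant (guess_group_safe is illustrative only, not the fix that went into bcbio):

```python
def guess_group(metadata):
    # Same expression as in the traceback: str.join needs every value
    # to be a str, so a None (empty YAML value) raises TypeError.
    return ",".join(metadata.values())


def guess_group_safe(metadata):
    """Hypothetical defensive variant: drop empty (None) values and
    convert any other value to a string before joining."""
    return ",".join(str(value) for value in metadata.values() if value is not None)
```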

naumenko-sa commented 8 years ago

Hi, in the config I had:

cat srna12.yaml  | awk '{if($0~ "metadata"){print $0;getline;print $0;}}'
  metadata:
    experiment: potato
  metadata:
    experiment: potato
  metadata:
    experiment: potato
  metadata:
    experiment: potato
  metadata:
    experiment: potato
  metadata:
    experiment: potato
  metadata:
    experiment: potato
  metadata:
    experiment: potato
  metadata:
    experiment: potato
  metadata:
    experiment: bean
  metadata:
    experiment: bean
  metadata:
    experiment: bean

All metadata keys have a value, and the YAML is valid. I suspect that I was running the previous version of the pipeline and the report was generated by the new one. I will try the batch/phenotype scheme. Also, for some comparisons I'm using the potato reference and for others the bean reference.

naumenko-sa commented 8 years ago

Now I have a problem with config file generation for multiple samples. My potato.template.yaml is

---
details:
- algorithm:
    aligner: star
    adapters: ["TGGAATTCTCGGGTGC"]
    species: stu
    tools_off: ["seqcluster"]
  analysis: smallRNA-seq
  description: R4215
  files:
  - /home/naumenko/work/mirna/input/HI.3550.004.RPI1.R4215_R1.fastq.gz
  genome_build: soltub3
  metadata:
    batch: batch1
    phenotype: experiment
fc_date: '2016-05-10'
fc_name: srna1
upload:
  dir: ../final

My sample list, srna12.csv, is:

samplename,description,batch,phenotype,sex,variant_regions
HI.3550.004.RPI3.R4217,R4217,batch1,experiment,,
HI.3550.004.RPI1.R4215,R4215,batch1,control,,
HI.3550.004.RPI4.R4218,R4218,batch2,experiment,,
HI.3550.004.RPI2.R4216,R4216,batch2,control,,
HI.3550.004.RPI6.R4220,R4220,batch3,experiment,,
HI.3550.004.RPI5.R4219,R4219,batch3,control,,
HI.3550.005.RPI41.R4228,R4228,batch4,experiment,,
HI.3550.005.RPI7.R4226,R4226,batch4;batch5,control,,
HI.3550.005.RPI43.R4230,R4230,batch5,experiment,,
HI.3550.005.RPI42.R4229,R4229,batch6,experiment,,
HI.3550.005.RPI8.R4227,R4227,batch6;batch7,control,,
HI.3550.005.RPI44.R4231,R4231,batch7,experiment,,

and I'm running

#!/bin/bash

KPATH=/home/naumenko/work/mirna/input

bcbio_nextgen.py -w template potato.template.yaml srna12.csv \
$KPATH/HI.3550.004.RPI3.R4217_R1.fastq.gz \
$KPATH/HI.3550.004.RPI1.R4215_R1.fastq.gz \
$KPATH/HI.3550.004.RPI4.R4218_R1.fastq.gz \
$KPATH/HI.3550.004.RPI2.R4216_R1.fastq.gz \
$KPATH/HI.3550.004.RPI6.R4220_R1.fastq.gz \
$KPATH/HI.3550.004.RPI5.R4219_R1.fastq.gz \
$KPATH/HI.3550.005.RPI41.R4228_R1.fastq.gz \
$KPATH/HI.3550.005.RPI7.R4226_R1.fastq.gz \
$KPATH/HI.3550.005.RPI43.R4230_R1.fastq.gz \
$KPATH/HI.3550.005.RPI42.R4229_R1.fastq.gz \
$KPATH/HI.3550.005.RPI8.R4227_R1.fastq.gz \
$KPATH/HI.3550.005.RPI44.R4231_R1.fastq.gz 

Warnings

WARNING: Added minimal sample information: metadata not found for HI.3550.004.RPI3.R4217_R1, HI.3550.004.RPI3.R4217_R1.fastq.gz
WARNING: Added minimal sample information: metadata not found for HI.3550.004.RPI1.R4215_R1, HI.3550.004.RPI1.R4215_R1.fastq.gz
WARNING: Added minimal sample information: metadata not found for HI.3550.004.RPI4.R4218_R1, HI.3550.004.RPI4.R4218_R1.fastq.gz
WARNING: Added minimal sample information: metadata not found for HI.3550.004.RPI2.R4216_R1, HI.3550.004.RPI2.R4216_R1.fastq.gz
WARNING: Added minimal sample information: metadata not found for HI.3550.004.RPI6.R4220_R1, HI.3550.004.RPI6.R4220_R1.fastq.gz
WARNING: Added minimal sample information: metadata not found for HI.3550.004.RPI5.R4219_R1, HI.3550.004.RPI5.R4219_R1.fastq.gz
WARNING: Added minimal sample information: metadata not found for HI.3550.005.RPI41.R4228_R1, HI.3550.005.RPI41.R4228_R1.fastq.gz
WARNING: Added minimal sample information: metadata not found for HI.3550.005.RPI7.R4226_R1, HI.3550.005.RPI7.R4226_R1.fastq.gz
WARNING: Added minimal sample information: metadata not found for HI.3550.005.RPI43.R4230_R1, HI.3550.005.RPI43.R4230_R1.fastq.gz
WARNING: Added minimal sample information: metadata not found for HI.3550.005.RPI42.R4229_R1, HI.3550.005.RPI42.R4229_R1.fastq.gz
WARNING: Added minimal sample information: metadata not found for HI.3550.005.RPI8.R4227_R1, HI.3550.005.RPI8.R4227_R1.fastq.gz
WARNING: Added minimal sample information: metadata not found for HI.3550.005.RPI44.R4231_R1, HI.3550.005.RPI44.R4231_R1.fastq.gz

It creates a config file with batch: batch1 and phenotype: experiment for all samples, instead of batch1 to batch7 and experiment/control.

lpantano commented 8 years ago

Probably you need to add _R1 to the samplename column; sometimes that fixes the problem. We try to detect as many naming variants as possible, but sometimes we miss this kind of difference.

Let me know.
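A simplified, hypothetical model of why the suffix matters: if each FASTQ base name (without .fastq.gz) is looked up exactly in the samplename column, HI.3550.004.RPI3.R4217 never matches HI.3550.004.RPI3.R4217_R1, which would be consistent with every sample ending up with the template's batch1/experiment values. The real bcbio template matching tries to detect more variants than this; fastq_sample_name and unmatched_files are illustrative names.

```python
from pathlib import PurePath


def fastq_sample_name(path):
    """Strip the directory and a FASTQ extension from a file path."""
    name = PurePath(path).name
    for ext in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
        if name.endswith(ext):
            return name[: -len(ext)]
    return name


def unmatched_files(fastq_paths, samplenames):
    """Return the FASTQ paths whose base name has no exact match in
    the samplename column (exact string comparison only)."""
    wanted = set(samplenames)
    return [p for p in fastq_paths if fastq_sample_name(p) not in wanted]
```

Running unmatched_files over the 12 input paths and the samplename column before calling the template would list exactly the samples that trigger the "metadata not found" warning under this simplified model.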

naumenko-sa commented 8 years ago

Thanks, that way it works! It is not a typical dataset: usually I have both reads of a pair, _R1 and _R2, but here there are only _R1 files. SN

naumenko-sa commented 8 years ago

Hi, finally it crashes after multiqc:

[2016-05-25T15:43Z] [INFO   ]         multiqc : MultiQC complete
[2016-05-25T15:43Z] Timing: report
Traceback (most recent call last):
  File "/home/naumenko/work/tools/bin/bcbio_nextgen.py", line 226, in <module>
    main(**kwargs)
  File "/home/naumenko/work/tools/bin/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/home/naumenko/work/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 43, in run_main
    fc_dir, run_info_yaml)
  File "/home/naumenko/work/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 87, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/naumenko/work/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 330, in smallrnaseqpipeline
    srna_report(samples)
  File "/home/naumenko/work/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/srna/group.py", line 136, in report
    group = _guess_group(info)
  File "/home/naumenko/work/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/srna/group.py", line 156, in _guess_group
    return ",".join(info["metadata"].values())
TypeError: sequence item 0: expected string, list found

Some of my samples participate in two comparisons:

- algorithm:
  metadata:
    batch:
    - batch4
    - batch5

lpantano commented 8 years ago

Oops, sorry. I don't support lists there. The idea is to convert the metadata values into columns, with one value per key. Is there any reason you need two values in batch?
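Since each metadata key becomes one column, it can help to check the values before running the pipeline. A minimal Python sketch, assuming each sample's metadata is already loaded as a dict (for example from the details section of the YAML); metadata_table is a hypothetical helper that builds one column per key and rejects the two shapes that caused TypeErrors in this thread, a list (batch: [batch4, batch5]) and an empty value (None):

```python
def metadata_table(samples):
    """Build {key: {sample: value}} from {sample: {key: value}}.

    Each metadata key becomes one column, so every value must be a
    single non-empty string. Lists, None (an empty YAML value) and
    unquoted numbers (which YAML loads as int) are all rejected.
    """
    columns = {}
    for sample, meta in samples.items():
        for key, value in meta.items():
            if not isinstance(value, str) or not value:
                raise ValueError(
                    f"{sample}: metadata '{key}' must be one string, got {value!r}"
                )
            columns.setdefault(key, {})[sample] = value
    return columns
```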

naumenko-sa commented 8 years ago

I'd like to compare sample1 vs sample2, and sample3 vs sample2.

lpantano commented 8 years ago

Well, in that case I would use something like:

metadata:
    comparison1: group1
    comparison2: group1
...
metadata:
    comparison1: group2
    comparison2: none

and change those values to whatever you want to call the groups. But for now, avoid multiple values for a variable inside the metadata section.

As a heads-up, I hope you have replicates for the differential expression analysis, because DESeq2 won't work otherwise. And in your case, you will probably need to modify the code, because you want to do more than one comparison.
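A minimal Python sketch of the one-column-per-comparison layout suggested above, using the comparison1/comparison2 keys and the none placeholder from the example. It also flags groups with fewer than two samples as a rough proxy for the replicate requirement; the helper names and the threshold of 2 are assumptions, not DESeq2's own checks.

```python
def comparison_groups(samples, comparison, skip="none"):
    """Group samples by their value in one comparison column.

    samples: {sample: {comparison_key: group}}. Samples whose value is
    `skip`, or that lack the key, are left out of this comparison.
    """
    groups = {}
    for sample, meta in samples.items():
        group = meta.get(comparison, skip)
        if group != skip:
            groups.setdefault(group, []).append(sample)
    return groups


def groups_without_replicates(groups, min_size=2):
    """Return the sorted names of groups with fewer than min_size samples."""
    return sorted(g for g, members in groups.items() if len(members) < min_size)
```

Each comparison is then a separate two-group design, which is one way to run several pairwise comparisons from a single metadata section.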

cheers

naumenko-sa commented 8 years ago

Sorry, one more question about the report. In counts_mirna.tsv I have 7 million miRNA counts for sample1:

cat counts_mirna.tsv | sed 1d | awk '{sum+=$2}END{print sum}'
7219907

The same quantity I see in the sample_folder/R4215-mirbase-ready.counts

cat R4215-mirbase-ready.counts | sed 1d | awk '{sum+=$3}END{print sum}'
7219907

However the report generates 639,411

obj <- IsomirDataSeqFromFiles(files, design = design, header = T, skip=0)
> sum((counts(obj))[,2])
[1] 639411

What is wrong?

Thanks, SN

lpantano commented 8 years ago

Yeah, that is weird. Can you get the sums of all the columns in counts(obj), then load counts_mirna.tsv, do the same, and compare the numbers for each sample?

You can do that with the colSums command.

I am trying to reproduce it, but I get the same numbers, so we'll need to dig further into this.
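The same per-sample check, sketched in Python: sum each sample column of counts_mirna.tsv, then compare the totals by sample name rather than by position, because a positional comparison is exactly what hides an ordering mismatch. It assumes a tab-separated file whose header row names every column, with the miRNA name in the first column; column_sums and compare_by_name are hypothetical helpers.

```python
import csv


def column_sums(handle):
    """Sum every sample column of a TSV with a header row, keyed by
    column name; the first column (the miRNA name) is skipped."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    totals = {name: 0.0 for name in header[1:]}
    for row in reader:
        for name, value in zip(header[1:], row[1:]):
            totals[name] += float(value)
    return totals


def compare_by_name(a, b):
    """Return {sample: (a_total, b_total)} for samples whose totals differ
    or that are missing from one side (a missing total is None)."""
    diffs = {}
    for sample in sorted(set(a) | set(b)):
        x, y = a.get(sample), b.get(sample)
        if x != y:
            diffs[sample] = (x, y)
    return diffs
```

An empty result from compare_by_name means every sample has the same total in both sources, regardless of column order.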

sorry.

naumenko-sa commented 8 years ago

Yes, something is wrong with the order, not with the numbers. Maybe I should sort the sample names in summary.csv.

> colSums(counts_from_file)
   R4215    R4216    R4217    R4218    R4219    R4220    R4226    R4228    R4230 
 7219907   639411  4992095  1088285  7347927 11789081  2776391  7415751 11772770 
> colSums(counts(obj))
   R4217    R4215    R4218    R4216    R4220    R4219    R4228    R4226    R4230    R4229    R4227 
 7219907   639411  4992095  1088285  7347927 11789081  2776391 13565180  7415751  4508983 11772770 
   R4231 
10224417 

naumenko-sa commented 8 years ago

Sorry, it seems it is my fault: I mixed files from two pipeline runs, one with 9 samples and one with 12.

naumenko-sa commented 8 years ago

Same set, sorting matters:

> colSums(counts(obj))
   R4217    R4215    R4218    R4216    R4220    R4219    R4228    R4226    R4230 
 7219907   639411  4992095  1088285  7347927 11789081  2776391 13565180  7415751 
> colSums(counts_from_file)
   R4215    R4216    R4217    R4218    R4219    R4220    R4226    R4228    R4230 
 7219907   639411  4992095  1088285  7347927 11789081  2776391  7415751 11772770 

lpantano commented 8 years ago

yeah, the problem is the naming.

so in this command:

obj <- IsomirDataSeqFromFiles(files = files[rownames(design)], design = design , header = T, skip=0, quiet = FALSE)

rownames(design) is used so that the files come in the same order as design. The bottom line is that the files vector should be in the same order as the row.names of the design matrix. Is that the problem?
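files[rownames(design)] both selects and orders the per-sample files by the design's row names. Note that if files is a named vector, indexing it by a name it lacks gives NA in R rather than an error. A minimal Python sketch of the same selection that fails loudly on a missing sample and ignores extra files (such as leftovers from another run); it assumes files are keyed by sample name, and files_in_design_order is a hypothetical helper:

```python
def files_in_design_order(files_by_sample, design_samples):
    """Return file paths in the same order as the design's sample names.

    Files for samples not in the design are ignored; a design sample
    with no file raises KeyError instead of silently producing a gap.
    """
    missing = [s for s in design_samples if s not in files_by_sample]
    if missing:
        raise KeyError(f"no counts file for: {', '.join(missing)}")
    return [files_by_sample[s] for s in design_samples]
```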

naumenko-sa commented 8 years ago

Thanks, finally I have it right. The script searches for all mirbase-ready files, not only those listed in summary.csv, and I had some additional files there. Thanks a lot!

> colSums(counts_from_file)
   R4215    R4216    R4217    R4218    R4219    R4220    R4226    R4228    R4230 
 7219907   639411  4992095  1088285  7347927 11789081  2776391  7415751 11772770 
> colSums(counts(obj))
   R4215    R4216    R4217    R4218    R4219    R4220    R4226    R4228    R4230 
 7219907   639411  4992095  1088285  7347927 11789081  2776391  7415751 11772770 

Could you please look at A.thaliana issue? https://github.com/chapmanb/bcbio-nextgen/issues/1416

lpantano commented 8 years ago

nice.

I will modify the genome_setup script in bcbio so it can add that to an existing genome.
