sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0
275 stars 67 forks source link

No "featureCounts" result files from new version 0.0.4 #24

Closed ChongLu121 closed 6 years ago

ChongLu121 commented 6 years ago

Dear Christoph and Swati,

I used the new version zUMIs today with the example data but got a strange error which didn't occur in the last zUMIs version. On our server, the previous version of zUMIs ran without any errors.

Here is the standard out:

You provided these parameters: SLURM workload manager: no Summary Stats to produce: yes Start the pipeline from: filtering A custom mapped BAM: NA Custom filtered FASTQ: no Barcode read: /scratch/lu/zUMIs/ExampleData/barcoderead_HEK.1mio.fq.gz cDNA read: /scratch/lu/zUMIs/ExampleData/cDNAread_HEK.1mio.fq.gz Study/sample name: chr22test Output directory: /scratch/lu/genome-index/out Cell/sample barcode range: 1-6 UMI barcode range: 7-16 Retain cell with >=N reads: 100 Genome directory: /scratch/lu/genome-index/genome/reference/hg38_chr22 GTF annotation file: /scratch/lu/genome-index/genome/reference/GRCh38.84.chr22.gtf Number of processors: 8 Read length: 50 Strandedness: 0 Cell barcode Phred: 20 UMI barcode Phred: 20

bases below phred in CellBC: 1

bases below phred in UMI: 1

Hamming Distance (UMI): 0 Hamming Distance (CellBC): 0 Plate Barcode Read: NA Plate Barcode range: NA Barcodes: NA zUMIs directory: /scratch/lu/zUMIs STAR executable STAR samtools executable samtools pigz executable pigz Rscript executable Rscript Additional STAR parameters: STRT-seq data: no InDrops data: no Library read for InDrops: NA Barcode read2(STRT-seq): NA Barcode read2 range(STRT-seq): 0-0 Bases(G) to trim(STRT-seq): 3 Subsampling reads: 0

zUMIs version 0.0.4

Raw reads: 1000000 Filtered reads: 853296

Feb 26 18:31:21 ..... started STAR run Feb 26 18:31:21 ..... loading genome Feb 26 18:31:24 ..... processing annotations GTF Feb 26 18:31:24 ..... inserting junctions into the genome indices Feb 26 18:31:48 ..... started 1st pass mapping Feb 26 18:34:41 ..... finished 1st pass mapping Feb 26 18:34:41 ..... inserting junctions into the genome indices Feb 26 18:35:05 ..... started mapping Feb 26 18:38:01 ..... finished successfully [bam_sort_core] merging from 0 files and 8 in-memory blocks... [bam_sort_core] merging from 0 files and 8 in-memory blocks... Loading required package: optparse [1] "I am loading useful packages..." [1] "2018-02-26 18:38:10 CET" [1] "I am making annotations in SAF... This will take less than 3 minutes..." [1] "2018-02-26 18:38:27 CET" Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK Warning message: In .get_cds_IDX(type, phase) : The "phase" metadata column contains non-NA values for features of type stop_codon. This information was ignored. 'select()' returned 1:many mapping between keys and columns [1] "I am making count tables...This will take a while!!" [1] "2018-02-26 18:38:39 CET"

    ==========     _____ _    _ ____  _____  ______          _____
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.26.0
//========================== featureCounts setting ===========================\ Input files : 1 BAM file S /scratch/lu/genome-index/out/chr22test.ali ...
Dir for temp files : .
Threads : 1
Level : meta-feature level
Paired-end : no
Strand specific : no
Multimapping reads : primary only
Multi-overlapping reads : not counted
Min overlapping bases : 1

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\ Load annotation file ./.Rsubread_UserProvidedAnnotation_pid27123 ... Features : 5737 Meta-features : 698 Chromosomes/contigs : 3
Process BAM file /scratch/lu/genome-index/out/chr22test.aligned.sorted ...
Single-end reads are included.
Assign reads to features...
Total reads : 853296
Successfully assigned reads : 46553 (5.5%)
Running time : 0.03 minutes
Read assignment finished.

\===================== http://subread.sourceforge.net/ ======================//

    ==========     _____ _    _ ____  _____  ______          _____
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.26.0
//========================== featureCounts setting ===========================\ Input files : 1 BAM file S /scratch/lu/genome-index/out/chr22test.ali ...
Dir for temp files : .
Threads : 1
Level : meta-feature level
Paired-end : no
Strand specific : no
Multimapping reads : primary only
Multi-overlapping reads : not counted
Min overlapping bases : 1

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\ Load annotation file ./.Rsubread_UserProvidedAnnotation_pid27123 ... Features : 26131 Meta-features : 1190 Chromosomes/contigs : 4
Process BAM file /scratch/lu/genome-index/out/chr22test.aligned.sorted ...
Single-end reads are included.
Assign reads to features...
Total reads : 853296
Successfully assigned reads : 21890 (2.6%)
Running time : 0.03 minutes
Read assignment finished.

\===================== http://subread.sourceforge.net/ ======================//

cut: /scratch/lu/genome-index/out/chr22test.aligned.sorted.bam.ex.featureCounts: No such file or directory Error in data.table::fread(paste("cut -f4 ", abamfile[2], ".featureCounts", : File is empty: /dev/shm/file69f3590aefa4 Calls: makeGEprofile -> In addition: Warning messages: 1: In paste("readSummary", ann, files_C, fout, as.numeric(isPairedEnd), : NAs introduced by coercion 2: In paste("readSummary", ann, files_C, fout, as.numeric(isPairedEnd), : NAs introduced by coercion Execution halted [1] "I am loading useful packages for plotting..." [1] "2018-02-26 18:38:46 CET" Error in gzfile(file, "rb") : cannot open the connection Calls: readRDS -> gzfile In addition: Warning message: In gzfile(file, "rb") : cannot open compressed file '/scratch/lu/genome-index/out/zUMIs_output/expression/chr22test.dgecounts.rds', probable reason 'No such file or directory' Execution halted

And I still have another question, if the fastq data is very large (about 26 Gb), would you suggest that I split those input files into small files?

All the best, Chong

cziegenhain commented 6 years ago

Hey Chong,

Please update Rsubread and everything should be back working fine. I have successfully run zUMIs with more than a billion read pairs, so just go ahead with your large fastq.

Feel free to reopen the issue if you have any other questions.

Best, Christoph

MingBit commented 6 years ago

Hi Christoph,

Same error shows up even though I have updated the Rsubread package. Anything else I could try? Thanks a lot.

cziegenhain commented 6 years ago

Check the verbose of featureCounts again carefully within your zUMIs call. Which version number is reported? Best, Christoph

MingBit commented 6 years ago

Your jobs will run on this machine.

Make sure you have more than 1,9G RAM and 4 processors available.

Your jobs will be started from filtering.

You provided these parameters: SLURM workload manager: no Summary Stats to produce: yes Start the pipeline from: filtering A custom mapped BAM: NA Custom filtered FASTQ: no Barcode read: /home/angela/zUMIs/ExampleData/barcoderead_HEK.1mio.fq.gz cDNA read: /home/angela/zUMIs/ExampleData/cDNAread_HEK.1mio.fq.gz Study/sample name: test Output directory: /home/angela/zUMIs/ExampleData Cell/sample barcode range: 1-6 UMI barcode range: 7-16 Retain cell with >=N reads: 100 Genome directory: /home/angela/SC_PRE/reference/hg38_chr22 GTF annotation file: /home/angela/SC_PRE/reference/GRCh38.84.chr22.gtf Number of processors: 4 Read length: 50 Strandedness: 0 Cell barcode Phred: 20 UMI barcode Phred: 20

bases below phred in CellBC: 1

bases below phred in UMI: 1

Hamming Distance (UMI): 0 Hamming Distance (CellBC): 0 Plate Barcode Read: NA Plate Barcode range: NA Barcodes: NA zUMIs directory: /home/angela/zUMIs STAR executable STAR samtools executable samtools pigz executable pigz Rscript executable Rscript Additional STAR parameters:
STRT-seq data: no InDrops data: no Library read for InDrops: NA Barcode read2(STRT-seq): NA Barcode read2 range(STRT-seq): 0-0 Bases(G) to trim(STRT-seq): 3 Subsampling reads: 0

zUMIs version 0.0.4

Raw reads: 0 Filtered reads: 0

Mar 13 14:16:41 ..... Started STAR run Mar 13 14:16:41 ..... Loading genome gzip: /home/angela/zUMIs/ExampleData/test.cdnaread.filtered.fastq.gz: No such file or directory Mar 13 14:16:41 ..... Processing annotations GTF Mar 13 14:16:41 ..... Inserting junctions into the genome indices Mar 13 14:16:50 ..... Started 1st pass mapping Mar 13 14:16:50 ..... Finished 1st pass mapping Mar 13 14:16:50 ..... Inserting junctions into the genome indices Mar 13 14:16:54 ..... Started mapping gzip: /home/angela/zUMIs/ExampleData/test.cdnaread.filtered.fastq.gz: No such file or directory Mar 13 14:16:54 ..... Finished successfully Loading required package: optparse [1] "I am loading useful packages..." [1] "2018-03-13 14:16:54 CET" [1] "I am making annotations in SAF... This will take less than 3 minutes..." [1] "2018-03-13 14:17:00 CET" Import genomic features from the file as a GRanges object ... OK Prepare the 'metadata' data frame ... OK Make the TxDb object ... OK 'select()' returned 1:many mapping between keys and columns [1] "I am making count tables...This will take a while!!" [1] "2018-03-13 14:17:04 CET"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.28.1
//========================== featureCounts setting ===========================\ Input files : 1 BAM file S /home/angela/zUMIs/ExampleData/test.aligne ...
Dir for temp files : .
Threads : 1
Level : meta-feature level
Paired-end : no
Strand specific : no
Multimapping reads : primary only
Multi-overlapping reads : not counted
Min overlapping bases : 1

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\ Load annotation file ./.Rsubread_UserProvidedAnnotation_pid32077 ... Features : 5737 Meta-features : 698 Chromosomes/contigs : 3
Process BAM file /home/angela/zUMIs/ExampleData/test.aligned.sorted.ba ...
Single-end reads are included.
Assign reads to features...
Total reads : 0
Successfully assigned reads : 0
Running time : 0.00 minutes
Read assignment finished.

\===================== http://subread.sourceforge.net/ ======================//

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.28.1
//========================== featureCounts setting ===========================\ Input files : 1 BAM file S /home/angela/zUMIs/ExampleData/test.aligne ...
Dir for temp files : .
Threads : 1
Level : meta-feature level
Paired-end : no
Strand specific : no
Multimapping reads : primary only
Multi-overlapping reads : not counted
Min overlapping bases : 1

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\ Load annotation file ./.Rsubread_UserProvidedAnnotation_pid32077 ... Features : 26131 Meta-features : 1190 Chromosomes/contigs : 4
Process BAM file /home/angela/zUMIs/ExampleData/test.aligned.sorted.ba ...
Single-end reads are included.
Assign reads to features...
Total reads : 0
Successfully assigned reads : 0
Running time : 0.00 minutes
Read assignment finished.

\===================== http://subread.sourceforge.net/ ======================//

Error in data.table::fread(paste("cut -f4 ", abamfile[2], ".featureCounts", : File is empty: /dev/shm/file7d4d29bc49fb Calls: makeGEprofile -> Execution halted [1] "I am loading useful packages for plotting..." [1] "2018-03-13 14:17:05 CET" mv: cannot stat '/home/angela/zUMIs/ExampleData/test.barcoderead.filtered.fastq.gz': No such file or directory mv: cannot stat '/home/angela/zUMIs/ExampleData/test.cdnaread.filtered.fastq.gz': No such file or directory

zUMI version: 0.0.4 Rsubread version: 1.28.1

I was testing with example data. Is it possible that it might be caused by "$/zUMIs/ExampleData/test.cdnaread.filtered.fastq.gz not found". I was wondering is this a warning or an error??

Thanks. :)

cziegenhain commented 6 years ago

Hey,

In this log, the Rsubread version does not seem to be the issue. To me it looks like something is wrong with the input files, as already the filtering step says Raw reads: 0 Filtered reads: 0

Please redownload the fastq files from github and double check that the path is correct.

Best, Christoph

MingBit commented 6 years ago

Thanks for quick response. :+1: Just re-downloaded fastq files and double checked the input path. But same issue: Raw reads: 0 Filter reads: 0

bash ~/zUMIs/zUMIs-master.sh -f ~/zUMIs/ExampleData/barcoderead_HEK.1mio.fq.gz -r ~/zUMIs/ExampleData/cDNAread_HEK.1mio.fq.gz -c 1-6 -m 7-16 -l 50 -g ~/SC_PRE/reference/hg38_chr22/ -a ~/SC_PRE/reference/GRCh38.84.chr22.gtf -n chr22test -p 8 -o ~/zUMIs/ExampleData

sdparekh commented 6 years ago

Hi, Can you send md5sum and ls -lh of the downloaded fastq files?

md5sum ~/zUMIs/ExampleData/barcoderead_HEK.1mio.fq.gz md5sum ~/zUMIs/ExampleData/cDNAread_HEK.1mio.fq.gz

ls -lh ~/zUMIs/ExampleData/

Thank you Swati

MingBit commented 6 years ago

total 295228 drwxrwxr-x 5 angela angela 4096 Mär 14 09:16 ./ drwxrwxr-x 4 angela angela 4096 Mär 14 09:06 ../ -rwxrwxr-x 1 angela angela 20900758 Mär 14 09:02 barcoderead_HEK.1mio.fq.gz -rwxrwxr-x 1 angela angela 48684242 Mär 14 09:02 cDNAread_HEK.1mio.fq.gz -rw-rw-r-- 1 angela angela 463 Mär 14 09:15 chr22test.aligned.sorted.bam -rw-rw-r-- 1 angela angela 0 Mär 14 09:15 chr22test.aligned.sorted.bam.ex.featureCounts -rw-rw-r-- 1 angela angela 0 Mär 14 09:15 chr22test.aligned.sorted.bam.in.featureCounts -rw-rw-r-- 1 angela angela 24 Mär 14 09:15 chr22test.barcodelist.filtered.sort.sam -rw-rw-r-- 1 angela angela 0 Mär 14 09:15 chr22test.barcoderead.filtered.fastq -rw-rw-r-- 1 angela angela 0 Mär 14 09:15 chr22test.cdnaread.filtered.fastq -rw-rw-r-- 1 angela angela 1645 Mär 14 09:15 chr22test.Log.final.out -rw-rw-r-- 1 angela angela 52903 Mär 14 09:15 chr22test.Log.out -rw-rw-r-- 1 angela angela 329 Mär 14 09:15 chr22test.Log.progress.out -rw-rw-r-- 1 angela angela 0 Mär 14 09:15 chr22test.SJ.out.tab drwx------ 2 angela angela 4096 Mär 14 09:15 chr22test._STARgenome/ drwx------ 2 angela angela 4096 Mär 14 09:15 chr22test._STARpass1/ -rwxrwxr-x 1 angela angela 44426144 Mär 14 09:02 test.aligned.sorted.bam -rwxrwxr-x 1 angela angela 53060473 Mär 14 09:02 test.aligned.sorted.bam.ex.featureCounts -rwxrwxr-x 1 angela angela 53306382 Mär 14 09:02 test.aligned.sorted.bam.in.featureCounts -rwxrwxr-x 1 angela angela 81483817 Mär 14 09:02 test.barcodelist.filtered.sort.sam -rwxrwxr-x 1 angela angela 1836 Mär 14 09:02 test.Log.final.out -rwxrwxr-x 1 angela angela 55510 Mär 14 09:02 test.Log.out -rwxrwxr-x 1 angela angela 683 Mär 14 09:02 test.Log.progress.out -rwxrwxr-x 1 angela angela 279621 Mär 14 09:02 test.SJ.out.tab drwxrwxr-x 5 angela angela 4096 Mär 14 09:02 zUMIs_output/

MingBit commented 6 years ago

angela@angela-MS-7817:~/zUMIs/ExampleData$ md5sum barcoderead_HEK.1mio.fq.gz 03d350700dbe1f516200687dcf8125da barcoderead_HEK.1mio.fq.gz angela@angela-MS-7817:~/zUMIs/ExampleData$ md5sum cDNAread_HEK.1mio.fq.gz c0d5f3eba4e3cf0c14aed506d0a49f30 cDNAread_HEK.1mio.fq.gz

Thank you very much Swati! :)

sdparekh commented 6 years ago

Okay, that seems fine to me. One more thing, did you install pigz?

If so, can you tell me the output of pigz -dc barcoderead_HEK.1mio.fq.gz | wc -l

MingBit commented 6 years ago

I have installed the pigz but didn't change its path. NICE! :+1: it's working now! Thanks a lot!

sdparekh commented 6 years ago

I am glad that it works!! :) Thank you for using zUMIs and let us know if you need any further help.