Closed: gk-bioin4m8x closed this issue 6 years ago
Hi,
You can merge multiple samples and give the sample barcodes in a text file with the -b option. If you don't want to merge, you can run zUMIs in a for loop over the samples. Merging and running the samples together is more efficient; you can separate them later from the count tables.
Best, Swati
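The per-sample alternative mentioned above could be sketched as a dry-run loop. This is only an illustration: the sample names, the file naming scheme (<sample>_barcode.fastq.gz and <sample>_cDNA.fastq.gz), the variables and the helper zumis_cmd are assumptions, and the flags are simply the ones from the command posted later in this thread. It prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one zUMIs command per sample instead of merging.
# All paths, sample names and the file naming scheme are hypothetical
# placeholders; replace them with real absolute paths.
ZUMIS_DIR="/path/to/zUMIs"
PROJECT_DIR="/path/to/project"
GENOME_DIR="/path/to/star4zUMIs"
GTF="/path/to/annotation.gtf"

zumis_cmd() {
    local s="$1"
    # Add -b and any other options your own run needs.
    printf 'bash %s/zUMIs-master.sh -f %s -r %s -n %s -g %s -a %s -c 1-6 -m 1-8 -l 75 -p 8 -o %s\n' \
        "$ZUMIS_DIR" \
        "$PROJECT_DIR/${s}_barcode.fastq.gz" \
        "$PROJECT_DIR/${s}_cDNA.fastq.gz" \
        "$s" "$GENOME_DIR" "$GTF" \
        "$PROJECT_DIR/my_zUMIs/$s"
}

for s in sample1 sample2; do
    zumis_cmd "$s"   # pipe the printed lines to bash to actually run them
done
```

Giving each sample its own output directory keeps the runs from overwriting each other.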
Thanks @sdparekh ! I will try that.
Hi @sdparekh ,
I ran the zUMIs bash script on the merged files. It ran without any errors, but I got neither the .featureCounts files under the "out" dir nor anything under "out/zUMIs/output/stats/" and "out/zUMIs/output/expression/".
I re-ran the script with -w Counting added, but it replaced the filtered files with shortcuts to them under "out/zUMIs/output/filtered_fastq/".
Please guide.
Thanks.
Hey,
That's odd. The moving of the fastq files happens when zUMIs thinks it has finished and should clean the folder up.
I think to troubleshoot this it would be great if you could attach the output of sessionInfo().
@cziegenhain Thanks. Sorry, that was after the -w Summarizing option.
I got an error related to the usage of perl fqcheck.pl on standard output after the -w Counting option.
I already lost my filtered files, so I need to run zUMIs again.
Do you have any idea why I am not getting the .featureCounts files and the files under the stats folder, so that I can edit my script?
Here are my commands inside bash script:
p=/path/to/zUMIs
e=/path/to/project
bash $p/zUMIs-master.sh -f $e/merged_barcode.fastq.gz -r $e/merged_cDNA.fastq.gz -n project -g /path/to/star4zUMIs -a /path/to/.gtf -c 1-6 -m 1-8 -l 75 -p 8 -b $e/my_zUMIs/barcode_samples.txt -o $e/my_zUMIs/out
OS: Linux 2014 x86_64 R version: 3.4.3
Thanks for the feedback, I am a bit confused to be honest.
What's the version of Rsubread you are using?
If you could just rerun all the way from the beginning and send the log files with standard out & errors, that would be very helpful. It would also be good to see the folder contents after the run with file sizes (ls -sh).
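One way to collect what is asked for here, as a minimal sketch: the helper run_logged, the log file name and the paths in the example are hypothetical, not something zUMIs provides.

```shell
#!/usr/bin/env bash
# Sketch: run a command and keep stdout and stderr together in one log file.
# run_logged and the file names below are hypothetical placeholders.
run_logged() {
    local log="$1"; shift
    "$@" > "$log" 2>&1   # 2>&1 sends errors to the same file as normal output
}

# Example with placeholder names:
#   run_logged zUMIs_run.log bash my_zUMIs_script.sh
#   ls -sh /path/to/project/my_zUMIs/out   # folder contents with file sizes
```

Keeping both streams in one file preserves the order in which messages and errors appeared, which helps when matching an error to a pipeline step.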
The issue is your path variable. You shouldn't give "/path/to"; you should give the real path to that folder. For instance, if I stored zUMIs in a folder named projects under the /data directory, then I would give the absolute path to zUMIs like this: /data/projects/zUMIs. The program is not able to identify the original path to the inputs, the output directory and zUMIs.
@sdparekh I did give the real paths; "/path/to" was just for demo purposes. :-)
Haha!! Okay, then I am still confused. I can't tell without seeing the error causing your issue. Can you please do as Christoph said? :)
Yes, I proceeded in that way. zUMIs is running from the beginning :-) Will let you know as soon as it has finished.
@cziegenhain @sdparekh Still same issues: 1) No ex.featureCounts, in.featureCounts and Rplots.pdf files under "out" dir 2) No files under "out/zUMIs/output/stats/" and "out/zUMIs/output/expression/"
Rsubread version 1.28.1
Script:
p=/path/to/zUMIs
e=/path/to/project
bash $p/zUMIs-master.sh -i $p -V /home/me/R-3.4.3/bin -f $e/merged_barcode.fastq.gz -r $e/merged_cDNA.fastq.gz -n project -g /path/to/star4zUMIs -a /path/to/.gtf -c 1-6 -m 1-8 -l 75 -p 8 -b $e/my_zUMIs/barcode_samples.txt -o $e/my_zUMIs/out
Folder contents:
$ cd my_zUMIs/out
$ ls -sh
total 59G
19G project.aligned.sorted.bam 28K project.Log.out 512 project._STARgenome
41G project.barcodelist.filtered.sort.sam 17K project.Log.progress.out 512 project._STARpass1
2.0K project.Log.final.out 7.9M project.SJ.out.tab 512 zUMIs_output
$ cd zUMIs_output
$ ls -sh
total 512
0 expression 512 filtered_fastq 0 stats
$ cd filtered_fastq
$ ls -sh
total 17G
3.7G project.barcoderead.filtered.fastq.gz 13G project.cdnaread.filtered.fastq.gz
Please let me know further.
Thanks.
@cziegenhain @sdparekh Any updates please?
Hey, I don't see any obvious mistake so far, but you again did not post the verbose output of zUMIs, so we can't know for sure.
Can we also see the content of the STAR report? project.Log.final.out
@cziegenhain Here it is:
zUMIs version 0.0.6c
Raw reads: <some number>
Filtered reads: <some number>
Make sure you have approximately 71677 Mb RAM available
..... started STAR run
..... loading genome
..... processing annotations GTF
..... inserting junctions into the genome indices
..... started 1st pass mapping
..... finished 1st pass mapping
..... inserting junctions into the genome indices
..... started mapping
..... finished successfully
[bam_sort_core] merging from 32 files and 8 in-memory blocks...
[bam_sort_core] merging from 24 files and 8 in-memory blocks...
/zUMIs/zUMIs-noslurm.sh: line 112: /home/me/R-3.4.3/bin: is a directory
/zUMIs/zUMIs-noslurm.sh: line 116: /home/me/R-3.4.3/bin: is a directory
I think I should have given the path for R like this: /home/me/R-3.4.3/bin/R
I don't want to start zUMIs from the beginning, so should I resume by adding R to the path and -w Counting to my above-mentioned script? Do I need to do anything else?
p=/path/to/zUMIs
e=/path/to/project
bash $p/zUMIs-master.sh -i $p -V /home/me/R-3.4.3/bin/R -f $e/merged_barcode.fastq.gz -r $e/merged_cDNA.fastq.gz -n project -g /path/to/star4zUMIs -a /path/to/.gtf -c 1-6 -m 1-8 -l 75 -p 8 -b $e/my_zUMIs/barcode_samples.txt -o $e/my_zUMIs/out -w Counting
Please guide.
Thanks.
Yes, that seems to be the problem! It should work to resume the processing using the correct path and -w Counting!
Ok, thanks.
@cziegenhain I did that accordingly and it has been running since yesterday morning. How long should it take?
zUMIs is usually very fast. However, it all depends on the number of reads, your machine configuration and load. Also note that hamming distance operations are computationally costly in case you are using this setting.
Ok. I did not use Hamming distance option (-H).
@cziegenhain Just to update: the zUMIs run I started 3 days ago with -w Counting is still running. It has created shortcuts for two files under the "out" folder (project.aligned.sorted.bam.in and project.aligned.sorted.bam.ex, 1 KB each), but there is still nothing under out/zUMIs/output/stats and out/zUMIs/output/expression. If I try to download the above files (.in and .ex), they are quite big. Following are the details in the log file, which has not been updated since then:
Your jobs will run on this machine.
Make sure you have more than 31G RAM and 8 processors available.
Your jobs will be started from counting.
You provided these parameters:
SLURM workload manager: no
Summary Stats to produce: yes
Start the pipeline from: counting
A custom mapped BAM: NA
Custom filtered FASTQ: no
Barcode read: $e/merged_barcode.fastq.gz
cDNA read: $e/merged_cDNA.fastq.gz
Study/sample name: project
Output directory: $e/my_zUMIs/out
Cell/sample barcode range: 1-6
UMI barcode range: 1-8
Retain cell with >=N reads: 100
Genome directory: star4zUMIs
GTF annotation file: my.gtf
Number of processors: 8
Read length: 75
Strandedness: 0
Cell barcode Phred: 20
UMI barcode Phred: 20
# bases below phred in CellBC: 1
# bases below phred in UMI: 1
Hamming Distance (UMI): 0
Hamming Distance (CellBC): 0
Plate Barcode Read: NA
Plate Barcode range: NA
Barcodes: $e/my_zUMIs/barcode_samples.txt
zUMIs directory: zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable /home/me/R-3.4.3/bin/R
Additional STAR parameters:
STRT-seq data: no
InDrops data: no
Library read for InDrops: NA
Barcode read2(STRT-seq): NA
Barcode read2 range(STRT-seq): 0-0
Bases(G) to trim(STRT-seq): 3
Subsampling reads: 0
zUMIs version 0.0.6c
ARGUMENT 'zUMIs/zUMIs-dge.R' __ignored__
WARNING: unknown option '--gtf'
ARGUMENT 'my.gtf' __ignored__
WARNING: unknown option '--abam'
ARGUMENT 'out/project.aligned.sorted.bam' __ignored__
WARNING: unknown option '--ubam'
ARGUMENT 'out/project.barcodelist.filtered.sort.sam' __ignored__
WARNING: unknown option '--barcodefile'
ARGUMENT 'barcode_samples.txt' __ignored__
WARNING: unknown option '--out'
ARGUMENT 'out' __ignored__
WARNING: unknown option '--sn'
ARGUMENT 'project' __ignored__
WARNING: unknown option '--cores'
ARGUMENT '8' __ignored__
WARNING: unknown option '--strandedness'
ARGUMENT '0' __ignored__
WARNING: unknown option '--bcstart'
ARGUMENT '1' __ignored__
WARNING: unknown option '--bcend'
ARGUMENT '6' __ignored__
WARNING: unknown option '--umistart'
ARGUMENT '1' __ignored__
WARNING: unknown option '--umiend'
ARGUMENT '8' __ignored__
WARNING: unknown option '--subsamp'
ARGUMENT '0' __ignored__
WARNING: unknown option '--nReadsBC'
ARGUMENT '100' __ignored__
WARNING: unknown option '--hamming'
ARGUMENT '0' __ignored__
WARNING: unknown option '--XCbin'
ARGUMENT '0' __ignored__
R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
Do you have any idea? Please guide. Thanks.
Oh no, I think you need to give the path to the Rscript executable, like this: /home/me/R-3.4.3/bin/Rscript
Otherwise it will just start an interactive R instance.
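Both failures seen in this thread (a directory passed to -V, then R passed instead of Rscript) could be caught before a long run with a small pre-flight check. This is a sketch only; check_rscript is a hypothetical helper and not part of zUMIs.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the value meant for -V before starting zUMIs.
# check_rscript is a hypothetical helper, not part of zUMIs.
check_rscript() {
    local path="$1"
    if [ -d "$path" ]; then
        # Directories are "executable" too, so test for them first.
        echo "directory, not an executable: $path"
        return 1
    elif [ ! -x "$path" ]; then
        echo "not found or not executable: $path"
        return 1
    elif [ "$(basename "$path")" != "Rscript" ]; then
        echo "expected Rscript, got: $(basename "$path")"
        return 1
    fi
    echo "ok: $path"
}

# Example: check_rscript /home/me/R-3.4.3/bin/Rscript
```

The name check is deliberately strict: the R binary is a valid executable, but as the log above shows, it ignores the script arguments and waits at a prompt.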
Ok, I will restart from -w Counting. Thanks.
@sdparekh following your first comment on this issue, could you provide an example of the -b option? It would be great if you could provide an example file as well (e.g. is it one sample barcode per line?)
$ zumis -h
/usr/bin/zUMIs-2.9.7/zumis: line 7: curl: command not found
-------------
Good news! A newer version of zUMIs is available at https://github.com/sdparekh/zUMIs
-------------
USAGE: /usr/bin/zUMIs-2.9.7/zumis [options]
-h Print the usage info.
## Required parameters ##
-y <YAML config file> : Path to the YAML config file. Required.
## Program path ##
-d <zUMIs-dir> : Directory containing zUMIs scripts. Default: path to this script.
## Miniconda environment
-c : Use zUMIs dependencies in the preinstalled conda enviroment.
zUMIs version 2.9.7
Thanks!
Hi Marta,
The issue you are replying to is from 2018 and the information is probably not so relevant any longer. You can check the documentation in the wiki to find out how to set up your run parameters.
Dear @cziegenhain,
thanks for your quick answer!
I read the documentation but I haven't found (or understood) how to set up the parameters for my analysis, that's how I ended up in this issue.
I have two 10x samples (g001 and g002), each one with two fastq files regarding read 1 and read 2, and two fastq files regarding sample index 1 and 2:
So I have four files per sample. This configuration is named "Dual Index". So, instead of having 4 barcodes per sample as exemplified in the wiki, I only have two (one forward and one reverse). And these are not present in the reads, they are in the separate fastq files.
Thus, I was going to merge the g001-R1 and g002-R1 in one file (the same for the R2) but then I'm not really sure how to specify the sample indexes.
Any help would be appreciated!
My files:
marta@cyanobacteria:/data/merged_fastq$ head UMH-MO-g001_S2_L001_R1_001.fastq
@ST-E00129:1195:HHMK3CCX2:1:1101:1773:1397 1:N:0:NATCAGCCTA+NGGACGAAAC
NCCTTAGAGATGTTAGCTGCTCACTCAG
+
#AAFAJJFFF7FJFJJFJFJ<JJJAJA-
@ST-E00129:1195:HHMK3CCX2:1:1101:1813:1397 1:N:0:NATCAGCCTA+NGGACGAAAC
NGCTTGGTCTAAGCGTGTATCGTCGATC
+
#AAAA<FFJJJFFJFJJJJJJAJF7F<F
@ST-E00129:1195:HHMK3CCX2:1:1101:1834:1397 1:N:0:NATCAGCCTA+NGGACGAAAC
NAGCCAGAGGGCCTCTCGTCCTAAAAAT
marta@cyanobacteria:/data/merged_fastq$ head UMH-MO-g001_S2_L001_R2_001.fastq
@ST-E00129:1195:HHMK3CCX2:1:1101:1773:1397 2:N:0:NATCAGCCTA+NGGACGAAAC
NAGCAACTGGCTCTGGCCCTGGCGGAGAAGTACCGCTAAACTGGAGATAAGCTACTAAACTGTCATCCGAGCATCAAGCCCTCACAGTAT
+
#A---F<F-<--77--A7<JAAJ--7-AJJJJFFAA7JA7AFF-F--<-<-<<-<<7JJ<---7A<A7---7-<-7A-7---7-7A---7
@ST-E00129:1195:HHMK3CCX2:1:1101:1813:1397 2:N:0:NATCAGCCTA+NGGACGAAAC
NGAGGATTAAACCCCAGAATTTCACCTGTCCGCGGACACTTTCCTGAAGCAACTGACATTAGCCGTCGAGGAAAAATACAGCTAAAAAGA
+
#----A----<<-FF-----<<-<--7--7-------7--<<--F--<<-<F-<-----7-----7---77F7--77FA----<FJ----
@ST-E00129:1195:HHMK3CCX2:1:1101:1834:1397 2:N:0:NATCAGCCTA+NGGACGAAAC
NCACCATGAAAGTCCATCATTGGACTCCAGTTCCTGCTCTGTTGTTATTACAATAAAATAAACAGGCAATGAATGATAGAAAAAAAAAAA
marta@cyanobacteria:/data/merged_fastq$ zcat UMH-MO-g001_S2_L001_I1_001.fastq.gz | head
@ST-E00129:1195:HHMK3CCX2:1:1101:1773:1397 1:N:0:NATCAGCCTA+NGGACGAAAC
NATCAGCCTA
+
#-A--A<<A-
@ST-E00129:1195:HHMK3CCX2:1:1101:1813:1397 1:N:0:NATCAGCCTA+NGGACGAAAC
NATCAGCCTA
+
#AA-FAFJJJ
@ST-E00129:1195:HHMK3CCX2:1:1101:1834:1397 1:N:0:NATCAGCCTA+NGGACGAAAC
NATCAGCCTA
marta@cyanobacteria:/data/merged_fastq$ zcat UMH-MO-g001_S2_L001_I2_001.fastq.gz | head
@ST-E00129:1195:HHMK3CCX2:1:1101:1773:1397 2:N:0:NATCAGCCTA+NGGACGAAAC
NGGACGAAAC
+
#A<AF<FJJJ
@ST-E00129:1195:HHMK3CCX2:1:1101:1813:1397 2:N:0:NATCAGCCTA+NGGACGAAAC
NGGACGAAAC
+
#AAFFJJJJJ
@ST-E00129:1195:HHMK3CCX2:1:1101:1834:1397 2:N:0:NATCAGCCTA+NGGACGAAAC
NGGACGAAAC
Hi,
Yes you can just concatenate your fastq files for each R1, R2, I1, I2 (as long as you are sure that the same barcodes did not get reused in the 2nd library).
Best, Christoph
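The concatenation suggested above could look like the following sketch. The helper merge_fastq_gz and the file names in the example are hypothetical; the approach relies on the fact that concatenated gzip streams form a valid gzip file, so nothing has to be decompressed.

```shell
#!/usr/bin/env bash
# Sketch: merge FASTQ files by plain concatenation.
# Works for gzipped inputs (gzip streams can be concatenated) or for
# plain-text inputs, but do not mix the two in one output file.
# merge_fastq_gz is a hypothetical helper.
merge_fastq_gz() {
    local out="$1"; shift
    cat "$@" > "$out"
}

# Example with placeholder file names; keep the same sample order for
# R1, R2, I1 and I2 so that reads stay paired across the merged files:
#   merge_fastq_gz merged_R1.fastq.gz g001_R1.fastq.gz g002_R1.fastq.gz
#   merge_fastq_gz merged_R2.fastq.gz g001_R2.fastq.gz g002_R2.fastq.gz
```

Counting lines in each merged file afterwards (for example with zcat piped to wc -l) is a cheap way to confirm that all four files contain the same number of reads.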
Hi @sdparekh @cziegenhain, I have followed https://github.com/sdparekh/zUMIs/wiki/Usage . However, I am wondering how I would start if I have multiple fastq.gz files (multiple samples). Can I input the folder with multiple fastq files (including transcript fastq and barcode fastq) in the bash script with *? Will it automatically detect from the names that two files (e.g. 3_R1.fastq.gz for barcode and 3_R2.fastq.gz for transcript) belong to sample 3? Please guide.
Thanks.