Closed hmassalha closed 5 years ago
Hi,
I just implemented to run RNA velocity within zUMIs. Please set the option in the YAML file to "velocyto: yes" and rerun 👍 (This assumes that velocyto is installed in your system, otherwise zUMIs will still leave a velocyto-compatible bam file for you)
Let me know how it goes!
Thanks a lot, I will try it and update you. HM
Dear Christoph,
I have a few technical questions with regard to the YAML file.
tee:/home/labs/shalev/hassanm/NGS/All_humans_data_for_velocity/180509_NB551168_0120_AHTKN2BGX5_human5_output/zUMIs_runlog.txt: No such file or directory
, and I know that I don't have the zUMIs_runlog.txt file in my working dir. Is this related to the 'zUMIs_directory' field in the YAML file?mkdir: cannot create directory ‘/home/labs/shalev/hassanm/NGS/All_humans_data_for_velocity/180509_NB551168_0120_AHTKN2BGX5_human5_output\r/zUMIs_output/’: No such file or directory
. I rechecked all my command lines and I don't have such character. Why do I keep getting this? (I am using windows system)Thanks a lot, HM
Hi, I have updates that might save you time answering the above questions. I managed to run the YAML file, and I am no longer facing problem 1,3,4. The \r is a hidden character from windows!
My new question is the following: In the lab we share sequencing runs, meaning my bcl files containing data that is not related to my experiment. could you help me please in who I should run my bcl2fastq command? and where I should add my barcodes in the YAML file? I mean: file1 is read 1 [BC(1-6), UMI(7-16)] file 2 is read 2 [cDNA(1-66)] I am missing the plate barcode which is supposed to be in read 1 if I am not mistaken!
Where I should add the sequences of the plate barcodes? and the well barcodes?
Thanks a lot, HM
Hi HM,
sorry for the slow replies.
Just a quick re to 1.: as you probably saw its the -d
flag when calling zUMIs from the command line. It was necessary to pass this as a shell option unfortunately but zUMIs will always keep your YAML in sync with what you had passed in the shell call.
Thanks for the hint, I fixed that now!
I am not exactly certain from the top of my head how the plate barcode is in MARS-seq. In SCRB-seq, it is the i7 illumina index. To get that, you would run bcl2fastq with the following additional flag: --create-fastq-for-index-reads
then the additional plate-BC fastq file would just be added as file3 in the YAML setup, eg with BC(1-8).
If there are other data on your sequencing runs you could try to create a sample-sheet for the bcl2fastq to get rid of unncessary data that belongs to other people - but for that you would need to know your & their barcodes maybe?
Hope this helps, Christoph
Hi Christoph, Thanks for your reply. I got an error during the run and here is the output that I got:
`Mon Apr 1 11:58:20 IDT 2019 Filtering... Mon Apr 1 14:56:35 IDT 2019 Mapping...
EXITING: FATAL INPUT ERROR: unrecoginzed parameter name "readFilesType" in input "Command-Line-Initial" SOLUTION: use correct parameter name (check the manual)
Apr 01 14:56:36 ...... FATAL ERROR, exiting Mon Apr 1 14:56:36 IDT 2019 Counting... Error in file(filename, "r", encoding = encoding) : cannot open the connection Calls: source -> file In addition: Warning message: In file(filename, "r", encoding = encoding) : cannot open file '/runfeatureCountFUN.R': No such file or directory Execution halted Mon Apr 1 14:56:37 IDT 2019 [1] "2019-04-01 14:56:38 IDT" [1] "Preparing bam file for velocyto..." [E::hts_open_format] Failed to open file /home/labs/shalev/hassanm/NGS/All_humans_data_for_velocity/180326_NB551168_0097_AHKHNKBGX5_human4_output/human4.filtered.tagged.Aligned.out.bam samtools view: failed to open "/home/labs/shalev/hassanm/NGS/All_humans_data_for_velocity/180326_NB551168_0097_AHKHNKBGX5_human4_output/human4.filtered.tagged.Aligned.out.bam" for reading: No such file or directory [1] "2019-04-01 14:56:38 IDT" Error in data.table::fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, : File '/home/labs/shalev/hassanm/NGS/All_humans_data_for_velocity/180326_NB551168_0097_AHKHNKBGX5_human4_output/zUMIs_output/human4kept_barcodes.txt' does not exist. Include one or more spaces to consider the input a system command. Execution halted Descriptive statistics... [1] "I am loading useful packages for plotting..." [1] "2019-04-01 14:56:38 IDT" Error in file(filename, "r", encoding = encoding) : cannot open the connection Calls: source -> file In addition: Warning message: In file(filename, "r", encoding = encoding) : cannot open file '/statsFUN.R': No such file or directory Execution halted Mon Apr 1 14:56:40 IDT 2019`
I have another suggestion, would it be possible to write output to the '.zUMIs_runlog.txt' file during the run, such as '[1] "I am loading useful packages for plotting..."'? this is important for us as users to see that the zUMIs is working (making something) and don't have a problem with the server or job submission.
Thanks, HM
hey,
this looks like there is still something odd with your path to the zUMIs folder. Please double check!
eg: cannot open file '/runfeatureCountFUN.R': No such file or directory
Thanks for your suggestion to write all verbose into a file additionally, we might add that at a later time
Thanks for the quick reply,
I was missing the 'zUMIs_directory' line in the YAML file. Do you think this is the problem? The path for the zUMIs directory is OK since I got the correct parameters that I provided in the '.zUMIs_runlog.txt' file. I just rerun it and I will update you of what I will get,
Thanks, HM
Hi Christoph, Adding the 'zUMIs_directory' lines in the YAML file did help, however, I still facing another problem. Here is the output that I got:
EXITING: FATAL INPUT ERROR: unrecoginzed parameter name "readFilesType" in input "Command-Line-Initial"
SOLUTION: use correct parameter name (check the manual)
Apr 01 22:26:30 ...... FATAL ERROR, exiting
Mon Apr 1 22:26:30 IDT 2019
Counting...
[1] "2019-04-01 22:26:39 IDT"
Read 35.8% of 6164808 rows
Read 51.4% of 6164808 rows
Read 85.5% of 6164808 rows
Read 6164808 rows and 2 (of 2) columns from 0.082 GB file in 00:00:05
[1] "96520 barcodes detected."
Error in loadNamespace(name) : there is no package called ‘ggrastr’
Calls: cellBC ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted
Mon Apr 1 22:28:23 IDT 2019
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2019-04-01 22:28:23 IDT"
Error in .guessPackageName(env) :
none of the objects in the source code could be found: need to attach or specify the package
Calls: suppressMessages ... withCallingHandlers -> insertSource -> .guessPackageName
Execution halted
Mon Apr 1 22:28:24 IDT 2019
Any suggestion...
Thanks a lot, HM
That reads like your star version is too old. Which one are you running? You are also lacking an R package further down, so please check carefully here: https://github.com/sdparekh/zUMIs/wiki/Installation
Hi, I am working with STAR 2.5.3a
I ran again the pipeline after updating the packages of R and I get the following errors:
` CPU time : 50748.70 sec. Max Memory : 1349 MB Average Memory : 195.26 MB Total Requested Memory : 72000.00 MB Delta Memory : 70651.00 MB Max Swap : - Max Processes : 48 Max Threads : 103 Run time : 5696 sec. Turnaround time : 5699 sec.
The output (if any) follows:
Good news! A newer version of zUMIs is available at https://github.com/sdparekh/zUMIs
You provided these parameters: YAML file: /home/labs/shalev/hassanm/.Documents/temp_OD/jobs/YAML_win.yaml zUMIs directory: /home/labs/shalev/hassanm/NGS/zUMI/zUMIs_2_4_0/zUMIs-master STAR executable STAR samtools executable samtools pigz executable pigz Rscript executable Rscript RAM limit: null zUMIs version 2.4.0e
Fri Apr 5 08:08:35 IDT 2019 Filtering... Fri Apr 5 09:41:50 IDT 2019 Mapping...
EXITING: FATAL INPUT ERROR: unrecoginzed parameter name "readFilesType" in input "Command-Line-Initial" SOLUTION: use correct parameter name (check the manual)
Apr 05 09:41:52 ...... FATAL ERROR, exiting Fri Apr 5 09:41:52 IDT 2019 Counting... [1] "2019-04-05 09:42:01 IDT" [1] "Warning! None of the annotated barcodes were detected." [1] "Continuing with top 100 barcodes instead..." [1] "238710457 reads were assigned to barcodes that do not correspond to intact cells." [1] "Inf Reads per chunk" [1] "Loading reference annotation from:" [1] "/home/labs/shalev/hassanm/NGS/All_humans_data_for_velocity/180326_NB551168_0097_AHKHNKBGX5_human4_output/human4.final_annot.gtf" [1] "Annotation loaded!" [1] "Assigning reads to features (ex)"
ERROR: invalid parameter: '/home/labs/shalev/hassanm/NGS/All_humans_data_for_velocity/180326_NB551168_0097_AHKHNKBGX5_human4_output/human4.filtered.tagged.Aligned.out.bam'
Error in file(file, "rt") : cannot open the connection
Calls: .runFeatureCount ->
I see a few things from this report:
Thanks a lot, HM
Hi,
Your star version is simply too old. https://github.com/sdparekh/zUMIs/wiki/Installation
For the barcode error, yes the barcode strings you give (one BC per line) needs to be the concatenated plateBC+cellBC. https://github.com/sdparekh/zUMIs/wiki/Barcodes
Hi, I just got a full run. and here is the output
Successfully completed.
Resource usage summary:
CPU time : 121812.00 sec.
Max Memory : 70935 MB
Average Memory : 18483.89 MB
Total Requested Memory : 150000.00 MB
Delta Memory : 79065.00 MB
Max Swap : -
Max Processes : 117
Max Threads : 289
Run time : 10718 sec.
Turnaround time : 10722 sec.
The output (if any) follows:
Good news! A newer version of zUMIs is available at https://github.com/sdparekh/zUMIs
You provided these parameters: YAML file: /home/labs/shalev/hassanm/.Documents/temp_OD/jobs/YAML_win.yaml zUMIs directory: /home/labs/shalev/hassanm/NGS/zUMI/zUMIs_2_4_0/zUMIs-master STAR executable /home/labs/shalev/hassanm/NGS/zUMI/STAR_2_7/STAR/bin/Linux_x86_64_static/STAR samtools executable samtools pigz executable pigz Rscript executable Rscript RAM limit: null zUMIs version 2.4.0e
Mon Apr 8 08:44:47 IDT 2019 Filtering... Mon Apr 8 09:39:06 IDT 2019 Mapping... Apr 08 09:39:08 ..... started STAR run Apr 08 09:39:09 ..... loading genome Apr 08 09:39:23 ..... processing annotations GTF Apr 08 09:39:35 ..... inserting junctions into the genome indices Apr 08 09:41:28 ..... started 1st pass mapping Apr 08 10:00:37 ..... finished 1st pass mapping Apr 08 10:00:38 ..... inserting junctions into the genome indices Apr 08 10:02:00 ..... started mapping Apr 08 10:21:41 ..... finished mapping Apr 08 10:21:41 ..... finished successfully Mon Apr 8 10:21:41 IDT 2019 Counting... [1] "2019-04-08 10:21:51 IDT" [1] "36399086 reads were assigned to barcodes that do not correspond to intact cells." [1] "Inf Reads per chunk" [1] "Loading reference annotation from:" [1] "/home/labs/shalev/hassanm/NGS/All_humans_data_for_velocity/180326_NB551168_0097_AHKHNKBGX5_human4_output/human4.final_annot.gtf" [1] "Annotation loaded!" [1] "Assigning reads to features (ex)"
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 1.28.1
//========================== featureCounts setting ===========================\ | Input files : 1 BAM file | S /home/labs/shalev/hassanm/NGS/All_humans_d ... | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
Dir for temp files : . | ||||||||||
Threads : 19 | ||||||||||
Level : meta-feature level | ||||||||||
Paired-end : yes | ||||||||||
Strand specific : stranded | ||||||||||
Multimapping reads : primary only | ||||||||||
Multi-overlapping reads : not counted | ||||||||||
Min overlapping bases : 1 | ||||||||||
Chimeric reads : not counted | ||||||||||
Both ends mapped : not required | ||||||||||
\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\ | Load annotation file ./.Rsubread_UserProvidedAnnotation_pid33992 ... | Features : 576330 | Meta-features : 49614 | Chromosomes/contigs : 47 | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Process BAM file /home/labs/shalev/hassanm/NGS/All_humans_data_for_vel ... | ||||||||||||||||||
Single-end reads are included. | ||||||||||||||||||
Assign reads to features... | ||||||||||||||||||
Total reads : 282130265 | ||||||||||||||||||
Successfully assigned reads : 132509211 (47.0%) | ||||||||||||||||||
Running time : 9.06 minutes | ||||||||||||||||||
Read assignment finished. | ||||||||||||||||||
\===================== http://subread.sourceforge.net/ ======================//
[1] "Assigning reads to features (in)"
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 1.28.1
//========================== featureCounts setting ===========================\ | Input files : 1 BAM file | S /home/labs/shalev/hassanm/NGS/All_humans_d ... | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
Dir for temp files : . | ||||||||||
Threads : 19 | ||||||||||
Level : meta-feature level | ||||||||||
Paired-end : yes | ||||||||||
Strand specific : stranded | ||||||||||
Multimapping reads : primary only | ||||||||||
Multi-overlapping reads : not counted | ||||||||||
Min overlapping bases : 1 | ||||||||||
Chimeric reads : not counted | ||||||||||
Both ends mapped : not required | ||||||||||
\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\ | Load annotation file ./.Rsubread_UserProvidedAnnotation_pid33992 ... | Features : 275949 | Meta-features : 28257 | Chromosomes/contigs : 36 | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Process BAM file /home/labs/shalev/hassanm/NGS/All_humans_data_for_vel ... | ||||||||||||||||||
Single-end reads are included. | ||||||||||||||||||
Assign reads to features... | ||||||||||||||||||
Total reads : 282130265 | ||||||||||||||||||
Successfully assigned reads : 27069971 (9.6%) | ||||||||||||||||||
Running time : 9.02 minutes | ||||||||||||||||||
Read assignment finished. | ||||||||||||||||||
\===================== http://subread.sourceforge.net/ ======================//
[bam_sort_core] merging from 20 files and 10 in-memory blocks... [bam_sort_core] merging from 20 files and 10 in-memory blocks... [1] "Here are the detected subsampling options:" [1] "Automatic downsampling" [1] "Working on barcode chunk 1 out of 1" [1] "Processing 9020 barcodes in this chunk..." [1] "2019-04-08 11:31:50 IDT" [1] "I am done!! Look what I produced.../home/labs/shalev/hassanm/NGS/All_humans_data_for_velocity/180326_NB551168_0097_AHKHNKBGX5_human4_output/zUMIs_output/" used (Mb) gc trigger (Mb) max used (Mb) Ncells 4839517 258.5 9968622 532.4 8273852 441.9 Vcells 1005052183 7668.0 3101409291 23661.9 3876761302 29577.4 Mon Apr 8 11:31:52 IDT 2019 [1] "2019-04-08 11:31:52 IDT" [1] "Preparing bam file for velocyto..." [bam_sort_core] merging from 0 files and 19 in-memory blocks... [1] "2019-04-08 11:43:21 IDT" [1] "Attempting to run RNA velocity..." which: no velocyto in (/usr/share/lsf/10.1/linux3.10-glibc2.17-x86_64/bin:/apps/RH7U2/gnu/R/3.4.3/bin:/apps/RH7U2/general/zUMIs/2018-03-04/:/apps/RH7U2/gnu/STAR/2.5.2b/bin:/apps/RH7U2/gnu/mysql/5.7.16/bin:/apps/RH7U2/intel/pigz/2.3.4/bin:/apps/RH7U2/gnu/samtools/1.9/bin:/apps/RH7U2/gnu/bzip2/1.0.6/bin:/apps/RH7U2/gnu/star/2.5.3a/bin:/apps/RH7U2/gnu/R/3.5.0/bin:/apps/RH7U2/general/zUMIs/2019-04-02/:/apps/RH7U2/gnu/bcl2fastq/2.20.0.422/bin:/apps/RH7U2/general/cuda/7.5/bin:/apps/RH7U2/general/matlab/R2018a/bin/glnxa64:/apps/RH7U2/general/matlab/R2018a/bin:/usr/share/lsf/10.1/linux3.10-glibc2.17-x86_64/etc:/usr/lpp/mmfs/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/usr/lpp/mmfs/bin:/home/labs/shalev/hassanm/bin) [1] "No velocyto installation found in path. Please install it via pip." [1] "2019-04-08 11:43:22 IDT" Descriptive statistics... [1] "I am loading useful packages for plotting..." [1] "2019-04-08 11:43:22 IDT" Error in .guessPackageName(env) : none of the objects in the source code could be found: need to attach or specify the package Calls: suppressMessages ... withCallingHandlers -> insertSource -> .guessPackageName Execution halted Mon Apr 8 11:43:23 IDT 2019
It seems that the run went ok, thanks for the help. One small question regarding the barcodes. I have 2 barcodes (plate and cells barcodes) one on each read. Since the yaml file should have a path for merged barcodes file, Does the order of the file1 [BC(1-4)] and file2 [BC(1-7)] should be the same order of the concatenating barcodes file [BC(1-4) BC(1-7)]?
Thanks a lot, HM
Looks like zUMIs is doing what it should so I am closing this issue! Yes, the order of the concatenated barcode strings and the order of barcode ranges given in the file setup matter and should match!
Best Christoph
Hi all, We are trying to run velocity using the BAM files that we got from the zUMIs pipeline. One of the requests of velocity is one BAM file the is including data for exons and introns. We are wondering if we can use the BAM files that are produced through the zUMIs pipeline. We have 3 BAM files:
What is the difference between the three files? How we can merge exons data with introns data in one BAM file? Any additional tip on how to use the zUMIs output for velocity is appreciated.
Thanks a lot, HM