Closed: mdhfz89 closed this issue 3 years ago
Also, I can't seem to pull the Docker image for zUMIs, so I can't test whether this is an installation issue on my side or something else. I get this error:
sudo docker pull chrzie/zumis2
Using default tag: latest
Error response from daemon: manifest for chrzie/zumis2:latest not found: manifest unknown:
I guess the problem stems from UMIstuffFUN.R and data.table, but I really am not sure how else to tackle this.
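For what it's worth, the shape of this failure can be sketched as an analogy in Python (this is not the zUMIs code, just an illustration): R's `mclapply()` returns `try-error` objects when workers fail, and `data.table::rbindlist()` then complains that item 1 of its input is not a table. All names below are invented for illustration.

```python
# Analogy only (Python, not zUMIs' R code): a parallel map that captures
# worker errors as objects makes the subsequent row-bind step fail on a
# non-table item, mirroring the rbindlist error in the log below.
from multiprocessing.dummy import Pool

def read_chunk(i):
    raise ValueError("simulated per-core failure")   # every worker errors

def safe_read_chunk(i):
    try:
        return read_chunk(i)
    except Exception as exc:
        return exc     # the error object becomes the "result", as in R's try()

def row_bind(chunks):
    # stand-in for rbindlist(): only accepts lists of rows
    if not isinstance(chunks[0], list):
        raise TypeError("Item 1 of input is not a data.frame, data.table or list")
    return [row for chunk in chunks for row in chunk]

with Pool(4) as pool:
    results = pool.map(safe_read_chunk, range(4))

try:
    row_bind(results)
except TypeError as err:
    print("row-bind failed:", err)
```

In other words, the rbindlist message is only the downstream symptom; the real question is why the worker processes errored in the first place.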
Just an update: I tried to rerun this with the smaller dataset and it does not work either. I'm not sure what broke, but it errors at the same "Counting" step. I also tried rolling back the R packages as close as possible to the versions tested as successful in the zUMIs wiki, but still got the same error at the same "Counting" step.
Here's the latest stdout:
You provided these parameters:
YAML file: microsplitTest_ubuntu2.yaml
zUMIs directory: /home/hafiz/tools/zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: 28
zUMIs version 2.9.7
Thu Sep 9 10:00:00 +08 2021
WARNING: The STAR version used for mapping is 2.7.9a and the STAR index was created using the version 2.7.4a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.9a.
Filtering...
Thu Sep 9 10:34:08 +08 2021
Warning message:
replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’
[1] "16293 barcodes detected."
[1] "16665515 reads were assigned to barcodes that do not correspond to intact cells."
[1] "Found 126 daughter barcodes that can be binned into 102 parent barcodes."
[1] "Binned barcodes correspond to 15883 reads."
Mapping...
[1] "2021-09-09 10:38:10 +08"
Warning message:
NAs introduced by coercion
STAR --readFilesCommand samtools view -@ 2 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_bsubgenome --sjdbGTFfile /home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/bsub.gtf --runThreadN 10 --sjdbOverhang 73 --readFilesType SAM SE --alignIntronMax 1 --genomeSAindexNbases 10 --twopassMode Basic --readFilesIn /home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAaa.filtered.tagged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAab.filtered.tagged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAac.filtered.tagged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAad.filtered.tagged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAae.filtered.tagged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAaf.filtered.tagged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAag.filtered.tagged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAah.filtered.tagged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAai.filtered.tagged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAaj.filtered.tag
ged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAak.filtered.tagged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAal.filtered.tagged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAam.filtered.tagged.bam,/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/.tmpMerge//microsplitTest_2SRA.microsplitTest_2SRAan.filtered.tagged.bam --outFileNamePrefix /home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/microsplitTest_2SRA.filtered.tagged.
STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Sep 09 10:38:10 ..... started STAR run
Sep 09 10:38:10 ..... loading genome
Sep 09 10:38:10 ..... processing annotations GTF
Sep 09 10:38:10 ..... inserting junctions into the genome indices
Sep 09 10:38:11 ..... started 1st pass mapping
Sep 09 10:47:17 ..... finished 1st pass mapping
Sep 09 10:47:17 ..... inserting junctions into the genome indices
Sep 09 10:47:19 ..... started mapping
Sep 09 10:59:57 ..... finished mapping
Sep 09 10:59:57 ..... finished successfully
Thu Sep 9 10:59:58 +08 2021
Counting...
Warning message:
replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’
[1] "2021-09-09 11:00:05 +08"
[1] "1.26e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/microsplitTest_2SRA.final_annot.gtf"
[1] "Annotation loaded!"
[1] "Assigning reads to features (ex)"
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 1.32.4
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| S microsplitTest_2SRA.filtered.tagged.Aligne ... ||
|| ||
|| Annotation : R data.frame ||
|| Assignment details : <input_file>.featureCounts.bam ||
|| (Note that files are saved to the output directory) ||
|| ||
|| Dir for temp files : . ||
|| Threads : 12 ||
|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multiple alignments : primary alignment only ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : not required ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\\
|| ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid23271 ... ||
|| Features : 4539 ||
|| Meta-features : 4536 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file microsplitTest_2SRA.filtered.tagged.Aligned.out.bam... ||
|| Single-end reads are included. ||
|| Assign alignments to features... ||
|| Total alignments : 75138166 ||
|| Successfully assigned alignments : 46431651 (61.8%) ||
|| Running time : 0.73 minutes ||
|| ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
[1] "2021-09-09 11:00:58 +08"
[1] "Coordinate sorting final bam file..."
[bam_sort_core] merging from 12 files and 12 in-memory blocks...
[1] "2021-09-09 11:03:54 +08"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 16293 barcodes in this chunk..."
Error in rbindlist(rsamtools_reads, fill = TRUE, use.names = TRUE) :
Item 1 of input is not a data.frame, data.table or list
Calls: reads2genes_new -> rbindlist
In addition: Warning message:
In mclapply(1:nrow(idxstats), function(x) { :
all scheduled cores encountered errors in user code
Execution halted
Thu Sep 9 11:03:54 +08 2021
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/expression/microsplitTest_2SRA.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Thu Sep 9 11:03:56 +08 2021
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2021-09-09 11:03:56 +08"
Warning message:
replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/home/hafiz/Documents/Hafiz/microSPLiT/reads/03_2SRA_test/03_zUMI_2/zUMIs_output/expression/microsplitTest_2SRA.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Thu Sep 9 11:04:01 +08 2021
Hi,
Sorry for the late answer; this issue slipped my attention. A couple of pointers: 30 GB on a workstation is potentially a bit tight. Regarding the runs on the server, I do not recommend setting mem_limit close to the limit of the physical RAM; just setting it to ~100 GB should do nicely even for very large datasets.
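For reference, a minimal excerpt of how that might look in the zUMIs YAML (field names as used by zUMIs configs; the values are only illustrative):

```yaml
# Illustrative zUMIs config excerpt: keep mem_limit well below physical RAM.
num_threads: 12
mem_limit: 100    # in GB; ~100 is plenty even for very large datasets
```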
Does the automatically selected number of barcodes make sense?
I'm guessing from the name of your project that this is bacterial scRNA-seq with microSPLiT? I don't have experience with this, and there can be a lot of unexpected things happening that we did not account for when testing zUMIs. Would you mind sharing a small dataset that reproduces this error, along with the genome reference & GTF?
Best, Christoph
Hi Christoph,
Thanks for replying. I had to be away from the lab due to Covid restrictions again, hence the late reply. I have provided two kinds of data: a 1-million-read subset and the data from one flowcell; both worked before but suddenly broke. I have also provided the yaml file from each of the runs that previously worked, together with the references and GTF, in this Dropbox link. I'm not sure of a better way to send these to you.
https://www.dropbox.com/sh/qzxz7012c6b18tr/AADuXMbOyKL50iV1trArx1yEa?dl=0
Thank you so much for checking these out. Also, do you know what is going on with the Docker image? I wanted to test that, but I can't even download it.
Best regards, Hafiz
Hi Hafiz,
I will take a look in the coming days.
Hi Hafiz,
I just ran the datasets you had uploaded.
For all tests, I just used zUMIs with the -c conda option. To generate the index from your fasta file, I used the STAR from the zUMIs conda environment:
~/programs/zUMIs/zUMIs-env/bin/STAR --runMode genomeGenerate --runThreadN 12 --genomeDir bsubgenome_273a --genomeFastaFiles bsub.fasta --genomeSAindexNbases 10 --limitGenomeGenerateRAM 24000000000
I'm attaching the yaml files, where I kept the same settings you had used. In the future, I would definitely recommend increasing the cutoffs for the filtering of BC and UMI sequences; the defaults you used are very stringent and you will lose a lot of reads. This of course also always depends on the data quality of the sequencing run at hand.
This is the log for the small dataset:
~/programs/zUMIs/zUMIs.sh -c -y microsplitTest_ubuntu_1m.yaml
Warning: YAML file doesn't include 'pigz_exec' option; setting to 'pigz'
Warning: YAML file doesn't include 'STAR_exec' option; setting to 'STAR'
Warning: YAML file doesn't include 'Rscript_exec' option; setting to 'Rscript'
Using miniconda environment for zUMIs!
note: internal executables will be used instead of those specified in the YAML file!
You provided these parameters:
YAML file: microsplitTest_ubuntu_1m.yaml
zUMIs directory: /home/chris/programs/zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: 24
zUMIs version 2.9.7
ons 29 sep 2021 16:06:34 CEST
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
ons 29 sep 2021 16:06:39 CEST
[1] "37 barcodes detected."
[1] "6689 reads were assigned to barcodes that do not correspond to intact cells."
[1] "Found 0 daughter barcodes that can be binned into 0 parent barcodes."
[1] "Binned barcodes correspond to 0 reads."
Warning message:
In min(hamming) : no non-missing arguments to min; returning Inf
Mapping...
[1] "2021-09-29 16:06:42 CEST"
Warning message:
NAs introduced by coercion
Sep 29 16:06:42 ..... started STAR run
Sep 29 16:06:42 ..... loading genome
Sep 29 16:06:42 ..... processing annotations GTF
Sep 29 16:06:42 ..... inserting junctions into the genome indices
Sep 29 16:06:42 ..... started 1st pass mapping
Sep 29 16:06:48 ..... finished 1st pass mapping
Sep 29 16:06:48 ..... inserting junctions into the genome indices
Sep 29 16:06:49 ..... started mapping
Sep 29 16:06:57 ..... finished mapping
Sep 29 16:06:57 ..... finished successfully
ons 29 sep 2021 16:06:57 CEST
Counting...
[1] "2021-09-29 16:07:05 CEST"
[1] "1.08e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/chris/projects/zUMIs284/1M_subset/out/microsplitTest_ubuntu.final_annot.gtf"
[1] "Annotation loaded!"
Warning message:
`as_quosure()` requires an explicit environment as of rlang 0.3.0.
Please supply `env`.
This warning is displayed once per session.
[1] "Assigning reads to features (ex)"
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 1.32.4
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| S microsplitTest_ubuntu.filtered.tagged.Alig ... ||
|| ||
|| Annotation : R data.frame ||
|| Assignment details : <input_file>.featureCounts.bam ||
|| (Note that files are saved to the output directory) ||
|| ||
|| Dir for temp files : . ||
|| Threads : 10 ||
|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multiple alignments : primary alignment only ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : not required ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\\
|| ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid122526 ... ||
|| Features : 4539 ||
|| Meta-features : 4536 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file microsplitTest_ubuntu.filtered.tagged.Aligned.out.bam... ||
|| Single-end reads are included. ||
|| Assign alignments to features... ||
|| Total alignments : 449859 ||
|| Successfully assigned alignments : 266544 (59.3%) ||
|| Running time : 0.02 minutes ||
|| ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
[1] "2021-09-29 16:07:13 CEST"
[1] "Coordinate sorting final bam file..."
[bam_sort_core] merging from 0 files and 10 in-memory blocks...
[1] "2021-09-29 16:07:13 CEST"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 37 barcodes in this chunk..."
[1] "Demultiplexing output bam file by cell barcode..."
[1] "Using python implementation to demultiplex."
[1] "2021-09-29 16:07:15 CEST"
[1] "Demultiplexing zUMIs bam file..."
[1] "Demultiplexing complete."
[1] "2021-09-29 16:07:17 CEST"
[1] "2021-09-29 16:07:17 CEST"
[1] "I am done!! Look what I produced.../home/chris/projects/zUMIs284/1M_subset/out/zUMIs_output/"
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 7290379 389.4 12296361 656.7 9403756 502.3
Vcells 12957074 98.9 29932066 228.4 29809857 227.5
ons 29 sep 2021 16:07:17 CEST
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns
This is to maintain compatibility with other loom tools
|======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns
This is to maintain compatibility with other loom tools
|======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns
This is to maintain compatibility with other loom tools
|======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns
This is to maintain compatibility with other loom tools
|======================================================================| 100%ons 29 sep 2021 16:07:20 CEST
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2021-09-29 16:07:20 CEST"
notch went outside hinges. Try setting notch=FALSE.
notch went outside hinges. Try setting notch=FALSE.
[1] "1.08e+08 Reads per chunk"
[1] "Extracting reads from bam file(s)..."
[1] "Working on chunk 1"
Warning message:
In `[.data.table`(data.table::fread(samfile, na.strings = c(""), :
Column 'GEin' does not exist to remove
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 4409029 235.5 8421118 449.8 6999004 373.8
Vcells 7628320 58.2 12531327 95.7 10376106 79.2
ons 29 sep 2021 16:07:29 CEST
So there are two warnings concerning the barcode error correction and the statistics in the end (lack of intron information), but it runs through fine.
This is the log for the full dataset:
~/programs/zUMIs/zUMIs.sh -c -y microsplitTest_ubuntu.yaml
Warning: YAML file doesn't include 'pigz_exec' option; setting to 'pigz'
Warning: YAML file doesn't include 'STAR_exec' option; setting to 'STAR'
Warning: YAML file doesn't include 'Rscript_exec' option; setting to 'Rscript'
Using miniconda environment for zUMIs!
note: internal executables will be used instead of those specified in the YAML file!
You provided these parameters:
YAML file: microsplitTest_ubuntu.yaml
zUMIs directory: /home/chris/programs/zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: 24
zUMIs version 2.9.7
ons 29 sep 2021 16:12:07 CEST
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
ons 29 sep 2021 16:23:41 CEST
[1] "14016 barcodes detected."
[1] "15138734 reads were assigned to barcodes that do not correspond to intact cells."
[1] "Found 114 daughter barcodes that can be binned into 93 parent barcodes."
[1] "Binned barcodes correspond to 14502 reads."
Mapping...
[1] "2021-09-29 16:25:48 CEST"
Warning message:
NAs introduced by coercion
Sep 29 16:25:48 ..... started STAR run
Sep 29 16:25:48 ..... loading genome
Sep 29 16:25:48 ..... processing annotations GTF
Sep 29 16:25:48 ..... inserting junctions into the genome indices
Sep 29 16:25:48 ..... started 1st pass mapping
Sep 29 16:33:11 ..... finished 1st pass mapping
Sep 29 16:33:11 ..... inserting junctions into the genome indices
Sep 29 16:33:12 ..... started mapping
Sep 29 16:42:00 ..... finished mapping
Sep 29 16:42:00 ..... finished successfully
ons 29 sep 2021 16:42:01 CEST
Counting...
[1] "2021-09-29 16:42:08 CEST"
[1] "1.08e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/chris/projects/zUMIs284/1FC_set/out/microsplitTest_ubuntu.final_annot.gtf"
[1] "Annotation loaded!"
Warning message:
`as_quosure()` requires an explicit environment as of rlang 0.3.0.
Please supply `env`.
This warning is displayed once per session.
[1] "Assigning reads to features (ex)"
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
Rsubread 1.32.4
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| S microsplitTest_ubuntu.filtered.tagged.Alig ... ||
|| ||
|| Annotation : R data.frame ||
|| Assignment details : <input_file>.featureCounts.bam ||
|| (Note that files are saved to the output directory) ||
|| ||
|| Dir for temp files : . ||
|| Threads : 10 ||
|| Level : meta-feature level ||
|| Paired-end : yes ||
|| Multimapping reads : counted ||
|| Multiple alignments : primary alignment only ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
|| Chimeric reads : not counted ||
|| Both ends mapped : not required ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\\
|| ||
|| Load annotation file .Rsubread_UserProvidedAnnotation_pid130373 ... ||
|| Features : 4539 ||
|| Meta-features : 4536 ||
|| Chromosomes/contigs : 1 ||
|| ||
|| Process BAM file microsplitTest_ubuntu.filtered.tagged.Aligned.out.bam... ||
|| Single-end reads are included. ||
|| Assign alignments to features... ||
|| Total alignments : 68675772 ||
|| Successfully assigned alignments : 40899323 (59.6%) ||
|| Running time : 0.55 minutes ||
|| ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
[1] "2021-09-29 16:42:47 CEST"
[1] "Coordinate sorting final bam file..."
[bam_sort_core] merging from 10 files and 10 in-memory blocks...
[1] "2021-09-29 16:44:35 CEST"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 1"
[1] "Processing 14016 barcodes in this chunk..."
[1] "Demultiplexing output bam file by cell barcode..."
[1] "Using python implementation to demultiplex."
[1] "2021-09-29 17:12:45 CEST"
[1] "Breaking up demultiplexing in 16 chunks. This may be because you have >10000 cells or a too low filehandle limit (ulimit -n)."
[1] "Demultiplexing zUMIs bam file..."
[1] "Demultiplexing complete."
[1] "2021-09-29 17:39:29 CEST"
[1] "2021-09-29 17:39:29 CEST"
[1] "I am done!! Look what I produced.../home/chris/projects/zUMIs284/1FC_set/out/zUMIs_output/"
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 8346393 445.8 12296361 656.7 12296361 656.7
Vcells 132291276 1009.4 539891338 4119.1 842345158 6426.6
ons 29 sep 2021 17:39:31 CEST
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns
This is to maintain compatibility with other loom tools
|======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns
This is to maintain compatibility with other loom tools
|======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns
This is to maintain compatibility with other loom tools
|======================================================================| 100%Transposing input data: loom file will show input columns (cells) as rows and input rows (genes) as columns
This is to maintain compatibility with other loom tools
|======================================================================| 100%ons 29 sep 2021 17:39:48 CEST
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2021-09-29 17:39:49 CEST"
[1] "1.08e+08 Reads per chunk"
[1] "Extracting reads from bam file(s)..."
[1] "Working on chunk 1"
Warning message:
In `[.data.table`(data.table::fread(samfile, na.strings = c(""), :
Column 'GEin' does not exist to remove
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 4441363 237.2 10963722 585.6 8680035 463.6
Vcells 34957190 266.8 542877470 4141.9 659124247 5028.8
ons 29 sep 2021 17:48:45 CEST
All the tests were run on my local workstation (48 threads / 128 GB RAM) running elementaryOS (i.e. Ubuntu-based). Since even the full dataset is really quite small, the RAM requirement is minimal and you shouldn't have issues with that at all.
Regarding the Docker image, I haven't checked or updated it in a while, but if you would like to use Docker, it would be as simple as creating a new Ubuntu image and running git clone https://github.com/sdparekh/zUMIs.git.
Happy to upload the output if it would be useful for you?
From this I can't really see why you are getting an error; it seems like zUMIs should be doing fine.
microsplitTest_ubuntu_1m.yaml.txt full_microsplitTest_ubuntu.yaml.txt
Best, Christoph
Hi Christoph,
I'm embarrassed to say that it did not occur to me that using the "-c" option would solve the issues I'm facing. It seems I only hit this issue with the packages/dependencies that I installed myself for zUMIs. I really wonder how different they are that they led to the errors I saw. Thank you for your suggestion and help with this!
I understand your point regarding the cutoffs for the BCs and UMIs and will definitely do as you suggest. I'm currently just figuring out an analysis pipeline for microSPLiT that my lab is considering, so I was leaving defaults where possible. I tried running the full published dataset on my local server with the less stringent cutoffs you suggested (BC_filter: num_bases: 5, phred: 20 and UMI_filter: num_bases: 4, phred: 20) and was surprised at just how many more counts I'm getting. Is there a good way to determine a suitable cutoff, or should I just try different ones and judge for myself?
Again, thank you for your help in testing and figuring out what worked. Really appreciate it.
Best regards, Hafiz
Hi Hafiz,
No worries. Sometimes the dependencies can be a bit tricky, but I'm glad it just works with the conda environment.
I usually take a look at FastQC plots to decide on the cutoff, taking into account the length of the UMI or BC. So as you say, it's a bit of an arbitrary judgement call. The main goal is to discard clearly unusable reads, so if you are in doubt you can always err on the lenient side. I am not sure how microSPLiT works in detail, but if there is an expectation of which BC sequences are valid, that also provides added confidence.
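To make that judgement call a bit more quantitative, one could tabulate how many reads survive a given cutoff before running the full pipeline. The sketch below assumes (my reading of zUMIs' BC_filter/UMI_filter settings, e.g. num_bases: 5, phred: 20) that a read is kept when at most `num_bases` of its barcode/UMI positions fall below the `phred` threshold; the helper names are invented.

```python
# Hedged sketch of a zUMIs-style quality filter. ASSUMPTION: a read passes
# when at most `num_bases` positions have Phred quality below `phred`.
def passes_filter(phred_quals, num_bases, phred):
    low_quality = sum(1 for q in phred_quals if q < phred)
    return low_quality <= num_bases

def fraction_kept(reads_quals, num_bases, phred):
    """Fraction of reads surviving a cutoff; handy for comparing settings."""
    kept = sum(passes_filter(q, num_bases, phred) for q in reads_quals)
    return kept / len(reads_quals)

# Toy qualities for three reads: a stringent vs. a lenient setting compared.
reads = [[30, 30, 12, 30], [15, 15, 30, 30], [30, 30, 30, 30]]
print(fraction_kept(reads, num_bases=0, phred=20))  # prints 0.3333333333333333
print(fraction_kept(reads, num_bases=2, phred=20))  # prints 1.0
```

Running something like this on the real quality strings from a FASTQ sample would show directly how much a given cutoff costs in reads.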
Best, Christoph
Hi Christoph,
That was exactly what I did for the "dephasing" step I described in my previous query. Before using zUMIs, I used cutadapt to filter away paired-end reads that did not match a specific list of expected barcodes (BC1) as anchored 3' adapters in read 2, since that is where the barcodes and UMI are located for microSPLiT. The structure of read 2 for microSPLiT (from 5' to 3') is:
UMI-spacer-BC3-spacer-BC2-spacer-BC1
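(As a side note, that layout can be written as a tiny parser. The segment lengths below are hypothetical placeholders, not the published microSPLiT lengths, which should be taken from the protocol.)

```python
# Hypothetical parser for the read-2 layout UMI-spacer-BC3-spacer-BC2-spacer-BC1.
# All lengths are ILLUSTRATIVE ONLY; substitute the real microSPLiT lengths.
def parse_read2(seq, umi_len=10, bc_len=8, spacer_len=6):
    fields = {}
    pos = 0
    fields["UMI"] = seq[pos:pos + umi_len]
    pos += umi_len + spacer_len          # skip UMI + first spacer
    for bc in ("BC3", "BC2", "BC1"):     # barcodes in 5'->3' order
        fields[bc] = seq[pos:pos + bc_len]
        pos += bc_len + spacer_len       # skip barcode + following spacer
    return fields

# Round-trip check on a synthetic read built from the same layout:
read = "U" * 10 + "S" * 6 + "3" * 8 + "S" * 6 + "2" * 8 + "S" * 6 + "1" * 8
print(parse_read2(read))
```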
That was the reason I thought I could use the more stringent default cutoffs. But you're right that I should consider less stringent cutoffs as well. Thanks again for your input and help.
Cheers, Hafiz
Hi, I previously faced a problem getting zUMIs to run, but that has been solved. However, I'm now facing a problem getting zUMIs to complete with a bigger dataset. In my initial testing, which led to my previous problem, I used data from 1 of the 2 flowcells downloaded from SRA. In the current attempt, I concatenated the reads together to run them as a larger dataset. Here are the numbers of reads in the smaller vs. the current run:
Smaller test run (1 of 2 flowcells)
Larger run (2 flowcells)
I have checked the larger run for the matching number of reads:
I initially ran the larger set on my own workstation and the run failed multiple times, which led me to think it might be a memory issue, since the workstation only has 12 cores and 32 GB RAM. Therefore, I decided to run zUMIs on a server instead, but am still facing the same problem repeatedly. I have progressively increased the available cores and memory like so:
Workstation
Server run 1
Server run 2
Below is the latest yaml file for server run 2
Below is the stdout from the parts where I see an error message:
Workstation runs
Server run 1
Server run 2
Hoping to find a suggestion here that will let me run this to completion. Thanks for your help again.