sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

could not find function "read_yaml" #291

Closed: brain-discourse closed this issue 2 years ago

brain-discourse commented 2 years ago

Hi, I am using zUMIs to analyze Smart-seq3 data that was run on a MiSeq. I installed the newest version of zUMIs from GitHub and also have all the packages for r/3.6, including yaml, in my home directory. For some reason the software is unable to load yaml or to find the activate file. I would be grateful if you could help me troubleshoot this:

Here is what my input script looks like:

```bash
#!/bin/bash
module load anaconda
module load samtools
module load star
module load r/3.6.0
module load python
echo 'R_LIBS_USER="~/R/libs"' > $HOME/.Renviron
module load pigz
scrnaseq/zUMIs/zUMIs.sh -c -y scrnaseq/smartseq_pilot.yaml
```

Here is my output file:

```
Using miniconda environment for zUMIs!
 note: internal executables will be used instead of those specified in the YAML file!
scrnaseq/zUMIs/zUMIs.sh: line 161: scrnaseq/zUMIs/zUMIs-env/bin/activate: No such file or directory
scrnaseq/zUMIs/zUMIs.sh: line 162: conda-unpack: command not found
Failed with error: ‘cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer’
Error in withCallingHandlers(expr, message = function(c) invokeRestart("muffleMessage")) :
  Library 'yaml' is needed by R; please install it
Calls: suppressMessages -> withCallingHandlers
Execution halted

You provided these parameters:
 YAML file:            scrnaseq/smartseq_pilot.yaml
 zUMIs directory:      scrnaseq/zUMIs
 STAR executable       STAR
 samtools executable   samtools
 pigz executable       pigz
 Rscript executable    Rscript
 RAM limit:            0
 zUMIs version 2.9.7

Sun Nov 7 19:19:12 EST 2021
WARNING: The STAR version used for mapping is 2.7.5a and the STAR index was created using the version 2.7.4a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.5a.
Filtering...
Failed with error: ‘cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer’
Error: could not find function "read_yaml"
Execution halted
Failed with error: ‘cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer’
Error: could not find function "read_yaml"
Execution halted
Failed with error: ‘cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer’
Error: could not find function "read_yaml"
Execution halted
Failed with error: ‘cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer’
Error: could not find function "read_yaml"
Execution halted
Failed with error: ‘cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer’
Error: could not find function "read_yaml"
Execution halted
Failed with error: ‘cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer’
Error: could not find function "read_yaml"
Execution halted
Failed with error: ‘cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer’
Error: could not find function "read_yaml"
Execution halted
sh: /zUMIs_output/.tmpMerge/.Smartseq3_pilot_miseqrunag.filtered.tagged.bam: No such file or directory
sh: /zUMIs_output/.tmpMerge/.Smartseq3_pilot_miseqrunae.filtered.tagged.bam: No such file or directory
sh: /zUMIs_output/.tmpMerge/.Smartseq3_pilot_miseqrunaf.filtered.tagged.bam: No such file or directory
sh: /zUMIs_output/.tmpMerge/.Smartseq3_pilot_miseqrunac.filtered.tagged.bam: No such file or directory
sh: /zUMIs_output/.tmpMerge/.Smartseq3_pilot_miseqrunab.filtered.tagged.bam: No such file or directory
sh: /zUMIs_output/.tmpMerge/.Smartseq3_pilot_miseqrunaa.filtered.tagged.bam: No such file or directory
sh: /zUMIs_output/.tmpMerge/.Smartseq3_pilot_miseqrunad.filtered.tagged.bam: No such file or directory
ls: cannot access scrnaseq//zUMIs_output/.tmpMerge//Smartseq3_pilot_miseqrun..filtered.tagged.bam: No such file or directory
cat: scrnaseq//zUMIs_output/.tmpMerge//Smartseq3_pilot_miseqrun..BCstats.txt: No such file or directory
[main_samview] fail to read the header from "-".
Sun Nov 7 19:20:48 EST 2021
Error in readRDS(pfile) : cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer
Calls: library -> find.package -> lapply -> FUN -> readRDS
Execution halted
Mapping...
Failed with error: ‘cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer’
Failed with error: ‘cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer’
[1] "2021-11-07 19:20:48 EST"
Error in readRDS(pfile) : cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer
Calls: :: ... tryCatch -> tryCatchList -> tryCatchOne ->
Execution halted
Sun Nov 7 19:20:48 EST 2021
Counting...
Error in readRDS(pfile) : cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer
Calls: library -> find.package -> lapply -> FUN -> readRDS
Execution halted
Sun Nov 7 19:20:49 EST 2021
Loading required package: yaml
Failed with error: ‘cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer’
Loading required package: Matrix
Failed with error: ‘cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer’
Error: could not find function "read_yaml"
Execution halted
Sun Nov 7 19:20:49 EST 2021
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2021-11-07 19:20:49 EST"
Error in readRDS(pfile) : cannot read workspace version 3 written by R 3.6.0; need R 3.5.0 or newer
Calls: library -> find.package -> lapply -> FUN -> readRDS
Execution halted
Sun Nov 7 19:20:49 EST 2021
scrnaseq/zUMIs/zUMIs.sh: line 323: scrnaseq/zUMIs/zUMIs-env/bin/deactivate: No such file or directory
```

cziegenhain commented 2 years ago

Hi,

Seems like there is an issue with the packed conda environment. I recommend deleting your zUMIs folder and cloning a fresh copy from GitHub. On the first call to zUMIs, the archive for the conda environment will be unpacked; be sure not to abort while that is running (it may take a few minutes).
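For reference, a fresh setup could look roughly like the following (the paths simply mirror the ones used in your script; adjust as needed):

```bash
# Remove the broken copy and re-clone; the bundled conda environment (zUMIs-env)
# is unpacked automatically on the first run, so let that call finish uninterrupted.
rm -rf scrnaseq/zUMIs
git clone https://github.com/sdparekh/zUMIs.git scrnaseq/zUMIs
scrnaseq/zUMIs/zUMIs.sh -c -y scrnaseq/smartseq_pilot.yaml
```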

Best, C

brain-discourse commented 2 years ago

Hi, Thank you for your response. Reinstalling it fixed the error. However, I am running into another issue with the script:

```
You provided these parameters:
 YAML file: scrnaseq/smartseq_pilot.yaml
 zUMIs directory:       scrnaseq/zUMIs
 STAR executable        STAR
 samtools executable        samtools
 pigz executable        pigz
 Rscript executable     Rscript
 RAM limit:   0
 zUMIs version 2.9.7 

Wed Nov 10 15:30:32 EST 2021
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
Wed Nov 10 15:35:12 EST 2021
Registered S3 methods overwritten by 'ggplot2':
  method         from 
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
[1] "13 barcodes detected."
[1] "15142 reads were assigned to barcodes that do not correspond to intact cells."
Mapping...
[1] "2021-11-10 15:35:17 EST"
Nov 10 15:35:21 ..... started STAR run
Nov 10 15:35:21 ..... loading genome

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=49 is not equal to the value at the genome generation step =0
SOLUTION: 

Nov 10 15:35:21 ...... FATAL ERROR, exiting
Nov 10 15:35:21 ..... started STAR run
Nov 10 15:35:21 ..... loading genome

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=49 is not equal to the value at the genome generation step =0
SOLUTION: 

Nov 10 15:35:21 ...... FATAL ERROR, exiting
Nov 10 15:35:21 ..... started STAR run
Nov 10 15:35:22 ..... loading genome

EXITING because of fatal PARAMETERS error: present --sjdbOverhang=49 is not equal to the value at the genome generation step =0
SOLUTION: 

Nov 10 15:35:22 ...... FATAL ERROR, exiting
[main_cat] ERROR: input is not BAM or CRAM
[main_cat] ERROR: input is not BAM or CRAM
Wed Nov 10 15:35:22 EST 2021
Counting...
Registered S3 methods overwritten by 'ggplot2':
  method         from 
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
[1] "2021-11-10 15:35:34 EST"
[1] "4.5e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "scrnaseq//Smartseq3_pilot_miseqrun.final_annot.gtf"
[E::hts_open_format] Failed to open file scrnaseq//Smartseq3_pilot_miseqrun.filtered.tagged.Aligned.out.bam
```

Now, I don't know what might be causing this issue. When I run the program I am specifically loading STAR/2.7.1, and my STAR index also wasn't built with any sjdbOverhang. Another thing that is bothering me is the number of barcodes detected: my multiplexed R1 and R2 fastq files have 7632992 lines, with the UMI pattern present in 924846 lines, yet only 13 barcodes are reported.
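As a rough sanity check on those read counts, something along these lines could be used (the R1 file name is a placeholder for the multiplexed file; it assumes the Smart-seq3 pattern ATTGCGCAATG at the start of read 1 marks the UMI-containing reads):

```bash
# Count total reads and reads starting with the Smart-seq3 tag in R1 (gzipped FASTQ).
zcat Undetermined_S0_L001_R1_001.fastq.gz | awk 'NR % 4 == 2' | wc -l
zcat Undetermined_S0_L001_R1_001.fastq.gz | awk 'NR % 4 == 2' | grep -c '^ATTGCGCAATG'
```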

Here is my zUMIs submission script:

```bash
#!/bin/bash
#SBATCH -t 07-00:00:00
#SBATCH --mem=40g

module load anaconda/4.3.0
module load samtools
module load star/2.7.1a
module load r/3.6.0
module load python
echo 'R_LIBS_USER="~/R/libs"' > $HOME/.Renviron
module load pigz
scrnaseq/zUMIs/zUMIs.sh -c -y scrnaseq/smartseq_pilot.yaml
```

Here is the genomeParameters output for my STAR index:

```
### STAR   --runMode genomeGenerate   --runThreadN 24   --genomeDir scrnaseq/hg38_STAR5idx1   --genomeFastaFiles scrnaseq/Homo_sapiens.GRCh38.dna.primary_assembly.fa
GstrandBit 32
versionGenome   2.7.1a
genomeFastaFiles    scrnaseq/Homo_sapiens.GRCh38.dna.primary_assembly.fa
genomeSAindexNbases 14
genomeChrBinNbits   18
genomeSAsparseD 1
sjdbOverhang    0
sjdbFileChrStartEnd -
sjdbGTFfile -
sjdbGTFchrPrefix    -
sjdbGTFfeatureExon  exon
sjdbGTFtagExonParentTranscript  transcript_id
sjdbGTFtagExonParentGene    gene_id
sjdbInsertSave  Basic
genomeFileSizes 3138387968 24303254806
```

cziegenhain commented 2 years ago

Hi,

As you can see, there is an issue with the STAR index. I recommend remaking the STAR index with the same STAR binary that is used for mapping (v2.7.3a); you can find it in zUMIs/zUMIs-env/bin/STAR.
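As a sketch, the index rebuild with the bundled binary could look like this (thread count and the output directory name are placeholders):

```bash
# Rebuild the genome index with the STAR shipped inside the zUMIs conda environment,
# so the index version matches the version used for mapping (2.7.3a).
scrnaseq/zUMIs/zUMIs-env/bin/STAR \
    --runMode genomeGenerate \
    --runThreadN 24 \
    --genomeDir scrnaseq/hg38_STAR_2.7.3a_idx \
    --genomeFastaFiles scrnaseq/Homo_sapiens.GRCh38.dna.primary_assembly.fa
```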

For reference, this is how my STAR genomeParameters.txt file looks:

### STAR   --runMode genomeGenerate   --runThreadN 48   --genomeDir STAR7idx_primary_noGTF   --genomeFastaFiles hg38.primary_assembly.sorted.fa   
### GstrandBit 32
versionGenome   2.7.1a
genomeFastaFiles    hg38.primary_assembly.sorted.fa 
genomeSAindexNbases 14
genomeChrBinNbits   18
genomeSAsparseD 1
sjdbOverhang    0
sjdbFileChrStartEnd - 
sjdbGTFfile -
sjdbGTFchrPrefix    -
sjdbGTFfeatureExon  exon
sjdbGTFtagExonParentTranscript  transcript_id
sjdbGTFtagExonParentGene    gene_id
sjdbInsertSave  Basic
genomeFileSizes 3138387968 24303254806 

I don't know why the sjdbOverhang parameter even occurs in your file; it must be some version-specific quirk of the STAR build you have at hand. One additional note: I have noticed that the versionGenome parameter is not always fully in sync with the actual STAR version, so it is quite common to see zUMIs warn about it - this should work out fine most of the time.

Best, Christoph

brain-discourse commented 2 years ago

Thank you for your response. You were right, it must have had something to do with the STAR version; switching to 2.7.3a fixed it. However, the number of barcodes detected is still a concern, given that my multiplexed R1 and R2 fastq files have 7632992 lines with the UMI present in 924846 lines. Also, I am encountering an "Error in eval(bysub, x, parent.frame()) : object 'readcount_internal' not found". I would be grateful if you could help me navigate this issue.

```
Using miniconda environment for zUMIs!
 note: internal executables will be used instead of those specified in the YAML file!

You provided these parameters:
 YAML file:            scrnaseq/smartseq_pilot.yaml
 zUMIs directory:      scrnaseq/zUMIs
 STAR executable       STAR
 samtools executable   samtools
 pigz executable       pigz
 Rscript executable    Rscript
 RAM limit:            0
 zUMIs version 2.9.7

Thu Nov 11 09:35:46 EST 2021
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
Thu Nov 11 09:38:46 EST 2021
Registered S3 methods overwritten by 'ggplot2':
  method         from
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
[1] "13 barcodes detected."
[1] "15142 reads were assigned to barcodes that do not correspond to intact cells."
Mapping...
[1] "2021-11-11 09:38:51 EST"
Nov 11 09:38:58 ..... started STAR run
Nov 11 09:38:58 ..... loading genome
Nov 11 09:38:58 ..... started STAR run
Nov 11 09:38:58 ..... loading genome
Nov 11 09:38:58 ..... started STAR run
Nov 11 09:38:58 ..... loading genome
Nov 11 09:40:33 ..... processing annotations GTF
Nov 11 09:41:05 ..... inserting junctions into the genome indices
Nov 11 09:41:40 ..... processing annotations GTF
Nov 11 09:41:53 ..... processing annotations GTF
Nov 11 09:42:42 ..... inserting junctions into the genome indices
Nov 11 09:42:56 ..... inserting junctions into the genome indices
Nov 11 10:02:45 ..... started 1st pass mapping
Nov 11 10:02:58 ..... finished 1st pass mapping
Nov 11 10:02:59 ..... inserting junctions into the genome indices
Nov 11 10:06:05 ..... started 1st pass mapping
Nov 11 10:06:50 ..... started 1st pass mapping
Nov 11 10:07:17 ..... finished 1st pass mapping
Nov 11 10:07:20 ..... inserting junctions into the genome indices
Nov 11 10:07:58 ..... finished 1st pass mapping
Nov 11 10:08:01 ..... inserting junctions into the genome indices
Nov 11 10:10:01 ..... started mapping
Nov 11 10:10:26 ..... finished mapping
Nov 11 10:10:35 ..... finished successfully
Nov 11 10:14:07 ..... started mapping
Nov 11 10:14:46 ..... started mapping
Nov 11 10:15:08 ..... finished mapping
Nov 11 10:15:18 ..... finished successfully
Nov 11 10:15:27 ..... finished mapping
Nov 11 10:15:29 ..... finished successfully
Thu Nov 11 10:15:31 EST 2021
Counting...
Registered S3 methods overwritten by 'ggplot2':
  method         from
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
[1] "2021-11-11 10:15:41 EST"
[1] "4.5e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "scrnaseq//Smartseq3_pilot_miseqrun.final_annot.gtf"
[1] "Annotation loaded!"
Warning message:
as_quosure() requires an explicit environment as of rlang 0.3.0.
Please supply env.
This warning is displayed once per session.
[1] "Assigning reads to features (ex)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4
//========================== featureCounts setting ===========================\
Input files : 1 BAM file
              P Smartseq3_pilot_miseqrun.filtered.tagged.A ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)
Dir for temp files : .
Threads : 8
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1
Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
Load annotation file .Rsubread_UserProvidedAnnotation_pid8958 ...
Features : 349261
Meta-features : 60676
Chromosomes/contigs : 47
Process BAM file Smartseq3_pilot_miseqrun.filtered.tagged.Aligned.out. ...
Paired-end reads are included.
Assign alignments (paired-end) to features...
Total alignments : 391934
Successfully assigned alignments : 6457 (1.6%)
Running time : 0.06 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "Assigning reads to features (in)"

    ==========     _____ _    _ ____  _____  ______          _____  
    =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
      =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
        ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
          ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
    ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
   Rsubread 1.32.4
//========================== featureCounts setting ===========================\
Input files : 1 BAM file
              P Smartseq3_pilot_miseqrun.filtered.tagged.A ...
Annotation : R data.frame
Assignment details : .featureCounts.bam
(Note that files are saved to the output directory)
Dir for temp files : .
Threads : 8
Level : meta-feature level
Paired-end : yes
Multimapping reads : counted
Multiple alignments : primary alignment only
Multi-overlapping reads : not counted
Min overlapping bases : 1
Chimeric reads : not counted
Both ends mapped : not required

\===================== http://subread.sourceforge.net/ ======================//

//================================= Running ==================================\
Load annotation file .Rsubread_UserProvidedAnnotation_pid8958 ...
Features : 240693
Meta-features : 28409
Chromosomes/contigs : 34
Process BAM file Smartseq3_pilot_miseqrun.filtered.tagged.Aligned.out. ...
Paired-end reads are included.
Assign alignments (paired-end) to features...
Total alignments : 391934
Successfully assigned alignments : 11996 (3.1%)
Running time : 0.06 minutes

\===================== http://subread.sourceforge.net/ ======================//

[1] "2021-11-11 10:24:29 EST" [1] "Coordinate sorting final bam file..." [bam_sort_core] merging from 0 files and 8 in-memory blocks... [1] "2021-11-11 10:24:36 EST" [1] "Here are the detected subsampling options:" [1] "Automatic downsampling" [1] "Working on barcode chunk 1 out of 1" [1] "Processing 13 barcodes in this chunk..." Error in eval(bysub, x, parent.frame()) : object 'readcount_internal' not found Calls: convert2countM -> .makewide -> [ -> [.data.table -> eval -> eval Execution halted Thu Nov 11 10:24:51 EST 2021 Loading required package: yaml Loading required package: Matrix [1] "loomR found" Warning! HDF5 library version mismatched error The HDF5 header files used to compile this application do not match the version used by the HDF5 library to which this application is linked. Data corruption or segmentation faults may occur if the application continues. This can happen when an application was compiled by one version of HDF5 but linked with a different version of static or shared HDF5 library. You should recompile the application or check your shared library related settings such as 'LD_LIBRARY_PATH'. You can, at your own risk, disable this warning by setting the environment variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'. Setting it to 2 or higher will suppress the warning messages totally. Headers are 1.10.4, library is 1.10.5 SUMMARY OF THE HDF5 CONFIGURATION

General Information:

               HDF5 Version: 1.10.5
              Configured on: Tue Oct 22 12:02:13 UTC 2019
              Configured by: conda@16247e67ecd5
                Host system: x86_64-conda_cos6-linux-gnu
          Uname information: Linux 16247e67ecd5 4.15.0-1059-azure #64-Ubuntu SMP Fri Sep 13 17:02:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
                   Byte sex: little-endian
         Installation point: scrnaseq/zUMIs/zUMIs-env

Compiling Options:

                 Build Mode: production
          Debugging Symbols: no
                    Asserts: no
                  Profiling: no
         Optimization Level: high

Linking Options:

                  Libraries: static, shared

Statically Linked Executables:
                    LDFLAGS: -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/scrnaseq/zUMIs/zUMIs-env/lib -Wl,-rpath-link,scrnaseq/zUMIs/zUMIs-env/lib -L/scrnaseq/zUMIs/zUMIs-env/lib
                 H5_LDFLAGS:
                 AM_LDFLAGS: -L/scrnaseq/zUMIs/zUMIs-env/lib
            Extra libraries: -lrt -lpthread -lz -ldl -lm
                   Archiver: /home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/_build_env/bin/x86_64-conda_cos6-linux-gnu-ar
                   AR_FLAGS: cr
                     Ranlib: /home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/_build_env/bin/x86_64-conda_cos6-linux-gnu-ranlib

Languages:

                          C: yes
                 C Compiler: /home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc
                   CPPFLAGS: -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -I/scrnaseq/zUMIs/zUMIs-env/include
                H5_CPPFLAGS: -D_GNU_SOURCE -D_POSIX_C_SOURCE=200809L   -DNDEBUG -UH5_DEBUG_API
                AM_CPPFLAGS:  -I/scrnaseq/zUMIs/zUMIs-env/include
                    C Flags: -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -I/scrnaseq/zUMIs/zUMIs-env/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/work=/usr/local/src/conda/hdf5_split-1.10.5 -fdebug-prefix-map=/scrnaseq/zUMIs/zUMIs-env=/usr/local/src/conda-prefix
                 H5 C Flags:  -std=c99  -pedantic -Wall -Wextra -Wbad-function-cast -Wc++-compat -Wcast-align -Wcast-qual -Wconversion -Wdeclaration-after-statement -Wdisabled-optimization -Wfloat-equal -Wformat=2 -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-include-dirs -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpacked -Wpointer-arith -Wredundant-decls -Wshadow -Wstrict-prototypes -Wswitch-default -Wswitch-enum -Wundef -Wunused-macros -Wunsafe-loop-optimizations -Wwrite-strings -finline-functions -s -Wno-inline -Wno-aggregate-return -Wno-missing-format-attribute -Wno-missing-noreturn -O
                 AM C Flags: 
           Shared C Library: yes
           Static C Library: yes

                    Fortran: yes
           Fortran Compiler: /home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/_build_env/bin/x86_64-conda_cos6-linux-gnu-gfortran
              Fortran Flags: 
           H5 Fortran Flags:  -pedantic -Wall -Wextra -Wunderflow -Wimplicit-interface -Wsurprising -Wno-c-binding-type  -s -O2
           AM Fortran Flags: 
     Shared Fortran Library: yes
     Static Fortran Library: yes

                        C++: yes
               C++ Compiler: /home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/_build_env/bin/x86_64-conda_cos6-linux-gnu-c++
                  C++ Flags: -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -I/scrnaseq/zUMIs/zUMIs-env/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/work=/usr/local/src/conda/hdf5_split-1.10.5 -fdebug-prefix-map=/scrnaseq/zUMIs/zUMIs-env=/usr/local/src/conda-prefix
               H5 C++ Flags:   -pedantic -Wall -W -Wundef -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Wredundant-decls -Winline -Wsign-promo -Woverloaded-virtual -Wold-style-cast -Weffc++ -Wreorder -Wnon-virtual-dtor -Wctor-dtor-privacy -Wabi -finline-functions -s -O
               AM C++ Flags: 
         Shared C++ Library: yes
         Static C++ Library: yes

                       Java: no

Features:

               Parallel HDF5: no

Parallel Filtered Dataset Writes: no
              Large Parallel I/O: no
              High-level library: yes
                    Threadsafety: yes
             Default API mapping: v110
  With deprecated public symbols: yes
          I/O filters (external): deflate(zlib)
                             MPE: no
                      Direct VFD: no
                         dmalloc: no
  Packages w/ extra debug output: none
                     API tracing: no
            Using memory checker: yes
 Memory allocation sanity checks: no
          Function stack tracing: no
       Strict file format checks: no
    Optimization instrumentation: no
Bye...
/scrnaseq/zUMIs/zUMIs.sh: line 307: 10462 Aborted                 ${Rexc} ${zumisdir}/misc/rds2loom.R ${yaml}
Thu Nov 11 10:24:55 EST 2021
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2021-11-11 10:24:56 EST"
Registered S3 methods overwritten by 'ggplot2':
  method         from
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/scrnaseq//zUMIs_output/expression/Smartseq3_pilot_miseqrun.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
```

cziegenhain commented 2 years ago

Hi,

I don't understand what you are describing with regards to your fastq files and the number of lines. Can you elaborate?

The error seems to be a special case that occurs when none of the cells have any internal reads assigned to an annotated gene. You can already see in the log you posted that the gene assignment rates are extremely poor.

For more input from my side, I would need:

You should probably also look into the STAR log file to see how many reads are mapped at all.
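As a quick way to pull the headline numbers out of those logs, something like this could work (the log file location and naming are a guess based on the output directory used above):

```bash
# Summarise mapping rates from the STAR Log.final.out files produced by the zUMIs run.
grep -E "Number of input reads|Uniquely mapped reads %|unmapped: too short" \
    scrnaseq/*Log.final.out
```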

brain-discourse commented 2 years ago

Hi,

So performing a normal alignment using STAR after demultiplexing and soft-clipping the UMIs gave me an average of 80%-10% uniquely mapped reads. However, with zUMIs my alignment stats looked very poor:

```
                                 Started job on |   Nov 11 13:06:35
                             Started mapping on |   Nov 11 13:31:11
                                    Finished on |   Nov 11 13:31:38
       Mapping speed, Million of reads per hour |   21.24

                      Number of input reads |   159277
                  Average input read length |   100
                                UNIQUE READS:
               Uniquely mapped reads number |   8705
                    Uniquely mapped reads % |   5.47%
                      Average mapped length |   81.73
                   Number of splices: Total |   287
        Number of splices: Annotated (sjdb) |   272
                   Number of splices: GT/AG |   280
                   Number of splices: GC/AG |   3
                   Number of splices: AT/AC |   0
           Number of splices: Non-canonical |   4
                  Mismatch rate per base, % |   1.04%
                     Deletion rate per base |   0.01%
                    Deletion average length |   1.21
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.15
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   957
         % of reads mapped to multiple loci |   0.60%
    Number of reads mapped to too many loci |   20
         % of reads mapped to too many loci |   0.01%
                              UNMAPPED READS:

Number of reads unmapped: too many mismatches |   0
     % of reads unmapped: too many mismatches |   0.00%
          Number of reads unmapped: too short |   149264
               % of reads unmapped: too short |   93.71%
              Number of reads unmapped: other |   331
                   % of reads unmapped: other |   0.21%
                              CHIMERIC READS:
                     Number of chimeric reads |   0
                          % of chimeric reads |   0.00%

                                 Started job on |   Nov 11 13:06:35
                             Started mapping on |   Nov 11 13:31:53
                                    Finished on |   Nov 11 13:32:08
       Mapping speed, Million of reads per hour |   36.88

                      Number of input reads |   153655
                  Average input read length |   100
                                UNIQUE READS:
               Uniquely mapped reads number |   8946
                    Uniquely mapped reads % |   5.82%
                      Average mapped length |   81.74
                   Number of splices: Total |   285
        Number of splices: Annotated (sjdb) |   269
                   Number of splices: GT/AG |   276
                   Number of splices: GC/AG |   1
                   Number of splices: AT/AC |   0
           Number of splices: Non-canonical |   8
                  Mismatch rate per base, % |   1.03%
                     Deletion rate per base |   0.01%
                    Deletion average length |   1.31
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.11
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   943
         % of reads mapped to multiple loci |   0.61%
    Number of reads mapped to too many loci |   30
         % of reads mapped to too many loci |   0.02%
                              UNMAPPED READS:

Number of reads unmapped: too many mismatches |   0
     % of reads unmapped: too many mismatches |   0.00%
          Number of reads unmapped: too short |   143407
               % of reads unmapped: too short |   93.33%
              Number of reads unmapped: other |   329
                   % of reads unmapped: other |   0.21%
                              CHIMERIC READS:
                     Number of chimeric reads |   0
                          % of chimeric reads |   0.00%

                                 Started job on |   Nov 11 13:06:35
                             Started mapping on |   Nov 11 13:30:06
                                    Finished on |   Nov 11 13:30:19
       Mapping speed, Million of reads per hour |   11.80

                      Number of input reads |   42616
                  Average input read length |   100
                                UNIQUE READS:
               Uniquely mapped reads number |   2435
                    Uniquely mapped reads % |   5.71%
                      Average mapped length |   81.09
                   Number of splices: Total |   80
        Number of splices: Annotated (sjdb) |   73
                   Number of splices: GT/AG |   75
                   Number of splices: GC/AG |   2
                   Number of splices: AT/AC |   0
           Number of splices: Non-canonical |   3
                  Mismatch rate per base, % |   0.97%
                     Deletion rate per base |   0.01%
                    Deletion average length |   1.35
                    Insertion rate per base |   0.01%
                   Insertion average length |   1.08
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |   262
         % of reads mapped to multiple loci |   0.61%
    Number of reads mapped to too many loci |   13
         % of reads mapped to too many loci |   0.03%
                              UNMAPPED READS:

Number of reads unmapped: too many mismatches |   0
     % of reads unmapped: too many mismatches |   0.00%
          Number of reads unmapped: too short |   39805
               % of reads unmapped: too short |   93.40%
              Number of reads unmapped: other |   101
                   % of reads unmapped: other |   0.24%
                              CHIMERIC READS:
                     Number of chimeric reads |   0
                          % of chimeric reads |   0.00%
```

Here is my yaml file:

```yaml
project: Smartseq3_pilot_miseqrun
sequence_files:
  file1:
    name: /scrnaseq/Undetermined_S0_L001_R1_001.fastq.gz
    base_definition:
```

The objective of the experiment is to optimize the library prep and assess how deeply we need to sequence. This was a normal MiSeq run (12M) performed at 50 bp PE using dual-indexed i5-i7 libraries. I had 96 samples, so the plate index (i5) was the same for all of them, and the cDNA was prepped using Smart-seq3.

My barcode file: CTCTCTATACTCACCG CTCTCTATGGCTCCTA CTCTCTATGTTGACAG CTCTCTATCCATTGCG CTCTCTATTACAGAGT CTCTCTATGTTCGTCT CTCTCTATACGAAGCG CTCTCTATCAGAGTGG CTCTCTATATGGAACA CTCTCTATCATCTTCT CTCTCTATTCCTCAGA CTCTCTATTTCCATTC CTCTCTATCCTTATGT CTCTCTATCAGAAGAA CTCTCTATAATGTGCC CTCTCTATTTCACACT CTCTCTATCTTGTTGG CTCTCTATCCAGGTAA CTCTCTATCTCTCAGG CTCTCTATTTGGCTGC CTCTCTATCTAACAAC CTCTCTATACATCCTT CTCTCTATACGCTGCA CTCTCTATCTAAGGCG CTCTCTATATAGATCC CTCTCTATCAGGAAGG CTCTCTATAAGTACCT CTCTCTATATGGTCCG CTCTCTATTGTAAGAC CTCTCTATCACAGTCT CTCTCTATCACCGCAA CTCTCTATGATGAGAA CTCTCTATCCATACTC CTCTCTATACACAACA CTCTCTATCGATGGCA CTCTCTATGTTATCGA CTCTCTATGGAGCTAT CTCTCTATCGTCTGAA CTCTCTATCGACTAGC CTCTCTATTCCTATCT CTCTCTATCTGGTCGT CTCTCTATTGGTACAG CTCTCTATTGCTCCGT CTCTCTATATGACACC CTCTCTATTCCTTGGC CTCTCTATCAGGCCAT CTCTCTATCAACCGTG CTCTCTATTGGACAAC CTCTCTATTGGTGACT CTCTCTATACTCGAAT CTCTCTATGTTAAGCA CTCTCTATCACATGGT CTCTCTATCTCGTACA CTCTCTATAACGCTTG CTCTCTATCGAGCATT CTCTCTATTGTTGCAC CTCTCTATTCACTCAC CTCTCTATCAACTCCG CTCTCTATTCAACTGA CTCTCTATCTATTCCA CTCTCTATCCGAGTTA CTCTCTATGTACCAGC CTCTCTATAACCAATC CTCTCTATGGTGTGAC CTCTCTATCGTAATTC CTCTCTATATTCCGTA CTCTCTATACCGTTCC CTCTCTATATTCTCCA CTCTCTATCAGGCTTC CTCTCTATACCGACCA CTCTCTATCAAGTAGT CTCTCTATCTGCGAAC CTCTCTATTGGTGGAA CTCTCTATACTTCAAC CTCTCTATTCTATTGG CTCTCTATCCACAATG CTCTCTATATTCGCAG CTCTCTATCGCTCTTG CTCTCTATTCAAGGAT CTCTCTATCGCAACAG CTCTCTATCCTACACA CTCTCTATGTGCGAGT CTCTCTATGTGTCCAT CTCTCTATGCCAGTGT CTCTCTATCTGTACGC CTCTCTATCCTGTTAC CTCTCTATTGAATGTG CTCTCTATTCAGATAC CTCTCTATACCTGAGC CTCTCTATTGAACTCT CTCTCTATCAAGTGAC CTCTCTATCTTCTGGC CTCTCTATCGCGTGAT CTCTCTATATGCCGCT CTCTCTATCTAGCCGA CTCTCTATGTGCGTTC

cziegenhain commented 2 years ago

Hi,

I think the setup in your YAML file for the base definitions is wrong, hence the mapping issues. It's not really readable in what you posted above; can you upload the actual file? I'm happy to help, but it would be great to get things in a proper format.

From your bcl files you should be running bcl2fastq with the --create-fastq-for-index-reads flag so that you don't lose the index reads. If i5 is constant, it would also be sufficient to just add the i7 index into zUMIs, whichever you prefer!
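A hedged sketch of what that re-demultiplexing call could look like (run folder and output paths are placeholders):

```bash
# Re-run demultiplexing so the I1/I2 index reads are written out as FASTQ files
# next to R1/R2, which zUMIs can then use as barcode reads.
bcl2fastq --runfolder-dir /path/to/miseq_run_folder \
          --output-dir /path/to/fastq_with_index_reads \
          --create-fastq-for-index-reads
```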

Best, Christoph

brain-discourse commented 2 years ago

Hi,

Here is a copy of the yaml file. The non-demultiplexed fastq file along with the index files were provided to me by the sequencing core. I can reach out to them and see if they used the --create-fastq-for-index-reads option. smartseq.txt

cziegenhain commented 2 years ago

Hi,

Yes, the base_definitions can't work like this - I'm surprised you didn't get error messages.

If it is PE + dual index, you would have 4 files (R1, R2, I1, I2). Please check carefully the example for Smart-seq3 on how to feed the files correctly into zUMIs: https://github.com/sdparekh/zUMIs/wiki/Protocol-specific-setup#smart-seq3
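To make that concrete, a minimal sketch of the four-file layout along the lines of the wiki example might look like this (the paths are placeholders and the cDNA ranges assume 50 bp reads; please double-check against the wiki before using it):

```yaml
sequence_files:
  file1:                       # read 1: Smart-seq3 tag + UMI + cDNA
    name: /path/to/R1.fastq.gz
    base_definition:
      - cDNA(23-50)
      - UMI(12-19)
    find_pattern: ATTGCGCAATG
  file2:                       # read 2: cDNA only
    name: /path/to/R2.fastq.gz
    base_definition:
      - cDNA(1-50)
  file3:                       # index read 1 (i7)
    name: /path/to/I1.fastq.gz
    base_definition:
      - BC(1-8)
  file4:                       # index read 2 (i5)
    name: /path/to/I2.fastq.gz
    base_definition:
      - BC(1-8)
```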

Other than that, a few more comments on your yaml file:

- I would recommend also setting a mem_limit, otherwise it will default to 100GB.
- Consider setting automatic: no to get statistics on all of the expected barcodes.
- Consider strand: 1 and Ham_Dist: 1 to count UMI-containing reads more accurately.
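As a rough illustration of where those options live in the zUMIs YAML (key names follow the zUMIs template; the values are only examples, not recommendations specific to this run):

```yaml
mem_limit: 40        # GB; e.g. matching the 40g requested in the SLURM script
barcodes:
  barcode_file: /path/to/expected_barcodes.txt
  automatic: no      # report statistics for all expected barcodes
counting_opts:
  strand: 1          # strand-specific counting, as suggested above
  Ham_Dist: 1        # allow 1 mismatch when collapsing UMIs
```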

Best, Christoph

brain-discourse commented 2 years ago

Hi,

Thank you so much for pointing this out. I am surprised I missed this. Here is a copy of the fixed file smartseq.txt

I ran the new file and am encountering the same error:

```
Using miniconda environment for zUMIs!
 note: internal executables will be used instead of those specified in the YAML file!

You provided these parameters:
 YAML file:            /scrnaseq/smartseq_pilot.yaml
 zUMIs directory:      /scrnaseq/zUMIs
 STAR executable       STAR
 samtools executable   samtools
 pigz executable       pigz
 Rscript executable    Rscript
 RAM limit:            0
 zUMIs version 2.9.7

Mon Nov 15 11:37:27 EST 2021
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
Mon Nov 15 11:41:09 EST 2021
Registered S3 methods overwritten by 'ggplot2':
  method         from
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
[1] "Warning! None of the annotated barcodes were detected."
[1] "Less than 100 barcodes present, will continue with all barcodes..."
[1] " reads were assigned to barcodes that do not correspond to intact cells."
Error in setnames(x, value) :
  Passed a vector of type 'logical'. Needs to be type 'character'.
Calls: BCbin ... colnames<- -> names<- -> names<-.data.table -> setnames
Execution halted
Mapping...
[1] "2021-11-15 11:41:12 EST"
Nov 15 11:41:17 ..... started STAR run
Nov 15 11:41:17 ..... loading genome
Nov 15 11:41:17 ..... started STAR run
Nov 15 11:41:17 ..... loading genome
Nov 15 11:41:17 ..... started STAR run
Nov 15 11:41:17 ..... loading genome
Nov 15 11:42:12 ..... processing annotations GTF
Nov 15 11:42:41 ..... inserting junctions into the genome indices
Nov 15 11:42:56 ..... processing annotations GTF
Nov 15 11:43:22 ..... processing annotations GTF
Nov 15 11:43:56 ..... inserting junctions into the genome indices
Nov 15 11:44:25 ..... inserting junctions into the genome indices
Nov 15 11:59:07 ..... started 1st pass mapping
Nov 15 11:59:19 ..... finished 1st pass mapping
Nov 15 11:59:21 ..... inserting junctions into the genome indices
Nov 15 12:00:12 ..... started 1st pass mapping
Nov 15 12:00:50 ..... finished 1st pass mapping
Nov 15 12:00:51 ..... inserting junctions into the genome indices
Nov 15 12:01:22 ..... started 1st pass mapping
Nov 15 12:01:59 ..... finished 1st pass mapping
Nov 15 12:02:00 ..... inserting junctions into the genome indices
Nov 15 12:03:57 ..... started mapping
Nov 15 12:04:11 ..... finished mapping
Nov 15 12:04:12 ..... finished successfully
Nov 15 12:05:08 ..... started mapping
Nov 15 12:05:39 ..... finished mapping
Nov 15 12:05:40 ..... finished successfully
Nov 15 12:05:51 ..... started mapping
Nov 15 12:06:08 ..... finished mapping
Nov 15 12:06:09 ..... finished successfully
Mon Nov 15 12:06:10 EST 2021
Counting...
Registered S3 methods overwritten by 'ggplot2':
  method         from
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
[1] "2021-11-15 12:06:21 EST"
Error in fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, "kept_barcodes_binned.txt")) :
  File '/scrnaseq//zUMIs_output/Smartseq3_pilot_miseqrunkept_barcodes_binned.txt' does not exist or is non-readable. getwd()=='/scrnaseq'
Execution halted
Mon Nov 15 12:06:21 EST 2021
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Warning! HDF5 library version mismatched error
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment
variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.10.4, library is 1.10.5
SUMMARY OF THE HDF5 CONFIGURATION

General Information:

               HDF5 Version: 1.10.5
              Configured on: Tue Oct 22 12:02:13 UTC 2019
              Configured by: conda@16247e67ecd5
                Host system: x86_64-conda_cos6-linux-gnu
          Uname information: Linux 16247e67ecd5 4.15.0-1059-azure #64-Ubuntu SMP Fri Sep 13 17:02:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
                   Byte sex: little-endian
         Installation point: /scrnaseq/zUMIs/zUMIs-env

Compiling Options:

                 Build Mode: production
          Debugging Symbols: no
                    Asserts: no
                  Profiling: no
         Optimization Level: high

Linking Options:

                  Libraries: static, shared

Statically Linked Executables:
                    LDFLAGS: -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/scrnaseq/zUMIs/zUMIs-env/lib -Wl,-rpath-link,/scrnaseq/zUMIs/zUMIs-env/lib -L/scrnaseq/zUMIs/zUMIs-env/lib
                 H5_LDFLAGS:
                 AM_LDFLAGS: -L/scrnaseq/zUMIs/zUMIs-env/lib
            Extra libraries: -lrt -lpthread -lz -ldl -lm
                   Archiver: /home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/_build_env/bin/x86_64-conda_cos6-linux-gnu-ar
                   AR_FLAGS: cr
                     Ranlib: /home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/_build_env/bin/x86_64-conda_cos6-linux-gnu-ranlib

Languages:

                          C: yes
                 C Compiler: /home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc
                   CPPFLAGS: -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -I/scrnaseq/zUMIs/zUMIs-env/include
                H5_CPPFLAGS: -D_GNU_SOURCE -D_POSIX_C_SOURCE=200809L   -DNDEBUG -UH5_DEBUG_API
                AM_CPPFLAGS:  -I/scrnaseq/zUMIs/zUMIs-env/include
                    C Flags: -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -I/scrnaseq/zUMIs/zUMIs-env/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/work=/usr/local/src/conda/hdf5_split-1.10.5 -fdebug-prefix-map=/scrnaseq/zUMIs/zUMIs-env=/usr/local/src/conda-prefix
                 H5 C Flags:  -std=c99  -pedantic -Wall -Wextra -Wbad-function-cast -Wc++-compat -Wcast-align -Wcast-qual -Wconversion -Wdeclaration-after-statement -Wdisabled-optimization -Wfloat-equal -Wformat=2 -Winit-self -Winvalid-pch -Wmissing-declarations -Wmissing-include-dirs -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpacked -Wpointer-arith -Wredundant-decls -Wshadow -Wstrict-prototypes -Wswitch-default -Wswitch-enum -Wundef -Wunused-macros -Wunsafe-loop-optimizations -Wwrite-strings -finline-functions -s -Wno-inline -Wno-aggregate-return -Wno-missing-format-attribute -Wno-missing-noreturn -O
                 AM C Flags: 
           Shared C Library: yes
           Static C Library: yes

                    Fortran: yes
           Fortran Compiler: /home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/_build_env/bin/x86_64-conda_cos6-linux-gnu-gfortran
              Fortran Flags: 
           H5 Fortran Flags:  -pedantic -Wall -Wextra -Wunderflow -Wimplicit-interface -Wsurprising -Wno-c-binding-type  -s -O2
           AM Fortran Flags: 
     Shared Fortran Library: yes
     Static Fortran Library: yes

                        C++: yes
               C++ Compiler: /home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/_build_env/bin/x86_64-conda_cos6-linux-gnu-c++
                  C++ Flags: -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -I/scrnaseq/zUMIs/zUMIs-env/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/hdf5_split_1571745596770/work=/usr/local/src/conda/hdf5_split-1.10.5 -fdebug-prefix-map=/scrnaseq/zUMIs/zUMIs-env=/usr/local/src/conda-prefix
               H5 C++ Flags:   -pedantic -Wall -W -Wundef -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Wredundant-decls -Winline -Wsign-promo -Woverloaded-virtual -Wold-style-cast -Weffc++ -Wreorder -Wnon-virtual-dtor -Wctor-dtor-privacy -Wabi -finline-functions -s -O
               AM C++ Flags: 
         Shared C++ Library: yes
         Static C++ Library: yes

                       Java: no

Features:

               Parallel HDF5: no

Parallel Filtered Dataset Writes: no
              Large Parallel I/O: no
              High-level library: yes
                    Threadsafety: yes
             Default API mapping: v110
  With deprecated public symbols: yes
          I/O filters (external): deflate(zlib)
                             MPE: no
                      Direct VFD: no
                         dmalloc: no
  Packages w/ extra debug output: none
                     API tracing: no
            Using memory checker: yes
 Memory allocation sanity checks: no
          Function stack tracing: no
       Strict file format checks: no
    Optimization instrumentation: no
Bye...
/scrnaseq/zUMIs/zUMIs.sh: line 307: 248516 Aborted                 ${Rexc} ${zumisdir}/misc/rds2loom.R ${yaml}
Mon Nov 15 12:06:25 EST 2021
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2021-11-15 12:06:25 EST"
Registered S3 methods overwritten by 'ggplot2':
  method         from
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/scrnaseq//zUMIs_output/expression/Smartseq3_pilot_miseqrun.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Mon Nov 15 12:06:33 EST 2021
```

This may have something to do with barcode detection, since my Smartseq3_pilot_miseqrunkept_barcodes.txt only has 13 lines when automatic: yes:

```
XC,n,cellindex
ATTGCGCAGCATACGA,21438,1
ATTGCGCAGCAGCATA,13441,2
ATTGCGCACGATTTTT,12206,3
ATTGCGCAGATTTTTT,11176,4
GCATACGAAAAAAAAA,8804,5
ATTGCGCAGCATCAGC,4876,6
AGATCGGACTGTCTCT,4834,7
ATTGCGCACATCAGCA,3806,8
CGATTTTTAAAAAAAA,3048,9
AAAAAAAAGCATACGA,2720,10
AAAAAAAAGCAGCATA,1870,11
AAAAAAAAGCATCAGC,1851,12
AAAAAAAACATCAGCA,1705,13
```

With automatic: no, the kept barcodes file only contains:

```
XC,n,cellindex
,455227,1
```

cziegenhain commented 2 years ago

Your barcode annotation does not match the barcodes in the data then!

brain-discourse commented 2 years ago

Thank you for your prompt answer. I am trying to figure out the format for the barcode file and went with i5-i7 (minus the "-") based on your comment on issue #77, but am having some trouble.

My index file looks like this:

```
1>111333
@M01825:877:000000000-K3KB8:1:2113:16600:29292 1:N:0:CGATTCTT+CTCTCTAT
CGATTCTT
+
1111>@33
@M01825:877:000000000-K3KB8:1:2113:16517:29292 1:N:0:AAAAAAAA+CTCTCTAT
AAAAAAAA
+
A@A1AD11
@M01825:877:000000000-K3KB8:1:2113:15461:29300 1:N:0:CAATTGTT+TCTTTCCC
CAATTGTT
+
111133
@M01825:877:000000000-K3KB8:1:2113:15755:29307 1:N:0:CCCCAATC+TCTTTCCC
CCCCAATC
+
11>A1ACB
@M01825:877:000000000-K3KB8:1:2113:15269:29310 1:N:0:ATCTTAAT+CTCTCTAT
ATCTTAAT
+
11111333
@M01825:877:000000000-K3KB8:1:2113:15631:29320 1:N:0:TTTATTTT+CTCTCTAT
TTTATTTT
+
1>1113B3
@M01825:877:000000000-K3KB8:1:2113:16006:29323 1:N:0:AAAAATTT+CTCTCTAT
AAAAATTT
+
11>>>111
@M01825:877:000000000-K3KB8:1:2113:15828:29329 1:N:0:TTCTTTGC+CTCTCTAT
TTCTTTGC
+
11111331
@M01825:877:000000000-K3KB8:1:2113:15713:29330 1:N:0:TTTATTCC+CTCTCTAT
TTTATTCC
```

and my fastq R1 file follows the format below:

```
AAABBFFFF5FFGGGGGGGGGGGHHHHHHHHHGGHHGHHHHHHHHGHHHH
@M01825:877:000000000-K3KB8:1:2113:16600:29292 1:N:0:CGATTCTT+CTCTCTAT
ATTGCGCAATGGTGCCACGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
AA?A3ADDB2B4B54EAGECEECGGGHGCEG?EGGG/EE@EE?EGCG/BC
@M01825:877:000000000-K3KB8:1:2113:16517:29292 1:N:0:AAAAAAAA+CTCTCTAT
ATTGCGCAATGCCGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAT
+
1>AA11>DD1@1BAA0AEGGGGGGGGGCGGGGGEGGGGGGGGGGGGGG-0
@M01825:877:000000000-K3KB8:1:2113:15461:29300 1:N:0:CAATTGTT+TCTTTCCC
CTTTGAGCGTATCGAGGCTCTTAAACCTGCTATTGAGGCTTGTGGCATTT
+
CCCCBFCFBCCCGGGGGGGGGGHHHHHHHHHHHHHHHHGHHGBGFFHHHH
@M01825:877:000000000-K3KB8:1:2113:15755:29307 1:N:0:CCCCAATC+TCTTTCCC
GGTTATATTGACCATGCCGCTTTTCTTGGCACGATTAACCCTGATACCAA
+
AA3>AFFFD@DFFFFEFGCGGGGHGFGHHFHHGFHDDH5GGGFFHHGFFG
@M01825:877:000000000-K3KB8:1:2113:15269:29310 1:N:0:ATCTTAAT+CTCTCTAT
TGCTCGTATGCTGCTGATGCTCGTATGCTGCTGATGCTCGTATGCTGCTG
+
ABCBCCDBFFFFGGGGGGGGGGGGGHHHHHHHHGHHHHHGFGHHGHHGHG
@M01825:877:000000000-K3KB8:1:2113:15631:29320 1:N:0:TTTATTTT+CTCTCTAT
ATTGCGCAATGGTGTGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
1>AAA1ADD?F1BB1AEAEEGG<?BFFF?@@@@?@@@@?@@@@@@@@@?=
@M01825:877:000000000-K3KB8:1:2113:16006:29323 1:N:0:AAAAATTT+CTCTCTAT
ATTGCGCAATGAGGATCGGGAAAAAAAAAAAAAAAAAAAAAACTCGTATG
+
A>A11ADDADFFGGBG0E?EGHHFEGEGGGGGGGEGEGGC//11<0/0F
@M01825:877:000000000-K3KB8:1:2113:15828:29329 1:N:0:TTCTTTGC+CTCTCTAT
ATTGCGCAATGATGCTCGGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
A>>>DDBAFFFBE5GEEGGGGGHHHGGGGGGGGGGGGGGGGGGGGGGC
@M01825:877:000000000-K3KB8:1:2113:15713:29330 1:N:0:TTTATTCC+CTCTCTAT
CTCGTATGCTGCTGATGCTCGTATGCTGCTGATGCTCGTATGCTGCTGAT
```

I keep getting the following error for both the i7-i5 and the i5-i7 format of the barcode file:

```
[1] "Warning! None of the annotated barcodes were detected."
[1] "Less than 100 barcodes present, will continue with all barcodes..."
[1] " reads were assigned to barcodes that do not correspond to intact cells."
Error in setnames(x, value) :
  Can't assign 0 names to a 1 column data.table
Calls: BCbin ... colnames<- -> names<- -> names<-.data.table -> setnames
Execution halted
```

and the first few lines of the barcode.txt file look something like this:

```
XC,n,cellindex
AAAAAAAACTCTCTAT,85052,1
TCAGTTCGCTCTCTAT,14974,2
TTACTTCGCTCTCTAT,3573,3
TAAAAAAACTCTCTAT,2160,4
CAAAAAAACTCTCTAT,1737,5
ATTCGGCTCTCTCTAT,1685,6
CGTCCATTTCTTTCCC,1279,7
ACTCCTACTCTTTCCC,1152,8
CTTCCTTCTCTTTCCC,1099,9
ACCATCCTTCTTTCCC,942,10
CAAGTGACTCTCTATT,909,11
TCTTCGACTCTTTCCC,881,12
CAAGTGACTCTTTCCC,856,13
CCCTTTCCCTCTCTCT,683,14
AAAAAAACCTCTCTAT,664,15
GCACTACCCTCTCTAT,649,16
CTCCTAGTTCTTTCCC,600,17
```

I would be very grateful if you could help me troubleshoot this error.

cziegenhain commented 2 years ago

Hi,

Could you please attach the current yaml file and your barcode whitelist?

brain-discourse commented 2 years ago

Sure! barcodes.txt smartseq.txt

cziegenhain commented 2 years ago

In the yaml file, the base definition for the barcode reads should be BC(1-8), but I think zUMIs is robust to your entry too. The format of the barcodes file looks correct.

I am guessing CTCTCTAT is your constant i5 index. Since the i5 index comes from the I2 file, you should either give I2 as file3 and I1 as file4 in zUMIs, or swap the i5-i7 order in the expected barcodes list. If you still have issues, check whether you have the correct i7 annotation in terms of the strand that the MiSeq reads. Different Illumina machines (and different chemistries on the same machine, e.g. NovaSeq v1 vs v1.5) read the indices from different strands; in that case you would find your barcode as the reverse complement instead.
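If the reverse complement turns out to be the issue, one quick way to flip a list of index sequences (one index per line; if i5 and i7 are concatenated in a single column they would need to be handled separately) is:

```bash
# Reverse-complement each line: rev reverses the characters, tr swaps the bases.
# index_list.txt is a placeholder file name.
rev index_list.txt | tr 'ACGTacgt' 'TGCAtgca' > index_list_revcomp.txt
```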

Best, Christoph

brain-discourse commented 2 years ago

Yes, the i5 index was constant. The format was correct and switching didn't help. However, after performing QC I realized that my sample has a lot of adapter read-through (attached), which may have caused the barcode detection error. I redid the library prep and just submitted the sample again. Will follow up on it. Thank you! fastqc_run1

cziegenhain commented 2 years ago

I will close this issue then for now.