Closed mdhfz89 closed 3 years ago
Hi,
I'm not really familiar with microSPLiT and don't have time to dig into your description of the cutadapt / dephasing steps, so I won't comment on that for now.
I know we have had issues with fastq input that is not gzipped. That would be my first recommendation to change, please gzip the input fastq files.
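For anyone following along, a minimal sketch of the gzip step in Python, equivalent in effect to running `gzip` or `pigz` on the command line. The helper name `gzip_fastq` is made up for illustration and is not part of zUMIs.

```python
import gzip
import shutil
from pathlib import Path


def gzip_fastq(path):
    """Write a gzip-compressed copy of `path` next to it and return the new path.

    Hypothetical helper for illustration; the original file is left in place.
    """
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    # Stream the copy so a large FASTQ file is never held in memory at once.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

The returned `.fastq.gz` path is what would then go into the `name:` field of the YAML file.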
Some other comments on your yaml file (not related to this particular error but might lead to issues later):
Best, Christoph
Hi Christoph,
I did what you recommended and gzipped my files, and I think I made the BC cutoff more lenient. I also converted the GFF3 file to GTF format using AGAT:
agat_convert_sp_gff2gtf.pl --gff bsub.gff3 -o bsub.gtf
However, I'm still running into an error:
bash /home/hafiz/tools/zUMIs/zUMIs.sh -d /home/hafiz/tools/zUMIs -y microsplitTest_ubuntu.yaml
You provided these parameters:
YAML file: microsplitTest_ubuntu.yaml
zUMIs directory: /home/hafiz/tools/zUMIs
STAR executable STAR
samtools executable samtools
pigz executable pigz
Rscript executable Rscript
RAM limit: 24
zUMIs version 2.9.7
Fri Aug 20 10:39:18 +08 2021
WARNING: The STAR version used for mapping is 2.7.9a and the STAR index was created using the version 2.7.4a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.9a.
Filtering...
Fri Aug 20 10:39:22 +08 2021
Error in uik(bccount$cellindex, bccount$cs/1000) :
Method is not applicable for such a small vector. Please give at least a 5 numbers vector
Calls: cellBC -> .cellBarcode_unknown -> .FindBCcut -> uik
Execution halted
Mapping...
[1] "2021-08-20 10:39:23 +08"
Warning message:
NAs introduced by coercion
STAR --readFilesCommand samtools view -@ 2 --outSAMmultNmax 1 --outFilterMultimapNmax 50 --outSAMunmapped Within --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --genomeDir /home/hafiz/Documents/Hafiz/microSPLiT/reads/00_pipeline_testing/03_bsubgenome --sjdbGTFfile /home/hafiz/Documents/Hafiz/microSPLiT/reads/00_pipeline_testing/bsub.gtf --runThreadN 8 --sjdbOverhang 73 --readFilesType SAM SE --alignIntronMax 1 --genomeSAindexNbases 10 --twopassMode Basic --readFilesIn /home/hafiz/Documents/Hafiz/microSPLiT/reads/00_pipeline_testing/03_zUMItest/zUMIs_output/.tmpMerge//microsplitTest_ubuntu.microsplitTest_ubuntuaa.filtered.tagged.bam --outFileNamePrefix /home/hafiz/Documents/Hafiz/microSPLiT/reads/00_pipeline_testing/03_zUMItest/microsplitTest_ubuntu.filtered.tagged.
STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Aug 20 10:39:24 ..... started STAR run
Aug 20 10:39:24 ..... loading genome
Aug 20 10:39:24 ..... processing annotations GTF
Aug 20 10:39:24 ..... inserting junctions into the genome indices
Aug 20 10:39:24 ..... started 1st pass mapping
Aug 20 10:39:27 ..... finished 1st pass mapping
Aug 20 10:39:27 ..... inserting junctions into the genome indices
Aug 20 10:39:27 ..... started mapping
Aug 20 10:39:30 ..... finished mapping
Aug 20 10:39:30 ..... finished successfully
Fri Aug 20 10:39:30 +08 2021
Counting...
[1] "2021-08-20 10:39:38 +08"
Error in fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, "kept_barcodes_binned.txt")) :
File '/home/hafiz/Documents/Hafiz/microSPLiT/reads/00_pipeline_testing/03_zUMItest/zUMIs_output/microsplitTest_ubuntukept_barcodes_binned.txt' does not exist or is non-readable. getwd()=='/home/hafiz/Documents/Hafiz/microSPLiT/reads/00_pipeline_testing/03_zUMItest'
Execution halted
Fri Aug 20 10:39:38 +08 2021
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
cannot open compressed file '/home/hafiz/Documents/Hafiz/microSPLiT/reads/00_pipeline_testing/03_zUMItest/zUMIs_output/expression/microsplitTest_ubuntu.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Fri Aug 20 10:39:40 +08 2021
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2021-08-20 10:39:40 +08"
Error in data.table::fread(paste0(opt$out_dir, "/zUMIs_output/", opt$project, :
File '/home/hafiz/Documents/Hafiz/microSPLiT/reads/00_pipeline_testing/03_zUMItest/zUMIs_output/microsplitTest_ubuntukept_barcodes.txt' does not exist or is non-readable. getwd()=='/home/hafiz/Documents/Hafiz/microSPLiT/reads/00_pipeline_testing/03_zUMItest'
Execution halted
Fri Aug 20 10:39:44 +08 2021
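The later errors in this log all complain about missing files, which suggests they are knock-on effects of the first failure under "Filtering..." rather than separate problems. A small sketch that lists which of those files are absent is shown here. The three file names are taken from the error messages above; the function name and the idea of checking them by hand are assumptions for illustration, not part of zUMIs.

```python
from pathlib import Path


def missing_outputs(out_dir, project):
    """Return the output files named in the log that do not exist on disk."""
    base = Path(out_dir) / "zUMIs_output"
    expected = [
        # Note: the log shows no separator between project name and suffix.
        base / f"{project}kept_barcodes.txt",
        base / f"{project}kept_barcodes_binned.txt",
        base / "expression" / f"{project}.dgecounts.rds",
    ]
    return [p for p in expected if not p.exists()]
```

If the kept-barcodes files are the first ones missing, the barcode detection error is the one to fix; the counting, loom, and statistics errors should then go away on their own.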
Below is the new YAML file:
project: microsplitTest_ubuntu
sequence_files:
  file1:
    name: sub_1_filtered.fastq.gz
    base_definition:
      - cDNA(1-76)
  file2:
    name: sub_2_filtered.fastq.gz
    base_definition:
      - BC(11-18,49-56,79-86)
      - UMI(1-10)
reference:
  STAR_index: /home/hafiz/Documents/Hafiz/microSPLiT/reads/00_pipeline_testing/03_bsubgenome
  GTF_file: /home/hafiz/Documents/Hafiz/microSPLiT/reads/00_pipeline_testing/bsub.gtf
  additional_STAR_params: --alignIntronMax 1 --genomeSAindexNbases 10
  additional_files: ~
out_dir: /home/hafiz/Documents/Hafiz/microSPLiT/reads/00_pipeline_testing/03_zUMItest
num_threads: 10
mem_limit: 24
filter_cutoffs:
  BC_filter:
    num_bases: 2
    phred: 20
  UMI_filter:
    num_bases: 2
    phred: 10
barcodes:
  barcode_num: null
  barcode_file: null
  barcode_sharing: null
  automatic: yes
  BarcodeBinning: 1
  nReadsperCell: 100
  demultiplex: yes
counting_opts:
  introns: no
  downsampling: 0
  strand: 0
  Ham_Dist: 0
  write_ham: no
  velocyto: no
  primaryHit: yes
  twoPass: yes
make_stats: yes
which_Stage: Filtering
Rscript_exec: Rscript
STAR_exec: STAR
pigz_exec: pigz
samtools_exec: samtools
Thank you so much for your suggestions previously.
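For readers unfamiliar with the `base_definition` entries in the YAML above, here is a rough sketch of how such an entry can be read, assuming the coordinates are 1-based and inclusive. The function names are invented for illustration and this is not the zUMIs implementation.

```python
import re


def parse_base_definition(entry):
    """Parse e.g. 'BC(11-18,49-56)' into ('BC', [(11, 18), (49, 56)])."""
    match = re.fullmatch(r"(\w+)\(([\d,\-]+)\)", entry.strip())
    if match is None:
        raise ValueError(f"unrecognised base_definition: {entry!r}")
    ranges = [
        tuple(int(x) for x in part.split("-"))
        for part in match.group(2).split(",")
    ]
    return match.group(1), ranges


def extract(seq, ranges):
    """Concatenate the 1-based inclusive ranges of `seq` into one string."""
    return "".join(seq[start - 1:end] for start, end in ranges)
```

Read this way, `BC(11-18,49-56,79-86)` describes three 8-base segments joined into a 24-base barcode, and the largest coordinate is 86, so read 2 has to be at least 86 bases long for every segment to be present.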
Hi,
It's the automatic cell barcode detection that fails. This happens when there are basically no reads left after filtering, or when fewer than a handful of distinct barcode sequences remain.
Maybe you went a bit small with your test data set?
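A quick way to check a test subset before running the pipeline is to count reads per barcode directly from the barcode FASTQ file. The sketch below is a rough pre-check only: it ignores the quality filters, the function names are hypothetical, and the assumption that candidates are thinned by a minimum read count such as `nReadsperCell` has not been verified against the zUMIs source. The error message itself only says that at least 5 values are needed.

```python
import gzip
from collections import Counter


def barcode_read_counts(fastq_gz, ranges):
    """Count reads per barcode; `ranges` are 1-based inclusive (start, end) pairs."""
    longest = max(end for _, end in ranges)
    counts = Counter()
    with gzip.open(fastq_gz, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 1:  # the sequence is the second line of each 4-line record
                continue
            seq = line.strip()
            if len(seq) < longest:
                continue  # read too short to contain every barcode segment
            counts["".join(seq[start - 1:end] for start, end in ranges)] += 1
    return counts


def barcodes_passing(counts, min_reads):
    """Return the barcodes supported by at least `min_reads` reads."""
    return [bc for bc, n in counts.items() if n >= min_reads]
```

With the settings in this thread, one would call `barcode_read_counts("sub_2_filtered.fastq.gz", [(11, 18), (49, 56), (79, 86)])` and then `barcodes_passing(counts, 100)`. If that list has fewer than 5 entries, the subset is probably too small for automatic detection.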
Hi Christoph,
You're right. I retried it with a bigger subset and it works. Thank you so much for your help.
Cheers
Hi,
I'm having problems with zUMIs when trying to process microSPLiT reads downloaded from SRA, and I'm hoping to get some clues on how to resolve them. microSPLiT (doi: 10.1126/science.aba5257) is the bacterial version of SPLiT-seq, which uses combinatorial indexing. The paper worked with Bacillus subtilis, so the STAR commands were set up with that in mind. The FASTA and GFF files used are attached here.
bsub.zip
To start with, I downloaded the data using and then subsampled the reads to work with a smaller test dataset
I then dephased the reads by removing any pair in which the rightmost 8 bases of read 2 do not match an expected barcode sequence (the expected index is also attached). Since cutadapt writes each match to a separate file named after the matching sequence in "BC1_dephasing.fasta", I then concatenated the matches. I used a wrapper script to do all of this.
BC1_dephasing.zip
yes Y | . dephasingWithCutadapt_ubuntu_nostops.sh sub_2.fastq sub_1.fastq
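The filtering rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the wrapper script: matching here is exact, any error tolerance configured in cutadapt is not reproduced, and the function name and record layout are assumptions.

```python
def dephase(pairs, expected_barcodes, tail_len=8):
    """Keep (read1, read2) pairs whose read 2 ends in an expected barcode.

    Each read is a (header, sequence, quality) tuple. Only the last
    `tail_len` bases of read 2 are compared, and the match must be exact.
    """
    expected = set(expected_barcodes)
    kept = []
    for read1, read2 in pairs:
        if read2[1][-tail_len:] in expected:
            kept.append((read1, read2))
    return kept
```

In practice the expected barcodes would be the sequences in "BC1_dephasing.fasta", and both mates of a kept pair are written out together so the two files stay in sync.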
The problem comes when I try to run zUMIs with these files. I think one early error prevents the rest of the pipeline from completing.
The stdout that I'm seeing is:
bash /home/hafiz/tools/zUMIs/zUMIs.sh -d /home/hafiz/tools/zUMIs -y microsplitTest_ubuntu.yaml
So I think the filtering step failed due to some syntax error, but I haven't been able to work out what is going on. I thought the copy of zUMIs I had downloaded might be broken, so I re-downloaded it, but I'm still facing the same issue.
The yaml file I used is
Looking forward to any suggestions you might have. If you need any files, I'll try to provide them here.