Hi @gesavoigt, have you already processed the 1st-seq data to extract the barcodes? What does your puck_barcode_file.txt file look like?
Hi @nukappa, thank you for the suggestion! It was indeed an issue with the puck_barcode_file.txt. Its head now looks like this, and it no longer produces the error:
NAGACGACTCTCCCCGCTATAGATN,11019698,11011015
NTCAGCAAGAAGCCCCATCGAGATN,11018957,11011016
NTAATCAATACGCCGCGGTTAGATN,110112031,11011016
NACTCCCTCCACTCTACTCCAGATN,11019724,11011016
Unfortunately, I now get stuck later on. It seems as though the DGE is empty. The error message I got is this:
[Fri May 24 13:28:32 2024]
rule create_h5ad_dge:
input: projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/dge/dge.exon.polyA_adapter_trimmed.1000_beads_no_spatial_data.txt.gz, projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/dge/dge.exon.polyA_adapter_trimmed.1000_beads_no_spatial_data.summary.txt, projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/puck_barcode_files_summary.csv
output: projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/dge/dge.exon.polyA_adapter_trimmed.1000_beads_no_spatial_data.h5ad, projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/dge/dge.exon.polyA_adapter_trimmed.1000_beads_no_spatial_data.obs.csv
jobid: 18
wildcards: project_id=cho2021_liver, sample_id=SRR14082756, data_root_type=complete_data, downsampling_percentage=, dge_type=.exon, dge_cleaned=, polyA_adapter_trimmed=.polyA_adapter_trimmed, mm_included=, n_beads=1000, puck_barcode_file_id=no_spatial_data, is_external=
^[[33mJob counts:
count jobs
1 create_h5ad_dge
1^[[0m
^[[32m[Fri May 24 13:28:34 2024]^[[0m
^[[31mError in rule create_h5ad_dge:^[[0m
^[[31m jobid: 0^[[0m
^[[31m output: projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/dge/dge.exon.polyA_adapter_trimmed.1000_beads_puck_barcode_file.h5ad, projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/dge/dge.exon.polyA_adapter_trimmed.1000_beads_puck_barcode_file.obs.csv^[[0m
^[[31m^[[0m
^[[31mRuleException:
AttributeError in line 491 of /home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/spacemake/snakemake/main.smk:
'NoneType' object has no attribute 'shape'
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 2330, in run_wrapper
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/spacemake/snakemake/main.smk", line 491, in __rule_create_h5ad_dge
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/spacemake/preprocess/dge.py", line 139, in dge_to_sparse_adata
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 569, in _callback
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/concurrent/futures/thread.py", line 58, in run
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 555, in cached_or_run
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 2362, in run_wrapper^[[0m
^[[31mExiting because a job execution failed. Look above for error message^[[0m
output provided by 'mapping.smk' module (via 'get_mapped_BAM_output'): 'projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/final.polyA_adapter_trimmed.bam'
output provided by 'mapping.smk' module (via 'get_star_unloaded_flag'): 'species_data/mm10/genome/star_index/genomeUnload.done'
need to add mt-missing because no mitochondrial stuff was among the genes for annotation
Job failed, going on with independent jobs.
^[[33mJob counts:
count jobs
1 create_h5ad_dge
1^[[0m
^[[32m[Fri May 24 13:28:35 2024]^[[0m
^[[31mError in rule create_h5ad_dge:^[[0m
^[[31m jobid: 0^[[0m
^[[31m output: projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/dge/dge.exon.polyA_adapter_trimmed.1000_beads_no_spatial_data.h5ad, projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/dge/dge.exon.polyA_adapter_trimmed.1000_beads_no_spatial_data.obs.csv^[[0m
^[[31m^[[0m
^[[31mRuleException:
AttributeError in line 491 of /home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/spacemake/snakemake/main.smk:
'NoneType' object has no attribute 'shape'
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 2330, in run_wrapper
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/spacemake/snakemake/main.smk", line 491, in __rule_create_h5ad_dge
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/spacemake/preprocess/dge.py", line 139, in dge_to_sparse_adata
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 569, in _callback
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/concurrent/futures/thread.py", line 58, in run
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 555, in cached_or_run
File "/home/hd/hd_hd/hd_fz305/miniconda3/envs/spacemake/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 2362, in run_wrapper^[[0m
^[[31mExiting because a job execution failed. Look above for error message^[[0m
output provided by 'mapping.smk' module (via 'get_mapped_BAM_output'): 'projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/final.polyA_adapter_trimmed.bam'
output provided by 'mapping.smk' module (via 'get_star_unloaded_flag'): 'species_data/mm10/genome/star_index/genomeUnload.done'
need to add mt-missing because no mitochondrial stuff was among the genes for annotation
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/bwfor/work/ws/hd_fz305-seqscope/data/cho2021/results_spacemake/.snakemake/log/2024-05-24T121941.357375.snakemake.log
output provided by 'mapping.smk' module (via 'get_mapped_BAM_output'): 'projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/final.polyA_adapter_trimmed.bam'
output provided by 'mapping.smk' module (via 'get_star_unloaded_flag'): 'species_data/mm10/genome/star_index/genomeUnload.done'
output provided by 'mapping.smk' module (via 'get_mapped_BAM_output'): 'projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/final.polyA_adapter_trimmed.bam'
output provided by 'mapping.smk' module (via 'get_star_unloaded_flag'): 'species_data/mm10/genome/star_index/genomeUnload.done'
output provided by 'mapping.smk' module (via 'get_mapped_BAM_output'): 'projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/final.polyA_adapter_trimmed.bam'
output provided by 'mapping.smk' module (via 'get_star_unloaded_flag'): 'species_data/mm10/genome/star_index/genomeUnload.done'
ERROR: SpacemakeError
If I understand correctly, one of the inputs is dge.exon.polyA_adapter_trimmed.1000_beads_no_spatial_data.summary.txt, which contains only zeros:
## htsjdk.samtools.metrics.StringHeader
# DigitalExpression INPUT=projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/final.polyA_adapter_trimmed.bam SUMMARY=projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/dge/dge.exon.polyA_adapter_trimmed.1000_beads_no_spatial_data.summary.txt OUTPUT=projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/dge/dge.exon.polyA_adapter_trimmed.1000_beads_no_spatial_data.txt.gz CELL_BARCODE_TAG=CB MOLECULAR_BARCODE_TAG=MI CELL_BC_FILE=projects/cho2021_liver/processed_data/SRR14082756/illumina/complete_data/topBarcodes.polyA_adapter_trimmed.1000_beads.txt TMP_DIR=[/tmp] OUTPUT_READS_INSTEAD=false OMIT_MISSING_CELLS=false EDIT_DISTANCE=1 READ_MQ=10 MIN_BC_READ_THRESHOLD=0 USE_STRAND_INFO=true RARE_UMI_FILTER_THRESHOLD=0.0 GENE_NAME_TAG=gn GENE_STRAND_TAG=gs GENE_FUNCTION_TAG=gf STRAND_STRATEGY=SENSE LOCUS_FUNCTION_LIST=[CODING, UTR] VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
## htsjdk.samtools.metrics.StringHeader
# Started on: Fri May 24 13:24:15 CEST 2024
## METRICS CLASS org.broadinstitute.dropseqrna.barnyard.DigitalExpression$DESummary
CELL_BARCODE NUM_GENIC_READS NUM_TRANSCRIPTS NUM_GENES
AGGGTAGAAAGGGAGATAAG 0 0 0
CTCTCTCTCTCTCTCTCTCT 0 0 0
GGCTTAGTCTTCCGGCTGTG 0 0 0
Other related files, such as final.polyA_adapter_trimmed.bam -> genome.STAR.bam, topBarcodes.polyA_adapter_trimmed.1000_beads.txt, dge.exon.polyA_adapter_trimmed.1000_beads_no_spatial_data.txt.gz & dge.exon.polyA_adapter_trimmed.1000_beads_puck_barcode_file.txt.gz have data (let me know if you need their heads, as this is already a very long post). On a similar note, the spatial_barcodes_puck_barcode_file.csv file only contains a header, too:
cell_bc,x_pos,y_pos
Do you have any idea what might be causing this issue? I would appreciate any help in debugging.
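To check a whole summary file rather than just its head, a short script can count how many barcodes have any transcripts at all. This is a minimal sketch, not part of spacemake: the function names `read_dge_summary` and `count_nonzero` are made up for illustration, and the parser only assumes the layout shown in the excerpt above (comment lines starting with `#`, then a header row, then one row per barcode).

```python
import io

def read_dge_summary(handle):
    """Parse a DigitalExpression summary (layout as in the excerpt above)."""
    rows, header = [], None
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and htsjdk metrics comments
        fields = line.split()  # works for tab- or space-separated columns
        if header is None:
            header = fields  # CELL_BARCODE NUM_GENIC_READS NUM_TRANSCRIPTS NUM_GENES
            continue
        row = dict(zip(header, fields))
        for key in header[1:]:
            row[key] = int(row[key])  # all columns after the barcode are counts
        rows.append(row)
    return rows

def count_nonzero(rows, column="NUM_TRANSCRIPTS"):
    """Number of barcodes with at least one count in the given column."""
    return sum(1 for row in rows if row[column] > 0)

# The three rows quoted in this post, used as a stand-in for the real file.
example = io.StringIO(
    "## htsjdk.samtools.metrics.StringHeader\n"
    "## METRICS CLASS org.broadinstitute.dropseqrna.barnyard.DigitalExpression$DESummary\n"
    "CELL_BARCODE NUM_GENIC_READS NUM_TRANSCRIPTS NUM_GENES\n"
    "AGGGTAGAAAGGGAGATAAG 0 0 0\n"
    "CTCTCTCTCTCTCTCTCTCT 0 0 0\n"
    "GGCTTAGTCTTCCGGCTGTG 0 0 0\n"
)
rows = read_dge_summary(example)
print(len(rows), "barcodes,", count_nonzero(rows), "with at least one transcript")
```

For the real file, the `io.StringIO` object would be replaced by `open(path)` on the summary.txt. If every barcode reports zero genic reads, the problem most likely sits upstream of the DGE step (mapping or gene annotation) rather than in the h5ad conversion.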
Hi @gesavoigt, does your puck_barcode_file also contain the header cell_bc,x_pos,y_pos? Since your genome.star.bam is populated with mapped reads, and the quantification for the top 1000 barcodes worked, it seems that something is wrong with your puck barcode file and the data doesn't match it.
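One way to test whether the data matches the puck barcode file is to compare barcode lengths and count exact matches between the two sets. The following is a rough sketch rather than anything spacemake does internally; `barcode_report` is a hypothetical helper, and the example barcodes are simply the ones quoted earlier in this thread.

```python
from collections import Counter

def barcode_report(read_barcodes, puck_barcodes):
    """Compare barcodes seen in the reads with those in the puck barcode file."""
    read_set, puck_set = set(read_barcodes), set(puck_barcodes)
    shared = read_set & puck_set  # exact matches only, no edit distance
    return {
        "read_lengths": Counter(len(bc) for bc in read_set),
        "puck_lengths": Counter(len(bc) for bc in puck_set),
        "n_shared": len(shared),
        "fraction_shared": len(shared) / len(read_set) if read_set else 0.0,
    }

# Barcodes copied from the DGE summary and the puck barcode file head above.
read_barcodes = [
    "AGGGTAGAAAGGGAGATAAG",
    "CTCTCTCTCTCTCTCTCTCT",
    "GGCTTAGTCTTCCGGCTGTG",
]
puck_barcodes = [
    "NAGACGACTCTCCCCGCTATAGATN",
    "NTCAGCAAGAAGCCCCATCGAGATN",
]
report = barcode_report(read_barcodes, puck_barcodes)
print(report)
```

In this small example the read barcodes are 20 nt long while the puck barcodes are 25 nt long, so no exact match is possible. A length mismatch like this is worth ruling out before looking anywhere else.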
Hi @nukappa, I forgot to include the header, but it is barcode,xcoord,ycoord. From what I found in the documentation, this should work as well, but please correct me if I am wrong.
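To rule out the header naming as a cause, the file can be rewritten with the header suggested above. This is only a sketch: the `RENAME` mapping between the two header variants is an assumption based on the names mentioned in this thread, and `normalize_puck_barcodes` is a hypothetical helper, not a spacemake function.

```python
import csv
import io

# Assumed mapping from the header used here to the one suggested above.
RENAME = {"barcode": "cell_bc", "xcoord": "x_pos", "ycoord": "y_pos"}
EXPECTED = ["cell_bc", "x_pos", "y_pos"]

def normalize_puck_barcodes(src, dst):
    """Copy a puck barcode CSV, renaming the header and checking column counts."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = [RENAME.get(name.strip(), name.strip()) for name in next(reader)]
    if header != EXPECTED:
        raise ValueError(f"unexpected header: {header}")
    writer.writerow(header)
    n_rows = 0
    for row in reader:
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got: {row}")
        writer.writerow(row)
        n_rows += 1
    return n_rows

# Two rows from the head of the file shown earlier in the thread.
src = io.StringIO(
    "barcode,xcoord,ycoord\n"
    "NAGACGACTCTCCCCGCTATAGATN,11019698,11011015\n"
    "NTCAGCAAGAAGCCCCATCGAGATN,11018957,11011016\n"
)
dst = io.StringIO()
n_rows = normalize_puck_barcodes(src, dst)
print(dst.getvalue())
```

With real files, `src` and `dst` would be opened with `open(path, newline="")`. If the error persists after renaming, the header can be excluded as the cause.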
I was wondering if it was an issue with the read trimming (I previously included a constant region in the barcode, AGATN in the above post), which I fixed, but I continue to get the same error. Does any other processing, such as trimming, need to be done to the 2nd-seq file? As an example, the first 4 lines in read 1 & 2 look like this (resp.):
NGAAGACACATGGCCTAATTTCTTGTGACTACAGCACCCTCGACTCTCGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
+SRR14082756.1 1 length=151
#AAAF7JJFJFJJFFJ<JFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
NTTGTTGCCATATATTATAATAAATGCTGCACAGAAAATGTAAATAAACACTTAGTTAAAAATCCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR14082756.1 1 length=151
#AAAFFFJJJ7JAJ<AJ<JJAJJJ<A7-<7F-F<FJJJ---JJJFFJJFJF7-JF--<AJFJ--777FJJJJJJJJJJJJJJJJJJJJJFFJJJFJJJJJJJJFAJF<A<F<FFFJJFJF<-777AAFJFJJJF--7AFFA--7A<F<F<-
Also, is it relevant that there seem to be no mitochondrial genes?
Many thanks in advance for your attention and suggestions!
Hi @gesavoigt, did you solve this?
If not, could you share the star.Log.final here to see if reads actually map to the genome? Could it be that the read files R1 and R2 are swapped?
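For a quick look at the mapping rates, the STAR final log can be parsed into a dictionary. This is a minimal sketch that assumes the usual `key | value` layout of STAR's final log; `parse_star_log` and `percent` are hypothetical helpers, and the numbers in the example text are placeholders, not values from this dataset.

```python
def parse_star_log(text):
    """Collect 'key | value' pairs from the text of a STAR final log."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section titles and blank lines carry no value
        key, _, value = line.partition("|")
        stats[key.strip()] = value.strip()
    return stats

def percent(stats, key):
    """Return a percentage entry such as '3.20%' as a float."""
    return float(stats[key].rstrip("%"))

# Placeholder excerpt; the values are invented for illustration only.
example = (
    "                          Number of input reads |\t1000000\n"
    "                                  UNIQUE READS:\n"
    "                        Uniquely mapped reads % |\t3.20%\n"
)
stats = parse_star_log(example)
print("uniquely mapped:", percent(stats, "Uniquely mapped reads %"), "%")
```

A very low uniquely mapped percentage would point towards swapped read files or a problem with the genome index, whereas a high percentage combined with an empty DGE would rather point towards the annotation or the barcode tagging.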
Hi @nukappa, I just found the issue: it was actually with the reference genome. Thanks for helping out even though it turned out not to be about the spacemake pipeline. I just could not trace it back through the error message that was given. Maybe it will at least help somebody else out.
I am trying to process seq-scope data and am getting an error message in create_spatial_barcode_file:
My initial idea was that this was an issue with the pandas version, but downgrading didn't solve it. As specified in the spacemake environment.yaml, pandas was 1.5.1. I downgraded to pandas=1.4.0, but resuming the pipeline threw the same error. Pandas 1.3.0 cannot be installed due to dependency issues.
Any ideas on how to fix/circumvent this issue?
I am using spacemake v0.7.8. The data is from Cho et al., 2021, specifically using NCBI SRR14082756 as 2nd-seq data and DraI-100pM-mbcore-RD2.fastq.gz as 1st-seq data. Let me know in case you need any other information.