vanheeringen-lab / seq2science

Automated and customizable preprocessing of Next-Generation Sequencing data, including full (sc)ATAC-seq, ChIP-seq, and (sc)RNA-seq workflows. Works equally easily with public and local data.
https://vanheeringen-lab.github.io/seq2science
MIT License

BUG: [scATAC-seq successful run but bam file and snap object are missing cell barcodes] #976

Closed inesmarais closed 1 year ago

inesmarais commented 1 year ago

Describe the bug

Hi, I used the scATAC-seq pipeline with sci-ATAC-seq input data. Although the run was reported as successful, the final BAM file is missing cell barcodes, and we think this is why the snap object contains 0 barcodes as well.

Expected behavior

We expected the final BAM file to contain cell barcodes and the snap object to contain at least some cell barcodes. The pipeline should report an error or warning when they are missing.

Screenshots

Our BAM file compared to a BAM file generated by Cell Ranger (10x). The top BAM file is the one in which no cell barcodes can be observed.

[Image: no_cell_barcode]
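For anyone who wants to run the same comparison without a screenshot, here is a minimal sketch that counts how many SAM records carry a cell barcode. It assumes the barcode is stored in the standard `CB:Z:` optional tag, as Cell Ranger does; the function name and the example records are made up for illustration and are not part of seq2science.

```python
def count_cell_barcodes(sam_lines):
    """Return (n_records, n_with_cb, barcodes) for an iterable of SAM text lines."""
    total = 0
    tagged = 0
    barcodes = set()
    for line in sam_lines:
        # Skip blank lines and header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        total += 1
        fields = line.rstrip("\n").split("\t")
        # Optional tags start after the 11 mandatory SAM columns.
        for field in fields[11:]:
            if field.startswith("CB:Z:"):
                tagged += 1
                barcodes.add(field[len("CB:Z:"):])
                break
    return total, tagged, barcodes


# Hypothetical records: one without and one with a CB tag.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF\tNM:i:0",
    "read2\t99\tchr1\t150\t60\t4M\t=\t250\t104\tACGT\tFFFF\tNM:i:0\tCB:Z:AAACGAAAGT-1",
]
total, tagged, barcodes = count_cell_barcodes(example)
print(total, tagged, sorted(barcodes))  # 2 1 ['AAACGAAAGT-1']
```

On a real file, the lines could come from `samtools view` piped into the script, ideally restricted to the first few thousand records to keep it quick.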

Snapatac QC file

Total number of unique barcodes:             0
TN - Total number of fragments:              0
UM - Total number of uniquely mapped:        0
SE - Total number of single ends:            0
SA - Total number of secondary alignments:   0
PE - Total number of paired ends:            0
PP - Total number of proper paired:          0
PL - Total number of proper frag len:        0
US - Total number of usable fragments:       0
UQ - Total number of unique fragments:       0
CM - Total number of chrM fragments:         0
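A check like the one we were hoping for could be sketched roughly as follows. This only assumes the QC file has the `label: value` layout shown above; the function names are hypothetical and this is not something seq2science or snaptools currently provides.

```python
def parse_snap_qc(text):
    """Parse 'label:   value' lines from a snap QC report into a dict of ints."""
    metrics = {}
    for line in text.splitlines():
        label, sep, value = line.rpartition(":")
        if not sep:
            continue  # not a 'label: value' line
        value = value.strip()
        if value.isdigit():
            metrics[label.strip()] = int(value)
    return metrics


def check_snap_qc(text):
    """Raise ValueError if the report lists zero (or no) unique barcodes."""
    metrics = parse_snap_qc(text)
    if metrics.get("Total number of unique barcodes", 0) == 0:
        raise ValueError("snap object contains 0 barcodes")
    return metrics


# Excerpt of the QC file from this issue.
QC_TEXT = """\
Total number of unique barcodes:             0
TN - Total number of fragments:              0
UQ - Total number of unique fragments:       0
"""

try:
    check_snap_qc(QC_TEXT)
except ValueError as err:
    print(f"QC check failed: {err}")
```

Failing the run on an empty barcode count would have flagged this problem immediately instead of reporting success.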

Seq2science log file

             ____  ____   __              
            / ___)(  __) /  \             
            \___ \ ) _) (  O )            
            (____/(____) \__\)            
                   ____                   
                  (___ \                  
                   / __/                  
                  (____)                  
   ____   ___  __  ____  __ _   ___  ____ 
  / ___) / __)(  )(  __)(  ( \ / __)(  __)
  \___ \( (__  )(  ) _) /    /( (__  ) _) 
  (____/ \___)(__)(____)\_)__) \___)(____)

workflow: scatac-seq
version:  0.9.7
docs:     https://vanheeringen-lab.github.io/seq2science

Checking if seq2science was run already, if something in the configuration was changed, and if so, if seq2science needs to re-run any jobs.
Checking if samples are available online... This can take some time.
Done!

| config variable              | value                                                                                                   |
|:-----------------------------|:--------------------------------------------------------------------------------------------------------|
| samples                      | /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/rna_seq_dev_cornea/seq2science_config/samples_atac.tsv |
| fastq_dir                    | /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/fastq                                 |
| final_bam_dir                | /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam                 |
| genome_dir                   | /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/genomes                               |
| log_dir                      | /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log                       |
| qc_dir                       | /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc                        |
| aligner                      | bwa-mem2                                                                                                |
| bam_sort_order               | coordinate                                                                                              |
| bam_sorter                   | samtools                                                                                                |
| bin_opt                      | --bin-size-list 5000 --verbose=True                                                                     |
| cores                        | 8                                                                                                       |
| create_qc_report             | True                                                                                                    |
| deeptools_computematrix_gene | --beforeRegionStartLength 3000 --regionBodyLength 5000 --afterRegionStartLength 3000                    |
| deeptools_multibamsummary    | --distanceBetweenBins 9000 --binSize 1000                                                               |
| deeptools_plotcorrelation    | --colorMap RdYlBu_r --plotNumbers                                                                       |
| deeptools_qc                 | True                                                                                                    |
| fqext                        | ['R1', 'R2']                                                                                            |
| fqsuffix                     | fastq                                                                                                   |
| markduplicates               | -Xms4G -Xmx6G MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=999                                                    |
| snaptools_opt                | --min-flen=0 --max-flen=1000 --keep-single=FALSE --keep-secondary=FALSE --overwrite=True --min-cov=100  |
|                              | --verbose=True                                                                                          |
| trimmer                      | fastp                                                                                                   |
| layout                       | sample       layout                                                                                     |
|                              | -----------  --------                                                                                   |
|                              | SRR11692131  PAIRED                                                                                     |

Building DAG of jobs...
Done. Now starting the real run.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Provided resources: parallel_downloads=3, genomepy_downloads=1, deeptools_limit=16, R_scripts=1, mem_gb=1007
Singularity containers: ignored
Job stats:
job                        count    min threads    max threads
-----------------------  -------  -------------  -------------
bwa_mem2                       1              7              7
create_SNAP_object             1              4              4
create_bins_SNAP_object        1              1              1
insert_size_metrics            1              1              1
mark_duplicates                1              1              1
mt_nuc_ratio_calculator        1              1              1
multiqc                        1              1              1
multiqc_explain                1              1              1
plotFingerprint                1              8              8
sambamba_sort                  1              2              2
samtools_index                 1              1              1
samtools_presort               1              1              1
samtools_stats                 1              1              1
seq2science                    1              1              1
total                         14              1              8

Select jobs to execute...
[Mon Apr  3 11:23:19 2023]

group job c306b865-e71f-4a86-b583-48eddc8b7f8e (jobs in lexicogr. order):

    [Mon Apr  3 11:23:19 2023]
    rule bwa_mem2:
        input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/fastq_trimmed/SRR11692131_R1_trimmed.fastq.gz, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/fastq_trimmed/SRR11692131_R2_trimmed.fastq.gz, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/genomes/hg38/index/bwa-mem2
        output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/bwa-mem2/hg38-SRR11692131.samtools-coordinate.pipe (pipe)
        log: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log/bwa-mem2_align/hg38-SRR11692131.log
        jobid: 7
        benchmark: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/benchmark/bwa-mem2_align/hg38-SRR11692131.benchmark.txt
        wildcards: assembly=hg38, sample=SRR11692131
        threads: 7
        resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp, mem_gb=40

    [Mon Apr  3 11:23:19 2023]
    rule samtools_presort:
        input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/bwa-mem2/hg38-SRR11692131.samtools-coordinate.pipe
        output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/bwa-mem2/hg38-SRR11692131.samtools-coordinate-unsieved.bam
        log: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log/samtools_presort/hg38-SRR11692131.log
        jobid: 6
        benchmark: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/benchmark/samtools_presort/hg38-SRR11692131.benchmark.txt
        wildcards: assembly=hg38, sample=SRR11692131
        resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp, mem_gb=2

Activating conda environment: ../../../../../../../../vol/mbconda/imarais/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/dc86c926309679daf0585ef4a495f01b_
Activating conda environment: ../../../../../../../../vol/mbconda/imarais/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/c3593d0fd24408183d1c9249cd3edfee_
[Mon Apr  3 19:07:56 2023]
Finished job 7.
[Mon Apr  3 19:07:57 2023]
Finished job 6.
2 of 14 steps (14%) done
Removing temporary output /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/fastq_trimmed/SRR11692131_R1_trimmed.fastq.gz.
Removing temporary output /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/fastq_trimmed/SRR11692131_R2_trimmed.fastq.gz.
Select jobs to execute...

[Mon Apr  3 19:07:59 2023]
rule multiqc_explain:
    output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log/workflow_explanation_mqc.html
    jobid: 16
    resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp

[Mon Apr  3 19:07:59 2023]
rule mt_nuc_ratio_calculator:
    input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/bwa-mem2/hg38-SRR11692131.samtools-coordinate-unsieved.bam, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/genomes/hg38/hg38.fa.sizes
    output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/bwa-mem2/hg38-SRR11692131.samtools-coordinate-unsieved.bam.mtnucratiomtnuc.json
    jobid: 17
    wildcards: assembly=hg38, sample=SRR11692131
    resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp, time=0-06:00:00

Activating conda environment: ../../../../../../../../vol/mbconda/imarais/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/374337caf0c2b050871552b86176b450_

[Mon Apr  3 19:07:59 2023]
rule mark_duplicates:
    input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/bwa-mem2/hg38-SRR11692131.samtools-coordinate-unsieved.bam
    output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.samtools-coordinate.bam, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/markdup/hg38-SRR11692131.samtools-coordinate.metrics.txt
    log: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log/mark_duplicates/hg38-SRR11692131.log
    jobid: 5
    benchmark: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/benchmark/mark_duplicates/hg38-SRR11692131.benchmark.txt
    wildcards: assembly=hg38, sample=SRR11692131
    resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp, mem_gb=8

Activating conda environment: ../../../../../../../../vol/mbconda/imarais/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/c0a9d5bbf8360710f9ca026e5856f5a1_
[Mon Apr  3 19:08:14 2023]
Finished job 16.
3 of 14 steps (21%) done
[Mon Apr  3 19:25:19 2023]
Finished job 17.
4 of 14 steps (29%) done
[Mon Apr  3 21:36:33 2023]
Finished job 5.
5 of 14 steps (36%) done
Removing temporary output /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/bwa-mem2/hg38-SRR11692131.samtools-coordinate-unsieved.bam.
Select jobs to execute...

[Mon Apr  3 21:36:35 2023]
rule insert_size_metrics:
    input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.samtools-coordinate.bam
    output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/InsertSizeMetrics/hg38-SRR11692131.tsv, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/InsertSizeMetrics/hg38-SRR11692131.pdf
    log: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log/InsertSizeMetrics/hg38-SRR11692131.log
    jobid: 23
    wildcards: assembly=hg38, sample=SRR11692131
    resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp, time=0-06:00:00

[Mon Apr  3 21:36:35 2023]
rule samtools_index:
    input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.samtools-coordinate.bam
    output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.samtools-coordinate.bam.bai
    jobid: 22
    wildcards: filepath=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.samtools-coordinate, b=b
    resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp
Activating conda environment: ../../../../../../../../vol/mbconda/imarais/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/c0a9d5bbf8360710f9ca026e5856f5a1_

Activating conda environment: ../../../../../../../../vol/mbconda/imarais/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/c3593d0fd24408183d1c9249cd3edfee_

[Mon Apr  3 21:36:35 2023]
rule sambamba_sort:
    input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.samtools-coordinate.bam
    output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.sambamba-queryname.bam
    log: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log/sambamba_sort/hg38-SRR11692131-sambamba_queryname.log
    jobid: 27
    benchmark: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/benchmark/sambamba_sort/hg38-SRR11692131-queryname.benchmark.txt
    wildcards: assembly=hg38, sample=SRR11692131, sorting=queryname
    threads: 2
    resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp, mem_gb=2

Activating conda environment: ../../../../../../../../vol/mbconda/imarais/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/46d6a73b6b45bee69b97c3332ae61586_

[Mon Apr  3 21:36:35 2023]
rule samtools_stats:
    input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.samtools-coordinate.bam
    output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/samtools_stats/bwa-mem2/hg38-SRR11692131.samtools-coordinate.samtools_stats.txt
    log: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log/samtools_stats/bwa-mem2/hg38-SRR11692131-samtools-coordinate.log
    jobid: 15
    wildcards: directory=bwa-mem2, assembly=hg38, sample=SRR11692131, sorter=samtools, sorting=coordinate
    resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp, time=0-06:00:00

Activating conda environment: ../../../../../../../../vol/mbconda/imarais/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/c3593d0fd24408183d1c9249cd3edfee_
[Mon Apr  3 21:40:48 2023]
Finished job 22.
6 of 14 steps (43%) done
Select jobs to execute...
[Mon Apr  3 21:53:13 2023]
Finished job 15.
7 of 14 steps (50%) done
[Mon Apr  3 21:55:28 2023]
Finished job 23.
8 of 14 steps (57%) done
[Mon Apr  3 23:24:51 2023]
Finished job 27.
9 of 14 steps (64%) done

[Mon Apr  3 23:24:51 2023]
rule create_SNAP_object:
    input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.sambamba-queryname.bam, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/genomes/hg38/hg38.fa.sizes
    output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/snap/hg38-SRR11692131.snap
    log: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log/create_SNAP_object/hg38-SRR11692131.log
    jobid: 26
    benchmark: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/benchmark/create_SNAP_object/hg38-SRR11692131.benchmark.txt
    wildcards: assembly=hg38, sample=SRR11692131
    threads: 4
    resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp

Activating conda environment: ../../../../../../../../vol/mbconda/imarais/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ec22a2dac18fa0c37eacd4cb5690aa3e_
[Tue Apr  4 01:46:08 2023]
Finished job 26.
10 of 14 steps (71%) done
Removing temporary output /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.sambamba-queryname.bam.
Select jobs to execute...

[Tue Apr  4 01:46:08 2023]
rule plotFingerprint:
    input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.samtools-coordinate.bam, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.samtools-coordinate.bam.bai
    output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/plotFingerprint/hg38.tsv
    log: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log/plotFingerprint/hg38.log
    jobid: 21
    benchmark: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/benchmark/plotFingerprint/hg38.benchmark.txt
    wildcards: assembly=hg38
    threads: 8
    resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp, mem_gb=5

Activating conda environment: ../../../../../../../../vol/mbconda/imarais/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/7fe602773e842c979f2b6238a3f2e7b2_
[Tue Apr  4 01:49:23 2023]
Finished job 21.
11 of 14 steps (79%) done
Removing temporary output /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/final_bam/hg38-SRR11692131.samtools-coordinate.bam.bai.
Select jobs to execute...

[Tue Apr  4 01:49:23 2023]
rule create_bins_SNAP_object:
    input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/snap/hg38-SRR11692131.snap
    output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/snap/hg38-SRR11692131.binned.snap
    log: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log/create_bins_SNAP_object/hg38-SRR11692131.log
    jobid: 25
    benchmark: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/benchmark/create_SNAP_object/hg38-SRR11692131.benchmark.txt
    wildcards: assembly=hg38, sample=SRR11692131
    resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp

Activating conda environment: ../../../../../../../../vol/mbconda/imarais/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/ec22a2dac18fa0c37eacd4cb5690aa3e_

[Tue Apr  4 01:49:23 2023]
rule multiqc:
    input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/header_info.yaml, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/sample_names_hg38.tsv, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/schema.yaml, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/markdup/hg38-SRR11692131.samtools-coordinate.metrics.txt, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/samtools_stats/bwa-mem2/hg38-SRR11692131.samtools-coordinate.samtools_stats.txt, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log/workflow_explanation_mqc.html, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/bwa-mem2/hg38-SRR11692131.samtools-coordinate-unsieved.bam.mtnucratiomtnuc.json, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/assembly_hg38_stats_mqc.html, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/trimming/SRR11692131.fastp.json, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/plotFingerprint/hg38.tsv, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/InsertSizeMetrics/hg38-SRR11692131.tsv, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/samplesconfig_mqc.html
    output: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/multiqc_hg38.html, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/multiqc_hg38_data
    log: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/log/multiqc_hg38.log
    jobid: 1
    wildcards: assembly=hg38
    resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp

Activating conda environment: ../../../../../../../../vol/mbconda/imarais/miniconda3/envs/seq2science/lib/python3.8/site-packages/seq2science/.snakemake/bba3f90abab18df10069e3707ec475ee_
[Tue Apr  4 01:49:31 2023]
Finished job 25.
12 of 14 steps (86%) done
[Tue Apr  4 01:49:38 2023]
Finished job 1.
13 of 14 steps (93%) done
Removing temporary output /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/header_info.yaml.
Removing temporary output /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/sample_names_hg38.tsv.
Removing temporary output /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/schema.yaml.
Removing temporary output /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/samplesconfig_mqc.html.
Select jobs to execute...

[Tue Apr  4 01:49:38 2023]
localrule seq2science:
    input: /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/multiqc_hg38.html, /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/snap/hg38-SRR11692131.binned.snap
    jobid: 0
    resources: tmpdir=/ceph/rimlsfnwi/data/moldevbio/zhou/imarais/tmp

[Tue Apr  4 01:49:38 2023]
Finished job 0.
14 of 14 steps (100%) done
Complete log: seq2science.2023-04-03T112313.216455.log

      ⊂_ヽ
        \\
         \( ͡° ͜ʖ ͡°)  --  Nice, a succesful run! Check out the docs for help with the results:
           > ⌒ヽ       https://vanheeringen-lab.github.io/seq2science/content/workflows/scatac_seq.html.
          /   へ\      Make sure to check out the QC report, it can be found at
          /  / \\    /ceph/rimlsfnwi/data/moldevbio/zhou/imarais/data/atac_seq_dev_eye/SRR11692131/qc/multiqc_hg38.html.
          レ ノ   ヽ_つ  
         / /            
         / /|            
        ( (ヽ            
        | |、\
        | 丿 \ ⌒)
        | |  ) /
        ノ )  Lノ
       (_/
JGASmits commented 1 year ago

The current seq2science pipeline requires single-cell fastq files (which we use for plate-based scATAC). It does not support 10x files, where the reads of multiple cells are concatenated in a single BAM file.

Correct me if I'm mistaken and that is not what you are trying to do. So I don't think it's a bug; it's a missing feature.

Gr Jos

Maarten-vd-Sande commented 1 year ago

I'm not so sure, but I can't look into it in much detail right now. It might be related to the fact that you do not assign the sample to a technical replicate. This is your samples.tsv:

sample  assembly        technical_replicates
SRR11692131     hg38

But perhaps something like this works?

sample  assembly        technical_replicates
SRR11692131     hg38    myfirsttechnicalrep

EDIT: I just saw @JGASmits's comment. Ignore my answer.

inesmarais commented 1 year ago

@JGASmits: I used two fastq files generated by sci-ATAC-seq as input files for the pipeline to ultimately generate the bam file. The other bam file (generated by 10X) was only added as a comparison.

@Maarten-vd-Sande Does this mean that sci-ATAC-seq fastq input files do not work for the current pipeline?

Maarten-vd-Sande commented 1 year ago

Yes and no. The current implementation dates back to an ancient era where people did plate-based single-cell sequencing :sauropod: . We would then have a fastq file per cell. The current implementation expects this.

To make seq2science work with sci-ATAC-seq, you would have to split the fastq file into per-cell fastqs. Then it would probably work. Would I recommend this? Not really! It is probably easier to look for a workflow that directly supports this, or to do the analysis old-school by hand. Perhaps 10x/cellranger already has a workflow ready to go for your specific needs?
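For what the splitting step could look like, here is a rough sketch. It assumes each read header starts with the cell barcode followed by a colon (`@BARCODE:readname`), which is only one possible convention and may not match your sci-ATAC-seq files; the function names and example reads are hypothetical.

```python
from collections import defaultdict
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from FASTQ text lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError(f"Malformed FASTQ record: {record!r}")
        yield tuple(record)


def split_by_barcode(lines):
    """Group FASTQ records by the barcode prefix of the read header."""
    per_cell = defaultdict(list)
    for record in read_fastq(lines):
        # Assumption: header looks like '@BARCODE:original_read_name'.
        barcode, sep, _rest = record[0][1:].partition(":")
        if not sep:
            raise ValueError(f"No barcode prefix in header: {record[0]!r}")
        per_cell[barcode].append(record)
    return dict(per_cell)


# Three made-up reads from two cells.
fastq = [
    "@AAAC:read1\n", "ACGT\n", "+\n", "FFFF\n",
    "@GGGT:read2\n", "TTTT\n", "+\n", "FFFF\n",
    "@AAAC:read3\n", "CCCC\n", "+\n", "FFFF\n",
]
cells = split_by_barcode(fastq)
print({barcode: len(cells[barcode]) for barcode in sorted(cells)})
# {'AAAC': 2, 'GGGT': 1}
```

This keeps everything in memory, which is fine for a toy example but not for a full run; a real version would stream records to one file per barcode and would have to cope with thousands of open files, which is part of why I would not recommend this route.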

JGASmits commented 1 year ago

Unless you have the data for your cells as individual fastq files (with each cell having its own entry in the samples file), the current scATAC pipeline does not work. It currently runs successfully by treating your BAM file as a single cell (and generates a snap file containing this single cell).

I think SnapATAC also supports converting 10x data to a snap file in a relatively straightforward way (outside of seq2science). See https://github.com/r3fang/SnapATAC/wiki/FAQs#10X_snap

Perhaps SnapATAC 2.0 also has some guidelines for integrating 10x data.
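To illustrate what "one sample file entry per cell" means, here is a small sketch that writes such a samples.tsv. The column names are copied from the samples.tsv shown earlier in this thread; the cell names are made up, and the `technical_replicates` column is simply left empty, as in the original file.

```python
import csv
import io


def samples_tsv(cell_names, assembly):
    """Return samples.tsv text with one row per cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "assembly", "technical_replicates"])
    for name in sorted(cell_names):
        # Each cell is expected to have its own fastq file(s) named after 'sample'.
        writer.writerow([name, assembly, ""])
    return buffer.getvalue()


# Hypothetical per-cell sample names.
print(samples_tsv(["cell_GGGT", "cell_AAAC"], "hg38"), end="")
```

With thousands of cells this file gets long quickly, and every row needs matching fastq files in `fastq_dir`.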

inesmarais commented 1 year ago

Thanks to both of you for your quick answers, it is clear to me now! Unfortunately, sci-ATAC-seq input is not directly supported by 10x, so we will look for an alternative!

Cheers,

Inès