mdozmorov / scATAC-seq_notes

scATAC-seq data analysis tools and papers
MIT License

inquiry about Loading scATAC-seq matrices into R #1

Closed jiangzh-coder closed 1 year ago

jiangzh-coder commented 1 year ago

Hi

I want to analyze some scATAC-seq data. After unzipping, I got 10 folders (one patient per folder). Each folder has 2 subfolders, which hold the scRNA-seq data and the scATAC-seq data separately. These 2 subfolders contain files generated by Cell Ranger (I attached 2 pictures). How can I load the scATAC-seq matrices, as well as the scRNA-seq data, into R? Could you kindly provide some code?

image

jiangzh-coder commented 1 year ago

jzhou@jiang:/mnt/d/##files/#atac/HD2$ tail atac_peaks.bed
GL000218.1 83275 84106
KI270726.1 27131 28058
KI270726.1 41490 42368
KI270711.1 7979 8731
KI270711.1 8887 9380
KI270713.1 15800 16459
KI270713.1 17290 18022
KI270713.1 21445 22340
KI270713.1 32734 33486
KI270713.1 36862 37780

jzhou@jiang:/mnt/d/##files/#atac/HD2$ head atac_peak_annotation.tsv
chrom start end gene distance peak_type
chr1 9778 10667 MIR1302-2HG -18887 distal
chr1 180732 181004 AL627309.5 -6871 distal
chr1 181116 181809 AL627309.5 -7255 distal
chr1 183935 184770 AL627309.5 -10074 distal
chr1 191103 192028 AL627309.5 -17242 distal
chr1 267627 268482 AP006222.2 773 distal
chr1 629498 630379 AC114498.1 41870 distal
chr1 633577 634503 AC114498.1 45949 distal
chr1 778280 779196 LINC01409 0 promoter

jzhou@jiang:/mnt/d/##files/#atac/HD2$ tail -5 atac_fragments.tsv
KI270713.1 39073 39449 TAAGTAGCACAGGATG-1 1
KI270713.1 39075 39208 GCTTAACAGTTCCCGT-1 2
KI270713.1 39697 39805 TGTTATGAGGGCTTTG-1 2
KI270713.1 40366 40677 GACCTAAGTTCCGGCT-1 2
KI270713.1 40639 40708 GGATGGCCACACCAAC-1 2
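Before loading anything into R, it can help to sanity-check the fragments file from the shell. The sketch below assumes the five columns shown above are chromosome, start, end, cell barcode, and read count, and that the file is uncompressed, as in the listing. The function name `count_fragments_per_barcode` is made up for illustration.

```shell
# Count fragments per cell barcode in a Cell Ranger style fragments file.
# Assumption: whitespace-separated columns are chrom, start, end, barcode, count.
# Lines starting with '#' (header comments, if any) are skipped.
# Output: "<barcode><TAB><number of fragments>", most frequent barcode first.
count_fragments_per_barcode() {
    fragments_file="$1"
    awk '!/^#/ && NF >= 4 { n[$4]++ }
         END { for (b in n) print b "\t" n[b] }' "$fragments_file" \
        | LC_ALL=C sort -k2,2nr -k1,1
}

# Example (path is hypothetical):
#   count_fragments_per_barcode atac_fragments.tsv | head
```

Barcodes with very few fragments are usually background rather than cells, so the top of this list gives a rough idea of how many usable cells the sample contains.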

jiangzh-coder commented 1 year ago

Hello,

I processed the raw data from this dataset (GSE199994). I downloaded the SRA files from EBI, converted them to FASTQ files, renamed them to the standard naming scheme, and ran cellranger-atac. I got the following error:

2.7% (< 10%) of read pairs have a valid 10x barcode. This could be a result of poor sequencing quality, a sample mixup, or running the wrong pipeline, for example, running cellranger-atac on Multiome ATAC + GEX data, or vice versa.

The full set of commands is as follows:

ascp -QT -l 300m -P33001 -i ~/miniconda3/envs/my10x/etc/asperaweb_id_dsa.openssh \
    era-fasp@fasp.sra.ebi.ac.uk:/vol1/srr/SRR186/006/SRR18613306 .

mv SRR18613306 SRR18613306.sra
parallel-fastq-dump -t 12 -O ./ --split-files --gzip -s SRR18613306.sra

mv *.fastq.gz /home/ubuntu/GSE199994/scATAC/2.raw_fastq

mv SRR18613295_1.fastq.gz SRR18613295-P5_S9_L001_I1_001.fastq.gz
mv SRR18613295_2.fastq.gz SRR18613295-P5_S9_L001_R1_001.fastq.gz
mv SRR18613295_3.fastq.gz SRR18613295-P5_S9_L001_R2_001.fastq.gz
mv SRR18613295_4.fastq.gz SRR18613295-P5_S9_L001_R3_001.fastq.gz

cellranger-atac count --id=SRR18613295-P5 \
    --reference=/home/ubuntu/biosoftware/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
    --fastqs=/home/ubuntu/GSE199994/scATAC/2.raw_fastq \
    --sample=SRR18613295-P5 \
    --localcores=24 \
    --localmem=96

Could you help me?
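One thing that may be worth checking for an error like this is whether the files produced by `--split-files` really correspond to I1, R1, R2, and R3 in that order. As far as I know, cellranger-atac reads the 16 bp cell barcode from the R2 file, while R1 and R3 hold the genomic reads and I1 holds the short sample index. Multiome ATAC libraries use a longer (24 bp) barcode read, which fits the hint in the error message. The sketch below only reports read lengths so the files can be compared; the function name `fastq_read_lengths` is made up for illustration, and it does not establish that file order is the cause here.

```shell
# Print the distribution of read lengths among the first N reads of a
# gzipped FASTQ file (default N = 10000).
# Output: one line per distinct length, "<length><TAB><number of reads>",
# sorted by length.
fastq_read_lengths() {
    fastq_gz="$1"
    n_reads="${2:-10000}"
    gzip -cd "$fastq_gz" \
        | head -n "$((n_reads * 4))" \
        | awk 'NR % 4 == 2 { len[length($0)]++ }
               END { for (l in len) print l "\t" len[l] }' \
        | LC_ALL=C sort -k1,1n
}

# Example (directory taken from the commands above):
#   for f in /home/ubuntu/GSE199994/scATAC/2.raw_fastq/*.fastq.gz; do
#       echo "$f"; fastq_read_lengths "$f"
#   done
```

If the file with 16 bp (or 24 bp) reads is not the one named `R2`, the renaming step would need to be adjusted before rerunning the pipeline.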

mdozmorov commented 1 year ago

Please, join the R/Bioconductor community and ask there. https://bioc-community.herokuapp.com/