r3fang / SnapATAC

Analysis Pipeline for Single Cell ATAC-seq
GNU General Public License v3.0

Error using snap-add-pmat #154

Closed: rhodesch closed this issue 4 years ago

rhodesch commented 4 years ago

I have been following the vignette and had no problems until the peak-calling steps. I get an error after MACS peak calling and found a related issue here: Error adding Pmat to SnapATAC object #128

Following runMacs(), I get a narrowPeak file in this format:

b'chr1' 3094548 3095132 ./macs/adult.22_peak_1 58 . 6.42857 9.29728 5.87488 469

I changed all chromosomes from b'chr' to chr with sed (and a gsub step in R):

sed -i "s/b'//;s/'//" *.narrowPeak

chr1 3094548 3095132 ./macs/adult.22_peak_1 58 . 6.42857 9.29728 5.87488 469
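For anyone who prefers to do the same cleanup in Python instead of sed, here is a minimal sketch. It assumes the only damage is a b'...' wrapper around the first (chromosome) column of a tab-separated file. The helper names `strip_bytes_repr`, `clean_peak_line` and `clean_peak_file` are made up for illustration and are not part of snaptools or SnapATAC.

```python
import re

# Matches a chromosome name wrapped in a Python bytes repr, e.g. b'chr1'
_BYTES_REPR = re.compile(r"^b'(.*)'$")


def strip_bytes_repr(name):
    """Return name without a surrounding b'...' wrapper; other names are unchanged."""
    match = _BYTES_REPR.match(name)
    return match.group(1) if match else name


def clean_peak_line(line):
    """Clean the first (chromosome) column of one tab-separated peak record."""
    fields = line.rstrip("\n").split("\t")
    fields[0] = strip_bytes_repr(fields[0])
    return "\t".join(fields)


def clean_peak_file(in_path, out_path):
    """Write a cleaned copy of a narrowPeak/BED file (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(clean_peak_line(line) + "\n")
```

Unlike the sed one-liner, this only touches the first column and leaves names that are already clean untouched, so it is safe to run twice.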

Confirming chromosome levels in R:

peak.gr = reduce(Reduce(c, peak.gr.ls))
levels(seqnames(peak.gr))

[1] "chr1" "chr10" "chr11" "chr12"
[5] "chr13" "chr14" "chr15" "chr16"
[9] "chr17" "chr18" "chr19" "chr2"
[13] "chr3" "chr4" "chr4_GL456216_random" "chr4_JH584295_random"
[17] "chr5" "chr6" "chr7" "chr8"
[21] "chr9" "chrM" "chrUn_GL456359" "chrUn_GL456366"
[25] "chrUn_JH584304" "chrX" "chrX_GL456233_random" "chrY"
[29] "chrUn_GL456387" "chrUn_GL456393" "chrUn_GL456239" "chrUn_GL456360"

I then try to add the narrowPeaks to the snap file:

snaptools snap-add-pmat \
    --snap-file path/to/snap \
    --peak-file path/to/peaks.combined.bed

which gives me this warning:

snap name: CTX_p32_Dec.possorted_bam.snap
***** WARNING: File /tmp/tmp3ik8sr2b has inconsistent naming convention for record:
b'chr11' 54428162 54428289 AAACGAAAGAACCATA-1

The above start and end coordinates do not exist in the peaks.combined.bed file. The pattern "b'chr11'" also does not seem to be in the 10X cellranger-atac position sorted.bam output used to create the original snap file.

Then when I add the peak matrix in R:

x.sp = addPmatToSnap(x.sp)

I get the error:

Epoch: reading cell-peak count matrix session ...
Error in value[3L] : Warning @readSnap: 'PM/idx' not found in /gpfs/gsfs9/users/BSPC/Analysis/atac_rna_wt/adult/generate_snap_files/output/snap/Dec.possorted_bam.snap
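The error says that 'PM/idx' is missing from the snap file, which suggests that snap-add-pmat never wrote the peak matrix. Since a snap file is an HDF5 file, one way to check this directly is to look for that entry before calling addPmatToSnap(). The sketch below is only an illustration: `has_path` and `snap_has_pmat` are hypothetical helpers, the 'PM/idx' path is taken from the error message, and h5py is assumed to be installed.

```python
def has_path(container, path):
    """Walk a slash-separated path through nested groups; True if every part exists."""
    node = container
    for part in path.strip("/").split("/"):
        # h5py groups and plain dicts both support keys(), `in` and item lookup
        if not hasattr(node, "keys") or part not in node:
            return False
        node = node[part]
    return True


def snap_has_pmat(snap_path):
    """Check a snap file for the 'PM/idx' entry named in the error message."""
    import h5py  # imported lazily; assumed to be installed alongside snaptools

    with h5py.File(snap_path, "r") as snap:
        return has_path(snap, "PM/idx")
```

If `snap_has_pmat("path/to/snap")` returns False after snap-add-pmat has run, the problem is on the snaptools side rather than in the R call.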

Based on this, what is causing the snaptools warning (error?)? Where is the pattern b'chr11' coming from in the record? How can I correct this so snap-add-pmat and addPmatToSnap work as expected?

rhodesch commented 4 years ago

Quick follow-up: this was evidently caused by package version conflicts on my HPC cluster. After installing snaptools, python/2.7 and dependencies into a new miniconda environment, everything worked as expected.
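One possible source of the b'chr11' pattern, consistent with the Python 2.7 fix but not confirmed against the snaptools source: in Python 3, calling str() on a bytes object returns its repr, including the b'...' wrapper, whereas in Python 2 the same call returned the plain text. The sketch below only demonstrates that language behaviour; the function names are hypothetical and the bytes value stands in for whatever a library might return.

```python
def to_text_naive(value):
    """What Python 2-era code often does: call str() on whatever it gets."""
    return str(value)


def to_text_safe(value):
    """Decode bytes explicitly so the result has no b'...' wrapper on Python 3."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


chrom = b"chr11"  # stand-in for a chromosome name returned as bytes

print(to_text_naive(chrom))  # b'chr11' under Python 3
print(to_text_safe(chrom))   # chr11
```

If this is what happens, the wrapper would appear in any file written by the affected code, even when the input BAM and peak files are clean.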

Brawni commented 4 years ago

I'm having the same issue. I tried creating another conda environment with Python 2.7, pysam, umap-learn, pybedtools and snaptools, but I still get it after running runMacs. Any advice?

Thanks!

rdalbanus commented 4 years ago

I get the same error when using a 3-column bed file generated externally (not using runMACS). Also getting that b'chr warning in snap-add-pmat, which makes me think that this function is the culprit.

maggiebr0wn commented 4 years ago

I am also getting the same error. Instead of running MACS2 in a temporary directory, I specified the directory the files were output into and used sed to remove the b and ' from the bed file. However, when I go back and try to add the cell-by-peak matrix with addPmatToSnap(), I get:

Epoch: reading cell-peak count matrix session ...
Error in value[3L] : Warning @readSnap: 'PM/idx' not found in /Users/maggiebrown/Dropbox (GaTech)/scATACseq_practice/SnapATAC/ASCs/ascs_practice_hg38.snap

rdalbanus commented 4 years ago

@mfisher1995 I see that folks are still having this issue. From what I investigated, it's due to some conflict in the hd5 libraries. I have a Singularity container that works fine. You can create the container using the Singularity file at https://github.com/rdalbanus/bioinf545_w20_snATAC. It was made for Singularity 2.5.2, if I'm not mistaken. Alternatively, you can create a conda environment and follow the steps outlined in the %post section of the Singularity file. Hope this helps!

maggiebr0wn commented 4 years ago

Hello, thanks for your response.

I am unable to install Singularity on my PC. After spending too much time on it, I've decided this workaround is not worth pursuing just to get the hd5 library conflict resolved. Also, we are not allowed to use containers on the remote machine where I do most of my computational work. Too bad.

Thanks anyways, if you have this resolved without a container I’d love to try it again.

On May 27, 2020, at 5:42 PM, rdalbanus wrote:

@mfisher1995 I see that folks are still having this issue. From what I investigated, it's due to some library conflict in the hd5 libraries. I have a singularity container that works fine. You can create the container using the Singularity file at https://github.com/rdalbanus/bioinf545_w20_snATAC. Alternatively, you can create a conda environment and follow the steps outlined in the %post section in the Singularity file. Hope this helps!

fkeramati commented 4 years ago

Hi,

I was also having the same problem. I am using Python 3.7.7 and R 4.0.1. I could solve the problem with a small change to both the peak file generated after MACS and the source code of "add_pmat.py". In the source code, between lines 152 and 153, I added the following chunk:

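import pandas as pd

# Re-read the temporary fragment file and strip the b'...' wrapper from column 0
df = pd.read_csv(fout_frag.name, sep="\t", header=None)
df.loc[:, 0] = df.loc[:, 0].str[2:-1]  # drop the leading b' and the trailing '
df.to_csv(fout_frag.name, sep="\t", header=False, index=False)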

This removes the "b'" from the temporary fragment file that is generated. As I said, you also need to remove the "b'" from the generated peaks.combined.bed file. That is easily achievable within R after completing the first part of step 14. Something like this should work:

peaks.df$seqnames = substring(peaks.df$seqnames, 3)
peaks.df$seqnames = substr(peaks.df$seqnames, 1, nchar(peaks.df$seqnames)-1)
write.table(peaks.df, file = "peaks.combined.bed", append = FALSE, quote = FALSE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = FALSE, qmethod = c("escape", "double"), fileEncoding = "")

I hope this solves the frustrating problem of being obliged to install older versions of R and Python to run snapATAC.

yanwengong commented 3 years ago

@mfisher1995 I see that folks are still having this issue. From what I investigated, it's due to some conflict in the hd5 libraries. I have a singularity container that works fine. You can create the container using the Singularity file at https://github.com/rdalbanus/bioinf545_w20_snATAC. It's made for a singularity 2.5.2, if I'm not mistaken. Alternatively, you can create a conda environment and follow the steps outlined in the %post section in the Singularity file. Hope this helps!

Hello! I have created a conda environment and followed the steps outlined in the %post section. From which step should I re-process the data? Do I need to go back and redo "snaptools snap-pre", or just the "snaptools snap-add-pmat" step? Thanks in advance for your help.