Closed (sylestiel closed this issue 3 years ago)
Hi,
Sorry about the error. Instead of running stream_atac, can you try to run the R script run_preprocess.R directly in the same environment?
Rscript ./run_preprocess.R -c ./filtered_peak_bc_matrix/matrix.mtx -r ./filtered_peak_bc_matrix/peaks.bed -s ./filtered_peak_bc_matrix/barcodes.tsv --file_format mtx -g mm10 -f motif --n_jobs 3 -o stream_output
It might be something related to the package rpy2.
Also, to speed up the whole procedure, you can try increasing n_jobs. STREAM internally calls chromVAR to get the z-score matrix. Just for your reference, in our previous benchmark study, with ~5k cells and 44 CPUs, this part takes ~30 mins.
Thank you! I will give it a try.
You wrote: "can you try to run the R script run_preprocess.R directly in the same environment?"
Is this in Terminal or in a Jupyter Notebook? I need more clarification.
Rscript ./run_preprocess.R -c ./filtered_peak_bc_matrix/matrix.mtx -r ./filtered_peak_bc_matrix/peaks.bed -s ./filtered_peak_bc_matrix/barcodes.tsv --file_format mtx -g mm10 -f motif --n_jobs 3 -o stream_output
needs to be run in your terminal. You can simply replace your stream_atac command line with the Rscript command line.
So do I start off with the following?
conda activate myenv
R
Rscript ./run_preprocess.R -c ./filtered_peak_bc_matrix/matrix.mtx -r ./filtered_peak_bc_matrix/peaks.bed -s ./filtered_peak_bc_matrix/barcodes.tsv --file_format mtx -g mm10 -f motif --n_jobs 3 -o stream_output
First, you need to download the script run_preprocess.R to your local machine (e.g. under the directory ~/your_workdir).
Then in your terminal,
$ conda activate myenv
$ Rscript ~/your_workdir/run_preprocess.R -c ./filtered_peak_bc_matrix/matrix.mtx -r ./filtered_peak_bc_matrix/peaks.bed -s ./filtered_peak_bc_matrix/barcodes.tsv --file_format mtx -g mm10 -f motif --n_jobs 3 -o stream_output
You are all set!
$ Rscript /Volumes/BKUP2/R_projects/Stream/run_preprocess.R -c /Volumes/BKUP2/scATAC_data/72020_scATAC/SH2_E165/outs/filtered_peak_bc_matrix/matrix.mtx -r /Volumes/BKUP2/scATAC_data/72020_scATAC/SH2_E165/outs/filtered_peak_bc_matrix/peaks.bed -s /Volumes/BKUP2/scATAC_data/72020_scATAC/SH2_E165/outs/filtered_peak_bc_matrix/barcodes.tsv --file_format mtx -g mm10 -f motif --n_jobs 3 -o stream_output
Execution halted
Can you catch the error here?
Sorry, I am not sure about this error. I gave it a try on an example dataset, and it works well on my machine.
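When the only visible output is "Execution halted", the actual R error message is usually printed just before it on standard error. One way to keep that message is to launch the command from Python and capture both output streams. This is only a sketch: the helper names `run_and_capture` and `last_lines` are made up for illustration and are not part of STREAM.

```python
import subprocess


def run_and_capture(cmd):
    """Run a command given as a list of arguments.

    Returns (exit_code, stdout_text, stderr_text).
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


def last_lines(text, n=20):
    """Return the last n lines of a block of text."""
    return "\n".join(text.splitlines()[-n:])


# Hypothetical usage with the command from this thread (paths are placeholders):
#
# code, out, err = run_and_capture([
#     "Rscript", "run_preprocess.R",
#     "-c", "./filtered_peak_bc_matrix/matrix.mtx",
#     "-r", "./filtered_peak_bc_matrix/peaks.bed",
#     "-s", "./filtered_peak_bc_matrix/barcodes.tsv",
#     "--file_format", "mtx", "-g", "mm10", "-f", "motif",
#     "--n_jobs", "3", "-o", "stream_output",
# ])
# if code != 0:
#     print(last_lines(err))
```

Pasting the tail of the captured standard error into the issue makes it much easier for someone else to tell a missing R package from a wrong input file.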
It is working. I downloaded the wrong file previously.
Awesome. Once it's finished, you can run the following code snippet to read the result into a STREAM-compatible object:
import pandas as pd
import anndata as ad
from sklearn import preprocessing
import stream as st
df_zscores = pd.read_csv('zscores.tsv.gz',sep='\t',index_col=0)
df_zscores_scaled = preprocessing.scale(df_zscores,axis=1)
df_zscores_scaled = pd.DataFrame(df_zscores_scaled,index=df_zscores.index,columns=df_zscores.columns)
adata = ad.AnnData(X=df_zscores_scaled.values.T, obs={'obs_names':df_zscores_scaled.columns},var={'var_names':df_zscores_scaled.index})
st.set_workdir(adata,'./stream_result')
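For anyone wondering what the scaling and transpose steps in the snippet above do: the table is read with features as rows (they become var_names) and cells as columns (they become obs_names), each row is standardized across cells, and the result is transposed so that cells end up as rows. Below is a dependency-free sketch of that arithmetic on a made-up toy matrix. It uses the population standard deviation, which to my understanding matches what scikit-learn's `scale` does; the zero-variance guard is this sketch's own choice.

```python
from statistics import fmean, pstdev


def scale_rows(rows):
    """Standardize each row to mean 0 and population standard deviation 1."""
    scaled = []
    for row in rows:
        mu = fmean(row)
        sigma = pstdev(row)
        # Constant rows have sigma == 0; dividing by 1.0 leaves them at zero.
        sigma = sigma if sigma > 0 else 1.0
        scaled.append([(x - mu) / sigma for x in row])
    return scaled


def transpose(rows):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*rows)]


# Toy input: 2 features (rows) x 3 cells (columns); the values are invented.
zscores = [
    [1.0, 2.0, 3.0],
    [10.0, 10.0, 40.0],
]

scaled = scale_rows(zscores)          # still features x cells
cells_by_features = transpose(scaled)  # cells x features, the layout passed as X
```

After this, every feature has mean 0 and unit variance across cells, so features with large raw z-score ranges do not dominate downstream steps.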
As I showed above, you need to import several other libraries:
import pandas as pd
import anndata as ad
from sklearn import preprocessing
import stream as st
Hi,
A new error:
No adata file in the stream_results folder!!! Suggestions?
Awesome. Once it's finished, you can run the following code snippet to read the result into a STREAM-compatible object:
import pandas as pd
import anndata as ad
from sklearn import preprocessing
import stream as st
df_zscores = pd.read_csv('zscores.tsv.gz',sep='\t',index_col=0)
df_zscores_scaled = preprocessing.scale(df_zscores,axis=1)
df_zscores_scaled = pd.DataFrame(df_zscores_scaled,index=df_zscores.index,columns=df_zscores.columns)
adata = ad.AnnData(X=df_zscores_scaled.values.T, obs={'obs_names':df_zscores_scaled.columns},var={'var_names':df_zscores_scaled.index})
st.set_workdir(adata,'./stream_result')
Instead of st.read(), please use the code I posted above.
It appears to be running but I don't see any file that is called adata within the stream_result folder. Is that to be expected?
You don't need to run step 4 and step 7 in your notebook.
You can skip to step 8 in this tutorial.
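On the missing adata file: the snippet posted earlier builds adata in memory and contains no call that writes it to disk, so an empty stream_result folder is consistent with that code. To see which output files actually exist in a folder, a small check like the one below can help. The file names are the ones mentioned in this thread; the function name `check_outputs` is hypothetical, and which files get produced depends on how the pipeline was run.

```python
from pathlib import Path

# File names as they appear in this thread; not every run produces all of them.
EXPECTED_FILES = ("zscores.tsv.gz", "zscores_scaled.tsv.gz", "adata.h5ad")


def check_outputs(folder, names=EXPECTED_FILES):
    """Return a dict mapping each expected file name to whether it exists."""
    folder = Path(folder)
    return {name: (folder / name).is_file() for name in names}


# Hypothetical usage:
#
# for name, found in check_outputs("./stream_output").items():
#     print(name, "found" if found else "missing")
```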
Thank you very much Huidong!
Hi,
I have tried several times to generate the adata.h5ad, zscores_scaled.tsv.gz, and zscores.tsv.gz files from my Cell Ranger output data using the following command line:
$ stream_atac -c ./filtered_peak_bc_matrix/matrix.mtx -r ./filtered_peak_bc_matrix/peaks.bed -s ./filtered_peak_bc_matrix/barcodes.tsv --file_format mtx -g mm10 -f motif --n_jobs 3 -o stream_output
Although it worked with no problem for a dataset of ~3K cells, it appears to go on endlessly for a dataset with >7K cells.
I let it run for a couple of weeks and then stopped it. Can you suggest a way to expedite the generation of the adata and zscore files?
Thank you!