martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

Segmentation fault when Running scDRS #33

Closed: angelussong closed this issue 2 years ago

angelussong commented 2 years ago

Hi,

Sorry to bug you with this. I am trying to run scDRS on our scRNA-seq dataset.

The h5ad file I'm using has 12755 cells from both normal and tumor patients. I generated this file by converting from the corresponding Seurat object. I was able to visualize this h5ad object in cellxgene and confirmed that it has all the metadata correctly.

For the .gs file, I just used the one included in the example dataset: magma_10kb_top1000_zscore.74_traits.rv1.gs.

Here are the commands I used for running scDRS (I think I installed the packages correctly):

python compute_score.py \
    --h5ad_file ./Tumor_Seqwell.h5ad \
    --h5ad_species human \
    --gs_file magma_10kb_top1000_zscore.74_traits.rv1.gs \
    --gs_species human \
    --flag_filter True \
    --flag_raw_count True \
    --n_ctrl 1000 \
    --flag_return_ctrl_raw_score False \
    --flag_return_ctrl_norm_score True \
    --out_folder ./Tumor_Seqwell

And here is the output I'm seeing:


Load data:
/Users/hsong/opt/anaconda3/envs/scDRS_env/lib/python3.9/site-packages/anndata/compat/__init__.py:232: FutureWarning: Moving element from .uns['neighbors']['distances'] to .obsp['distances'].

This is where adjacency matrices should go now.
  warn(
/Users/hsong/opt/anaconda3/envs/scDRS_env/lib/python3.9/site-packages/scanpy/preprocessing/_simple.py:352: RuntimeWarning: invalid value encountered in log1p
  np.log1p(X, out=X)
--h5ad_file loaded: n_cell=2760, n_gene=1258 (sys_time=6.1s)
--gs_file loaded: n_geneset=74 (sys_time=6.3s)
./scDRS_step1.sh: line 11: 57097 Segmentation fault: 11 python compute_score.py --h5ad_file ./Tumor_Seqwell.h5ad --h5ad_species human --gs_file magma_10kb_top1000_zscore.74_traits.rv1.gs --gs_species human --flag_filter True --flag_raw_count True --n_ctrl 1000 --flag_return_ctrl_raw_score False --flag_return_ctrl_norm_score True --out_folder ./Tumor_Seqwell

The first thing I noticed besides the error message is that the number of cells does not match what I have in the dataset, nor does the number of genes. I am wondering what is the cause of that, and of course the error.

Thank you very much for your time!

Angelus

martinjzhang commented 2 years ago

Hi Angelus,

It seems you are using the Python scripts, which are deprecated. Could you try rerunning the analysis with the scDRS CLI and let us know how it goes? The CLI reference is here: https://martinjzhang.github.io/scDRS/reference_cli.html#compute-score

Thank you, Martin

angelussong commented 2 years ago

Hi Martin,

Thanks so much for your reply!

I tried the CLI as you instructed but now I got a different error:

This is where adjacency matrices should go now.
  warn(
Traceback (most recent call last):
  File "/Users/hsong/opt/anaconda3/envs/scDRS_env/bin/scdrs", line 740, in <module>
    fire.Fire()
  File "/Users/hsong/opt/anaconda3/envs/scDRS_env/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/Users/hsong/opt/anaconda3/envs/scDRS_env/lib/python3.9/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/Users/hsong/opt/anaconda3/envs/scDRS_env/lib/python3.9/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/Users/hsong/opt/anaconda3/envs/scDRS_env/bin/scdrs", line 161, in compute_score
    adata = scdrs.util.load_h5ad(
  File "/Users/hsong/opt/anaconda3/envs/scDRS_env/lib/python3.9/site-packages/scdrs/util.py", line 75, in load_h5ad
    raise ValueError(
ValueError: h5ad expression matrix should not contain negative values. This is because in the preprocessing step, scDRS models the gene-level log mean-variance relationship. See scdrs.pp.compute_stats for details.

This is kinda confusing to me because, in theory, the active assay from my original Seurat object would be the RNA "data" slot, so it should be the log normalized data and everything should be non-negative. I'm wondering how I could resolve this?

Thank you!

Angelus

martinjzhang commented 2 years ago

Hi Angelus,

If you are using log normalized data (size factor + log1p), you need to turn off both the raw count flag (--flag_raw_count False) and the filtering flag (--flag_filter False). The filtering flag may be the reason scDRS removed so many genes and cells.

Also, please double-check that the expression matrix doesn't contain negative values.

Best, Martin
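
The check suggested above can be sketched in Python before handing the file to scDRS. This is a minimal sketch and not part of scDRS: `check_expression_matrix` is a hypothetical helper, and it assumes the matrix is either a dense NumPy array or a SciPy CSR/CSC/COO sparse matrix, as is typical for `adata.X` after `anndata.read_h5ad`. The toy arrays stand in for real data.

```python
import numpy as np


def check_expression_matrix(X):
    """Summarize an expression matrix (hypothetical helper, not an scDRS function).

    Assumes X is a dense array or a SciPy CSR/CSC/COO sparse matrix, whose
    explicitly stored values live in X.data (implicit zeros are non-negative).
    """
    values = X.data if hasattr(X, "nnz") else np.asarray(X)
    values = np.asarray(values, dtype=float).ravel()

    finite = values[np.isfinite(values)]
    n_negative = int((finite < 0).sum())
    n_nonfinite = int(values.size - finite.size)
    # Raw counts are non-negative integers; log-normalized data usually is not integer-valued.
    looks_like_counts = bool(
        n_negative == 0 and n_nonfinite == 0 and np.all(finite == np.round(finite))
    )
    return {
        "n_negative": n_negative,
        "n_nonfinite": n_nonfinite,
        "looks_like_counts": looks_like_counts,
    }


# With a real file one would run something like:
#   import anndata
#   adata = anndata.read_h5ad("Tumor_Seqwell.h5ad")
#   print(check_expression_matrix(adata.X))
toy_scaled = np.array([[0.0, 1.5], [-0.3, 2.0]])  # contains a negative entry
toy_counts = np.array([[0, 3], [2, 0]])           # integer counts
print(check_expression_matrix(toy_scaled))
print(check_expression_matrix(toy_counts))
```

A non-zero `n_negative` means the file would trip the ValueError shown earlier in the thread. `looks_like_counts` is only a rough hint for choosing between `--flag_raw_count True` and `--flag_raw_count False`.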

angelussong commented 2 years ago

Hi Martin,

Thanks for your advice!!

I re-converted the object and made sure that I'm not using the scaled data but the log normalized data with no negative values. I think now it's loading the object fine and running!!

Thanks again for your time!

Angelus
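
For readers who hit the same errors, here is a small illustration of why exporting scaled data can produce both symptoms seen in this thread. It rests on an assumption the thread does not confirm in detail: that the first export carried a per-gene centered and unit-variance scaled matrix. The toy matrix is made up for illustration.

```python
import numpy as np

# Toy log-normalized matrix (cells x genes); every entry is >= 0.
lognorm = np.array([
    [0.0, 1.0],
    [4.0, 2.0],
    [4.0, 3.0],
    [4.0, 2.0],
])

# Per-gene centering and unit-variance scaling, the usual meaning of "scaled" data.
scaled = (lognorm - lognorm.mean(axis=0)) / lognorm.std(axis=0)

# log1p is undefined below -1, so treating scaled values as raw counts yields NaN.
# NumPy normally emits "RuntimeWarning: invalid value encountered in log1p" here;
# the warning is silenced only to keep the example output clean.
with np.errstate(invalid="ignore"):
    logged = np.log1p(scaled)

print("min of log-normalized data:", lognorm.min())
print("any negative after scaling:", bool((scaled < 0).any()))
print("any NaN after log1p on scaled data:", bool(np.isnan(logged).any()))
```

Under that assumption, the `log1p` warning from the deprecated script and the negative-value ValueError from the CLI would have the same cause, which fits the fix reported above of exporting the log normalized data instead.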
