Closed Dan-121 closed 1 year ago
Hi @dandata123-tech, it seems scDRS completed for the first two traits (SCZ & CEREV), each taking around 800 seconds. If this is true, the software should have output the .score.gz and .full_score.gz files for the first two traits. Could you confirm it? It is indeed weird that the software got stuck when processing the third trait, which should take around the same time to complete (~800s). We can look into it if you can provide a minimal reproducible example.
Hi, thanks for the timely reply. I do get the .score.gz and .full_score.gz files for the first two traits, but the run gets stuck when processing the third trait. If I change the order of the traits in the .gs file, I still get output for the first two traits and it gets stuck on the third. It runs fine on the example data.
Hi @dandata123-tech,
I suspect that your .gs file contains illegal values (such as NA or negative values for the gene weights). Please refer to https://martinjzhang.github.io/scDRS/file_format.html#gs for an example of the .gs file.
As a diagnostic, you can create 3 separate .gs files for the 3 traits to see which one gives you the error. scDRS processes each trait independently, so running scDRS on the 3 separate .gs files should not change the results.
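A minimal sketch of this diagnostic in Python. It assumes the .gs layout shown on the file-format page linked above: a tab-separated file with a `TRAIT` and a `GENESET` column, where `GENESET` is a comma-separated list of `gene:weight` entries. The helper names and the example trait and gene names are made up for illustration and are not part of scDRS.

```python
import math


def to_weight(text):
    """Convert a weight string to float; unparseable values (e.g. 'NA') become NaN."""
    try:
        return float(text)
    except ValueError:
        return math.nan


def parse_gs(text):
    """Parse .gs text into {trait: [(gene, weight_or_None), ...]} (assumed layout)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    traits = {}
    for ln in lines[1:]:  # skip the TRAIT<TAB>GENESET header row
        trait, geneset = ln.split("\t", 1)
        entries = []
        for item in geneset.split(","):
            gene, sep, weight = item.partition(":")
            # Entries without ':' carry no weight, so nothing to validate.
            entries.append((gene.strip(), to_weight(weight) if sep else None))
        traits[trait] = entries
    return traits


def find_bad_weights(traits):
    """Return {trait: [genes whose weight is NaN, infinite, or negative]}."""
    bad = {}
    for trait, entries in traits.items():
        flagged = [
            gene
            for gene, w in entries
            if w is not None and (not math.isfinite(w) or w < 0)
        ]
        if flagged:
            bad[trait] = flagged
    return bad


def split_gs(text):
    """Return {trait: text of a single-trait .gs file} with the header repeated."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return {row.split("\t", 1)[0]: header + "\n" + row + "\n" for row in rows}


# Hypothetical two-trait file; the second trait has an NA and a negative weight.
example = (
    "TRAIT\tGENESET\n"
    "TRAIT_A\tG1:1.5,G2:2.0\n"
    "TRAIT_B\tG3:3.0,G4:NA,G5:-1.0\n"
)
print(find_bad_weights(parse_gs(example)))  # {'TRAIT_B': ['G4', 'G5']}
print(sorted(split_gs(example)))            # ['TRAIT_A', 'TRAIT_B']
```

Each value returned by `split_gs` can be written to its own file and passed to `--gs-file` in a separate run.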
Hi, thanks for your timely reply. I checked the .gs file and found no illegal values, and I also tried your sample .gs file. However, something goes wrong when I run each trait independently. Here is the error:
Task exception was never retrieved
future: <Task finished name='Task-13' coro=<ScriptMagics.shebang.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/dictys/lib/python3.10/site-packages/IPython/core/magics/script.py", line 213, in _handle_stream
    line = (await stream.readline()).decode("utf8")
  File "/home/user/anaconda3/envs/dictys/lib/python3.10/asyncio/streams.py", line 534, in readline
    raise ValueError(e.args[0])
ValueError: Separator is not found, and chunk exceed the limit
Could you please help with this problem? Looking forward to your reply, thanks.
Hi @dandata123-tech
Thank you for following up. I am unable to identify the issue. The best way is to provide a minimal reproducible example. However, here are my guesses. The ValueError "ValueError: Separator is not found, and chunk exceed the limit" seems to indicate that scDRS couldn't parse the delimiters in your .gs file (\t or comma). Maybe it contains some non-English characters?
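For context, the frames in the traceback above are in IPython's script magic (`script.py`) and Python's `asyncio` stream reader, so the message may instead come from how the command's output is read inside the notebook. The following is a minimal sketch, independent of scDRS, of how `asyncio.StreamReader.readline` raises this message when the buffered data exceeds the reader's limit before a newline arrives. The 64-byte limit and the inputs are made up for illustration.

```python
import asyncio


async def read_one_line(data: bytes, limit: int) -> str:
    """Feed `data` into a StreamReader with a small buffer limit and read one line."""
    reader = asyncio.StreamReader(limit=limit)
    reader.feed_data(data)
    reader.feed_eof()
    try:
        line = await reader.readline()
        return f"ok: {len(line)} bytes"
    except ValueError as err:
        # readline() converts the internal LimitOverrunError into a ValueError.
        return f"ValueError: {err}"


# 200 bytes with no newline yet: larger than the 64-byte limit.
too_long = asyncio.run(read_one_line(b"x" * 200, limit=64))
# A short, newline-terminated line fits within the limit.
short = asyncio.run(read_one_line(b"short\n", limit=64))

print(too_long)  # ValueError: Separator is not found, and chunk exceed the limit
print(short)     # ok: 6 bytes
```

If that is what happens here, running the same scdrs command in a regular terminal instead of a notebook cell would be one way to check.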
Hi @dandata123-tech
Thank you for following up. Great that you have identified the issue.
Your procedures look about right. You can refer to this post for using MAGMA.
Thank you.
Hi, thanks for developing such a helpful tool. I have run into a problem recently: when I run the compute-score step, it never finishes. Could you please help me with this? Here is the command and its output.
Call: scdrs compute-score \
    --h5ad-file /data4/scDRS/data/cere/expr.h5ad \
    --h5ad-species human \
    --cov-file /data4/scDRS/data/cere/cov.tsv \
    --gs-file /data4/scDRS/data/cere/processed_geneset.gs \
    --gs-species human \
    --ctrl-match-opt mean_var \
    --weight-opt vs \
    --adj-prop None \
    --flag-filter-data True \
    --flag-raw-count True \
    --n-ctrl 1000 \
    --flag-return-ctrl-raw-score False \
    --flag-return-ctrl-norm-score True \
    --out-folder /data4/scDRS/data/cere/out

Loading data:
--h5ad-file loaded: n_cell=62247, n_gene=23202 (sys_time=7.0s)
    First 3 cells: ['E083_AAACCCAAGGGCTGAT-1', 'E083_AAACCCACAGGCAATG-1', 'E083_AAACCCACAGTATACC-1']
    First 5 genes: ['AL627309.1', 'AL627309.5', 'LINC01409', 'FAM87B', 'LINC01128']
--cov-file loaded: covariates=['const', 'n_genes', 'timepoint'] (sys_time=7.0s)
    First 5 values for 'const': [1, 1, 1, 1, 1]
    First 5 values for 'n_genes': [3861, 4883, 5453, 2459, 5002]
    First 5 values for 'timepoint': ['E083', 'E083', 'E083', 'E083', 'E083']
--gs-file loaded: n_trait=3 (sys_time=7.0s)
Print info for first 3 traits:
    First 3 elements for 'SCZ': ['NRGN', 'DPYD', 'RBFOX1'], [7.6558, 7.6519, 7.3247]
    First 3 elements for 'CEREV': ['RNF11', 'CDKN2C', 'TRRAP'], [6.4221, 6.1533, 6.1347]
    First 3 elements for 'Height': ['WWOX', 'BNC2', 'GMDS'], [10.0, 10.0, 10.0]

Preprocessing:
scdrs.pp.category2dummy: Detected categorical columns: timepoint.
    Added dummy columns: timepoint_E093,timepoint_E101,timepoint_E102,timepoint_E108,timepoint_E117.
    Dropped columns: timepoint.

Computing scDRS score:
Trait=SCZ, n_gene=898: 165/62247 FDR<0.1 cells, 469/62247 FDR<0.2 cells (sys_time=839.1s)
Trait=CEREV, n_gene=819: 0/62247 FDR<0.1 cells, 0/62247 FDR<0.2 cells (sys_time=1529.8s)
The process was still running 2 days later. Could you please help with this problem? Looking forward to your reply, thanks.