perslab / CELLECT

CELLECT (CELL-type Expression-specific integration for Complex Traits)
GNU General Public License v3.0

Error in rule compute_LD_scores: #43

Closed willrosenow closed 4 years ago

willrosenow commented 4 years ago

Hello,

I am trying to run the example from CELLECT using the following command:

snakemake --use-conda -j -s cellect-ldsc.snakefile --configfile example/config-ldsc_example.yml

The job runs for a while, but then produces the following error:

Error in rule compute_LD_scores:
    jobid: 184
    output: /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/CELLECT-LDSC-EXAMPLE/precomputation/tabula_muris-test/tabula_muris-test.COMBINED_ANNOT.1.l2.ldscore.gz, /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/CELLECT-LDSC-EXAMPLE/precomputation/tabula_muris-test/tabula_muris-test.COMBINED_ANNOT.1.l2.M, /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/CELLECT-LDSC-EXAMPLE/precomputation/tabula_muris-test/tabula_muris-test.COMBINED_ANNOT.1.l2.M_5_50, /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/CELLECT-LDSC-EXAMPLE/precomputation/tabula_muris-test/tabula_muris-test.COMBINED_ANNOT.1.log
    log: /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/CELLECT-LDSC-EXAMPLE/logs/log.compute_LD_scores.tabula_muris-test.1.txt (check log file(s) for error message)
    conda-env: /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/.snakemake/conda/8cf508d1
    shell:
        /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/ldsc/ldsc.py --l2 --bfile /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/data/ldsc/1000G_EUR_Phase3_plink/1000G.EUR.QC.1 --ld-wind-cm 1 --annot /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/CELLECT-LDSC-EXAMPLE/precomputation/tabula_muris-test/tabula_muris-test.COMBINED_ANNOT.1.annot.gz --thin-annot --out /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/CELLECT-LDSC-EXAMPLE/precomputation/tabula_muris-test/tabula_muris-test.COMBINED_ANNOT.1 --print-snps /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/data/ldsc/print_snps.txt &> /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/CELLECT-LDSC-EXAMPLE/logs/log.compute_LD_scores.tabula_muris-test.1.txt

Any idea what might be causing this?

Thanks, Will

Tobi1kenobi commented 4 years ago

Hi Will,

Would you be able to show the contents of the log file referenced in your error /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/CELLECT-LDSC-EXAMPLE/logs/log.compute_LD_scores.tabula_muris-test.1.txt? It's difficult to be certain without seeing that.

However, if I had to guess, I would think it's because you don't have the LD score regression (ldsc) submodule, which should be at /sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/ldsc/. This issue arises when you clone the repo without using --recurse-submodules.

If you delete your CELLECT installation and reinstall with git clone --recurse-submodules https://github.com/perslab/CELLECT.git that should fix the error. If not, post the contents of the log file and I'll try my best to give a better diagnosis.
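
Before re-cloning, it may be worth confirming that the submodule directory really is empty. The snippet below is only a minimal sketch: the helper name submodule_is_populated is made up for illustration, and it assumes the submodule lives in ldsc/ and contains ldsc.py, as the shell command in the error above suggests.

```python
from pathlib import Path


def submodule_is_populated(repo_root, submodule="ldsc", marker="ldsc.py"):
    """Return True if <repo_root>/<submodule>/<marker> exists as a file.

    The default names are assumptions taken from the paths in the error
    message above, not from the CELLECT documentation.
    """
    return (Path(repo_root) / submodule / marker).is_file()


if __name__ == "__main__":
    # Hypothetical usage: run from inside the CELLECT checkout.
    print(submodule_is_populated("."))
```

If this prints False, running git submodule update --init --recursive inside the existing clone is the standard git way to fetch the missing submodule without deleting anything.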

Best, Tobi

willrosenow commented 4 years ago

Hi Tobi,

Thanks for the quick response. Here is the log file.

Thanks, Will

Traceback (most recent call last):
  File "/sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/ldsc/ldsc.py", line 23, in <module>
    import ldscore.parse as ps
  File "/sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/ldsc/ldscore/parse.py", line 10, in <module>
    import pandas as pd
  File "/sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/.snakemake/conda/8cf508d1/lib/python2.7/site-packages/pandas/__init__.py", line 59, in <module>
    from pandas.util._tester import test
  File "/sfs/lustre/bahamut/scratch/wr8yp/CELLECT-3/CELLECT/.snakemake/conda/8cf508d1/lib/python2.7/site-packages/pandas/util/_tester.py", line 11, in <module>
    import pytest
  File "/home/wr8yp/.local/lib/python2.7/site-packages/pytest.py", line 10, in <module>
    from _pytest.fixtures import fixture, yield_fixture
  File "/home/wr8yp/.local/lib/python2.7/site-packages/_pytest/fixtures.py", line 8, in <module>
    from more_itertools import flatten
  File "/home/wr8yp/.local/lib/python2.7/site-packages/more_itertools/__init__.py", line 1, in <module>
    from more_itertools.more import *  # noqa
  File "/home/wr8yp/.local/lib/python2.7/site-packages/more_itertools/more.py", line 340
    def _collate(*iterables, key=lambda a: a, reverse=False):
                               ^
SyntaxError: invalid syntax

Tobi1kenobi commented 4 years ago

Hi Will,

That log message is super helpful! It looks like a known bug in an old version of pytest (https://github.com/pytest-dev/pytest/issues/4770), which is installed in your user-level Python 2.7 site-packages.

As a quick fix, I'd suggest updating the packages under /home/wr8yp/.local/lib/python2.7/ so that pytest is version >=4.2.1. Let me know how that pans out.

Snakemake's use of conda should ensure that all packages are tightly controlled within the conda environments but for some reason this isn't the case here. This is something we will need to look into and fix so thank you for bringing it to our attention!
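
One likely explanation, which is an assumption and not something confirmed in this thread: Python adds the per-user site-packages directory (~/.local/lib/pythonX.Y/site-packages) to its import path by default, even when the interpreter belongs to a conda environment. The sketch below shows one way to check for that. The helper name user_site_leaks is hypothetical; site.getusersitepackages(), site.ENABLE_USER_SITE and the PYTHONNOUSERSITE environment variable are standard Python.

```python
from __future__ import print_function  # keeps the prints valid on Python 2.7

import os
import site


def user_site_leaks(module_file, user_site=None):
    """Return True if module_file lives under the per-user site-packages dir.

    user_site defaults to the directory the running interpreter would use.
    """
    user_site = os.path.abspath(user_site or site.getusersitepackages())
    module_file = os.path.abspath(module_file)
    return module_file.startswith(user_site + os.sep)


if __name__ == "__main__":
    # Run this with the Python interpreter of the conda environment in question.
    print("user site enabled:", site.ENABLE_USER_SITE)
    print("user site dir:", site.getusersitepackages())
    print("PYTHONNOUSERSITE:", os.environ.get("PYTHONNOUSERSITE"))
```

In the traceback above, pytest.py is loaded from /home/wr8yp/.local/lib/python2.7/site-packages, so user_site_leaks would return True for it. Exporting PYTHONNOUSERSITE=1 before running snakemake tells Python to skip that directory, which may be a gentler workaround than changing the user-level packages.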

Best, Tobi

willrosenow commented 4 years ago

Hi Tobi,

Thanks! That fixed the problem and now the tutorial data runs correctly.

However, I am still getting an error with my own data in the same environment. Here is the error, followed by the log file:


***** ERROR: Requested column 4, but database file /scratch/wr8yp/CELLECT_test/precomputation/SCRNA/bed/tmp/pybedtools.65f2cdgd.tmp only has fields 1 - 0.

Traceback (most recent call last):
  File "/sfs/lustre/bahamut/scratch/wr8yp/CELLECT_test/.snakemake/scripts/tmpp9r9q9eb.format_and_map_snake.py", line 111, in <module>
    multi_gene_sets_to_dict_of_beds(df_multi_gene_set_human, df_gene_coords, windowsize, bed_out_dir + '/tmp', bed_out_dir, out_prefix)
  File "/sfs/lustre/bahamut/scratch/wr8yp/CELLECT_test/.snakemake/scripts/tmpp9r9q9eb.format_and_map_snake.py", line 92, in multi_gene_sets_to_dict_of_beds
    bed_for_annot = pybedtools.BedTool(list_of_lists).sort().merge(c=[4,5], o=["distinct","max"])
  File "/sfs/lustre/bahamut/scratch/wr8yp/CELLECT_test/.snakemake/conda/d964a65b/lib/python3.6/site-packages/pybedtools/bedtool.py", line 917, in decorated
    result = method(self, *args, **kwargs)
  File "/sfs/lustre/bahamut/scratch/wr8yp/CELLECT_test/.snakemake/conda/d964a65b/lib/python3.6/site-packages/pybedtools/bedtool.py", line 401, in wrapped
    decode_output=decode_output,
  File "/sfs/lustre/bahamut/scratch/wr8yp/CELLECT_test/.snakemake/conda/d964a65b/lib/python3.6/site-packages/pybedtools/helpers.py", line 455, in call_bedtools
    raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError: Command was:

bedtools merge -o distinct,max -i /scratch/wr8yp/CELLECT_test/precomputation/SCRNA/bed/tmp/pybedtools.so_mzpxn.tmp -c 4,5

Error message was:


***** ERROR: Requested column 4, but database file /scratch/wr8yp/CELLECT_test/precomputation/SCRNA/bed/tmp/pybedtools.so_mzpxn.tmp only has fields 1 - 0.

[Mon Mar 2 11:06:32 2020]
Error in rule format_and_map_genes:
    jobid: 122
    output: /scratch/wr8yp/CELLECT_test/precomputation/SCRNA/bed/SCRNA.cluster_6.bed
    log: /scratch/wr8yp/CELLECT_test/logs/log.format_and_map_snake.SCRNA.cluster_6.txt (check log file(s) for error message)
    conda-env: /sfs/lustre/bahamut/scratch/wr8yp/CELLECT_test/.snakemake/conda/d964a65b

Read file_multi_gene_set. Header of the parsed/processed file:
  annotation          gene_input  annotation_value
0  cluster_0  ENSMUSG00000042501          0.117367
1  cluster_0  ENSMUSG00000025935          0.007380
2  cluster_0  ENSMUSG00000043716          0.184809
3  cluster_0  ENSMUSG00000042686          0.207976
4  cluster_0  ENSMUSG00000025776          0.211980
5  cluster_0  ENSMUSG00000026155          0.005292
6  cluster_0  ENSMUSG00000037470          0.274044
7  cluster_0  ENSMUSG00000037447          0.517148
8  cluster_0  ENSMUSG00000102762          0.714600
9  cluster_0  ENSMUSG00000037408          0.027190
Annotation value summary stats:
                mean       std       max       min  count
annotation
cluster_0   0.290902  0.237080  0.942749  0.000308   1797
cluster_1   0.265348  0.219889  0.988977  0.000196   2142
cluster_10  0.332255  0.286764  0.994958  0.000150   1816
cluster_2   0.190460  0.153130  0.932407  0.000107   3378
cluster_3   0.267904  0.233141  0.991214  0.000064   4964
cluster_4   0.275538  0.252419  0.988458  0.000065   4322
cluster_5   0.268200  0.209487  0.994913  0.000045   6147
cluster_6   0.240925  0.223557  0.992075  0.000134   4270
cluster_7   0.309604  0.281136  0.971880  0.000114   2439
cluster_8   0.281979  0.258106  0.991851  0.000125   2228
cluster_9   0.271125  0.242419  0.961182  0.000589   1000
========================== STATS file_multi_gene_set ====================
Number of gene sets: 11

Making gene set bed files
Failed setting pybedtools tempdir to /scratch/wr8yp/CELLECT_test/precomputation/SCRNA/bed/tmp. Will use standard tempdir /tmp
Merging input multi gene set with gene coordinates for annotation = cluster_0

Tobi1kenobi commented 4 years ago

Hi Will,

From the logfile it seems like you're providing CELLECT with mouse genes. CELLECT only takes Ensembl human genes as input.
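
A quick way to catch this before running the pipeline is to check the gene ID prefixes: Ensembl human gene IDs start with ENSG, while the IDs in the log above start with ENSMUSG (mouse). Here is a minimal sketch; the function name non_human_ids is made up for illustration and is not part of CELLECT.

```python
import re

# Ensembl human gene IDs: "ENSG" followed by 11 digits, optionally ".version".
HUMAN_GENE_ID = re.compile(r"^ENSG\d{11}(\.\d+)?$")


def non_human_ids(gene_ids):
    """Return the IDs that do not look like Ensembl human gene IDs."""
    return [g for g in gene_ids if not HUMAN_GENE_ID.match(g)]


if __name__ == "__main__":
    # First two IDs are taken from the log above; the last is a human-format ID.
    ids = ["ENSMUSG00000042501", "ENSMUSG00000025935", "ENSG00000141510"]
    print(non_human_ids(ids))
```

My guess at the mechanism, not verified against the CELLECT code: when no input gene matches the human gene coordinates, the BED file handed to bedtools merge is empty, which would explain the "only has fields 1 - 0" message.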

If you used CELLEX to compute expression specificity for each gene, make sure to also map Ensembl mouse genes to Ensembl human orthologues (https://github.com/perslab/CELLEX#pro-tip-using-cellex-with-cellect).
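
For anyone doing the mapping by hand, the idea can be sketched roughly as follows. This is not the CELLEX implementation: the function name map_to_human is hypothetical, and the human IDs in the lookup table are deliberately fake placeholders, not real orthologues. A real table would come from an orthologue resource such as Ensembl.

```python
def map_to_human(rows, ortholog_table):
    """Replace mouse gene IDs with human IDs using a lookup table.

    rows: iterable of (annotation, gene_id, value) tuples.
    ortholog_table: dict of mouse Ensembl ID -> human Ensembl ID.
    Returns (mapped_rows, unmapped_gene_ids); genes without an entry in the
    table are dropped from mapped_rows and reported separately.
    """
    mapped, unmapped = [], []
    for annotation, gene_id, value in rows:
        human_id = ortholog_table.get(gene_id)
        if human_id is None:
            unmapped.append(gene_id)
        else:
            mapped.append((annotation, human_id, value))
    return mapped, unmapped


if __name__ == "__main__":
    # Rows copied from the log above; the human ID is a made-up placeholder.
    rows = [
        ("cluster_0", "ENSMUSG00000042501", 0.117367),
        ("cluster_0", "ENSMUSG00000025935", 0.007380),
    ]
    table = {"ENSMUSG00000042501": "ENSG99999999901"}  # placeholder value
    mapped, unmapped = map_to_human(rows, table)
    print(mapped)
    print(unmapped)
```

Reporting the unmapped genes separately makes it easy to see how much of each cluster is lost in the conversion.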

Hopefully with that, everything should run smoothly. If not, let me know.

All the best, Tobi

willrosenow commented 4 years ago

That worked, thank you!
