perslab / CELLECT

CELLECT (CELL-type Expression-specific integration for Complex Traits)
GNU General Public License v3.0

Does not work with precomputed MOCA data #42

Closed: lamdan2 closed this issue 4 years ago

lamdan2 commented 4 years ago

Hello,

CELLECT worked perfectly with the Tabula Muris and Mousebrain datasets.

Next, I downloaded the precomputed MOCA data from https://github.com/perslab/CELLECT/wiki/Precomputed-CELLEX-datasets

The first error I got was about illegal characters in the annotation names:

Exception in line 211 of /home/daniel.lam/CELLECT/cellect-ldsc.snakefile:
Illegal charecters in SPECIFICITY_INPUT=moca annotation names. Illegal charecters=[\s|__|/]
  File "/home/daniel.lam/CELLECT/cellect-ldsc.snakefile", line 211, in

I was able to resolve that by removing the illegal characters from the column headers.
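The header fix can be scripted rather than done by hand. The sketch below is a minimal illustration, not part of CELLECT: it assumes the illegal patterns are the three alternatives listed in the error message (whitespace, a double underscore, or a slash), and that the first CSV column holds gene IDs and should be left untouched. The names `sanitize_annotation_name` and `sanitize_header` are hypothetical.

```python
import csv
import io
import re

# Assumption: the error message's [\s|__|/] is read as three alternatives
# (whitespace, "__", "/"), not as a regex character class.
ILLEGAL = re.compile(r"\s|__|/")


def sanitize_annotation_name(name: str) -> str:
    """Remove every occurrence of the assumed illegal patterns from one name."""
    return ILLEGAL.sub("", name)


def sanitize_header(csv_text: str) -> str:
    """Rewrite only the header row of a CSV file; data rows are unchanged.

    The first column is assumed to hold gene IDs, so its header is kept as-is.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if rows:
        rows[0] = rows[0][:1] + [sanitize_annotation_name(c) for c in rows[0][1:]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


print(sanitize_annotation_name("Shisa6 positive neuron trajectory"))
# Shisa6positiveneurontrajectory
```

Stripping characters can make two different annotation names collide (for example "a b" and "ab"), so it may be worth checking the sanitized header for duplicates before running CELLECT.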

The next error was:

Error in rule format_and_map_genes:
    jobid: 164
    output: /home/daniel.lam/CELLECT/precomputation/moca/bed/moca.Shisa6positiveneurontrajectory.bed
    log: /home/daniel.lam/CELLECT/logs/log.format_and_map_snake.moca.Shisa6positiveneurontrajectory.txt (check log file(s) for error message)
    conda-env: /home/daniel.lam/CELLECT/.snakemake/conda/24b048b2

Traceback (most recent call last):
  File "/home/daniel.lam/CELLECT/.snakemake/scripts/tmp488vow72.format_and_map_snake.py", line 111, in
    multi_gene_sets_to_dict_of_beds(df_multi_gene_set_human, df_gene_coords, windowsize, bed_out_dir + '/tmp', bed_out_dir, out_prefix)
  File "/home/daniel.lam/CELLECT/.snakemake/scripts/tmp488vow72.format_and_map_snake.py", line 92, in multi_gene_sets_to_dict_of_beds
    bed_for_annot = pybedtools.BedTool(list_of_lists).sort().merge(c=[4,5], o=["distinct","max"])
  File "/home/daniel.lam/CELLECT/.snakemake/conda/24b048b2/lib/python3.6/site-packages/pybedtools/bedtool.py", line 917, in decorated
    result = method(self, *args, **kwargs)
  File "/home/daniel.lam/CELLECT/.snakemake/conda/24b048b2/lib/python3.6/site-packages/pybedtools/bedtool.py", line 401, in wrapped
    decode_output=decode_output,
  File "/home/daniel.lam/CELLECT/.snakemake/conda/24b048b2/lib/python3.6/site-packages/pybedtools/helpers.py", line 455, in call_bedtools
    raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError: Command was:

bedtools merge -o distinct,max -i /home/daniel.lam/CELLECT/precomputation/moca/bed/tmp/pybedtools.vn17_95l.tmp -c 4,5

Error message was:


***** ERROR: Requested column 4, but database file /home/daniel.lam/CELLECT/precomputation/moca/bed/tmp/pybedtools.vn17_95l.tmp only has fields 1 - 0.
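The failing call, `pybedtools.BedTool(list_of_lists).sort().merge(c=[4,5], o=["distinct","max"])`, sorts the BED rows, merges overlapping or book-ended intervals, and reports the distinct values of column 4 and the maximum of column 5 for each merged interval. The pure-Python sketch below imitates those semantics to show why the message says "only has fields 1 - 0": if no rows reach the temporary BED file, there is no column 4 to aggregate. It is an illustration, not pybedtools itself; the row layout (chrom, start, end, name, score) is inferred from the `-c 4,5` arguments, and the first-seen ordering of distinct values here may differ from what bedtools produces.

```python
from typing import List, Tuple

# A BED row as assumed here: (chrom, start, end, name, score).
Row = Tuple[str, int, int, str, float]


def merge_rows(rows: List[Row]) -> List[Row]:
    """Imitate `sort | merge -c 4,5 -o distinct,max` on BED-like rows.

    Overlapping or book-ended intervals (next start <= current end) on the
    same chromosome are merged; names are joined as distinct values in
    first-seen order, and the score is the maximum.
    """
    if not rows:
        # bedtools cannot find column 4 in an empty file; fail loudly instead.
        raise ValueError("no BED rows to merge: column 4 does not exist")
    merged: List[list] = []
    for chrom, start, end, name, score in sorted(rows, key=lambda r: (r[0], r[1])):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            last[2] = max(last[2], end)
            if name not in last[3]:
                last[3].append(name)
            last[4] = max(last[4], score)
        else:
            merged.append([chrom, start, end, [name], score])
    return [(c, s, e, ",".join(names), sc) for c, s, e, names, sc in merged]


print(merge_rows([("chr1", 5, 20, "B", 3.0), ("chr1", 0, 10, "A", 1.0), ("chr1", 30, 40, "C", 2.0)]))
# [('chr1', 0, 20, 'A,B', 3.0), ('chr1', 30, 40, 'C', 2.0)]
```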

Any ideas? It seems like something is wrong with the format of the data. Did it work for anyone else?

Thanks, Daniel

Tobi1kenobi commented 4 years ago

Hi Daniel,

Thanks for pointing this out. There's a mistake in the uploaded MOCA CELLEX dataset: it has HGNC gene symbols rather than the required Ensembl human gene IDs. We're working on fixing this and re-uploading the correctly processed dataset.
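A quick pre-flight check can catch this kind of mismatch before CELLECT runs: if most gene identifiers are HGNC symbols such as "BRCA2" instead of Ensembl IDs, almost nothing will map to gene coordinates. The sketch below assumes that stable Ensembl human gene IDs are "ENSG" followed by 11 digits, optionally with a version suffix; the helper names are hypothetical and not part of CELLECT or CELLEX.

```python
import re
from typing import Iterable

# Assumed format of a stable Ensembl human gene ID: "ENSG" followed by
# 11 digits, optionally with a version suffix such as ".15".
ENSEMBL_HUMAN_GENE = re.compile(r"ENSG\d{11}(\.\d+)?")


def is_ensembl_human_gene_id(gene_id: str) -> bool:
    """True if the whole (stripped) string matches the assumed ID format."""
    return ENSEMBL_HUMAN_GENE.fullmatch(gene_id.strip()) is not None


def ensembl_fraction(gene_ids: Iterable[str]) -> float:
    """Fraction of IDs that look like Ensembl human gene IDs (0.0 if empty)."""
    ids = list(gene_ids)
    if not ids:
        return 0.0
    return sum(is_ensembl_human_gene_id(g) for g in ids) / len(ids)


print(ensembl_fraction(["ENSG00000139618", "BRCA2", "TP53", "ENSG00000141510"]))
# 0.5
```

Applied to the first column of an ESMU file, a fraction near 0 would point to the HGNC-symbol problem described here.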

Apologies for the unclear error message; this is a known issue, and we'll hopefully get a fix out soon.
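One way to make the failure less cryptic is to check for an empty result before bedtools is ever called, so the error names the likely cause. This is a hypothetical sketch, not the CELLECT code: `map_genes_to_bed`, its dictionary inputs, and the symmetric window extension are assumptions loosely modelled on the `multi_gene_sets_to_dict_of_beds(df_multi_gene_set_human, df_gene_coords, windowsize, ...)` call in the traceback.

```python
from typing import Dict, List, Tuple

# Hypothetical coordinate table: gene ID -> (chrom, start, end).
Coords = Dict[str, Tuple[str, int, int]]


def map_genes_to_bed(
    gene_scores: Dict[str, float], gene_coords: Coords, windowsize: int
) -> List[list]:
    """Build BED rows [chrom, start, end, gene, score] for genes with coordinates.

    Each interval is extended by `windowsize` on both sides (start clipped at 0).
    Raises a descriptive error instead of letting bedtools fail on an empty file.
    """
    rows = []
    for gene, score in gene_scores.items():
        if gene in gene_coords:
            chrom, start, end = gene_coords[gene]
            rows.append([chrom, max(0, start - windowsize), end + windowsize, gene, score])
    if not rows:
        example = next(iter(gene_scores), "<none>")
        raise ValueError(
            "None of the %d genes matched the gene coordinate table "
            "(example ID: %r). Check that gene IDs are Ensembl human gene IDs."
            % (len(gene_scores), example)
        )
    return rows
```

With this guard, a dataset keyed by HGNC symbols would stop with a message mentioning an example ID such as 'BRCA2', rather than with bedtools reporting a file that "only has fields 1 - 0".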

All the best, Tobi

liubovpashkova commented 4 years ago

Hi again, Daniel,

The updated ESMU files, with Ensembl human gene IDs and corrected CELLECT-style annotation names, have been uploaded in place of the old ones. I personally tested them with CELLECT and got reasonable prioritization results for several GWAS.

All the best, Liubov

lamdan2 commented 4 years ago

You guys are stars, thank you!!