lamdan2 closed this issue 4 years ago
Hi Daniel,
Thanks for pointing this out. There's a mistake in the uploaded MOCA CELLEX dataset: it has HGNC gene names rather than the required Ensembl human gene names. We're working on fixing this and reuploading the correctly processed dataset.
Apologies for the unclear error message. This is a known issue, and we hope to get a fix out soon.
All the best, Tobi
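For anyone who wants to check a downloaded file before running CELLECT, here is a minimal sketch of a sanity check on the gene column. It assumes the genes are expected to be human Ensembl gene IDs of the form ENSG plus 11 digits (optionally with a version suffix); the function names and the 0.9 threshold are made up for illustration and are not part of CELLECT.

```python
import re

# Human Ensembl gene IDs look like ENSG00000141510, optionally followed by ".N".
ENSEMBL_HUMAN_GENE = re.compile(r"^ENSG\d{11}(\.\d+)?$")


def fraction_ensembl_human(gene_ids):
    """Return the fraction of entries that look like human Ensembl gene IDs."""
    gene_ids = list(gene_ids)
    if not gene_ids:
        return 0.0
    hits = sum(1 for g in gene_ids if ENSEMBL_HUMAN_GENE.match(g))
    return hits / len(gene_ids)


def check_gene_column(gene_ids, min_fraction=0.9):
    """Raise a readable error if most entries are not human Ensembl IDs.

    min_fraction is an arbitrary illustrative threshold, not a CELLECT setting.
    """
    frac = fraction_ensembl_human(gene_ids)
    if frac < min_fraction:
        raise ValueError(
            f"only {frac:.0%} of gene identifiers look like human Ensembl IDs; "
            "the file may contain gene symbols (e.g. HGNC names) instead"
        )
    return frac
```

Running check_gene_column on the first column of the specificity file would flag a symbol-based file up front, rather than letting it fail later in the pipeline.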
Hi again, Daniel,
The updated ESMU files, with Ensembl human gene names and corrected CELLECT-style annotations, have been uploaded in place of the old ones. I personally tested them with CELLECT and got reasonable prioritization results for several GWAS.
All the best, Liubov
You guys are stars, thank you!!
Hello,
CELLECT worked perfectly with the Tabula Muris and Mousebrain datasets.
Next, I downloaded the precomputed MOCA data from https://github.com/perslab/CELLECT/wiki/Precomputed-CELLEX-datasets
The first errors I get are about illegal characters in the annotations:
Exception in line 211 of /home/daniel.lam/CELLECT/cellect-ldsc.snakefile: Illegal charecters in SPECIFICITY_INPUT=moca annotation names. Illegal charecters=[\s|__|/] File "/home/daniel.lam/CELLECT/cellect-ldsc.snakefile", line 211, in
I was able to resolve that by removing the illegal characters from the column headers.
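In case it helps others, the header cleanup can be sketched roughly as below. This assumes the message means that whitespace, double underscores, and forward slashes are not allowed in annotation names, and that the first column holds the gene identifiers; the function names are hypothetical and this is not CELLECT's own code.

```python
import re

# Assumed reading of "Illegal charecters=[\s|__|/]":
# whitespace, a double underscore, or a forward slash.
ILLEGAL = re.compile(r"\s|__|/")


def sanitize_annotation_name(name):
    """Strip illegal patterns, repeating until the name stops changing.

    Repeating matters because removing a space from "a_ _b" creates a new "__".
    """
    previous = None
    while previous != name:
        previous = name
        name = ILLEGAL.sub("", name)
    return name


def sanitize_header(columns):
    """Clean every annotation column name, leaving the first (gene) column alone."""
    gene_col, *annotations = columns
    cleaned = [sanitize_annotation_name(c) for c in annotations]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("sanitizing produced duplicate annotation names")
    return [gene_col, *cleaned]
```

The duplicate check is there because two different annotation names can collapse to the same string once characters are removed.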
The next error was:
Error in rule format_and_map_genes:
    jobid: 164
    output: /home/daniel.lam/CELLECT/precomputation/moca/bed/moca.Shisa6positiveneurontrajectory.bed
    log: /home/daniel.lam/CELLECT/logs/log.format_and_map_snake.moca.Shisa6positiveneurontrajectory.txt (check log file(s) for error message)
    conda-env: /home/daniel.lam/CELLECT/.snakemake/conda/24b048b2
Traceback (most recent call last): File "/home/daniel.lam/CELLECT/.snakemake/scripts/tmp488vow72.format_and_map_snake.py", line 111, in
multi_gene_sets_to_dict_of_beds(df_multi_gene_set_human, df_gene_coords, windowsize, bed_out_dir + '/tmp', bed_out_dir, out_prefix)
File "/home/daniel.lam/CELLECT/.snakemake/scripts/tmp488vow72.format_and_map_snake.py", line 92, in multi_gene_sets_to_dict_of_beds
bed_for_annot = pybedtools.BedTool(list_of_lists).sort().merge(c=[4,5], o=["distinct","max"])
File "/home/daniel.lam/CELLECT/.snakemake/conda/24b048b2/lib/python3.6/site-packages/pybedtools/bedtool.py", line 917, in decorated
result = method(self, *args, **kwargs)
File "/home/daniel.lam/CELLECT/.snakemake/conda/24b048b2/lib/python3.6/site-packages/pybedtools/bedtool.py", line 401, in wrapped
decode_output=decode_output,
File "/home/daniel.lam/CELLECT/.snakemake/conda/24b048b2/lib/python3.6/site-packages/pybedtools/helpers.py", line 455, in call_bedtools
raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:
Error message was:
***** ERROR: Requested column 4, but database file /home/daniel.lam/CELLECT/precomputation/moca/bed/tmp/pybedtools.vn17_95l.tmp only has fields 1 - 0.
Any ideas? It seems like something is wrong with the format of the data. Did it work for anyone else?
Thanks, Daniel
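A note on the "only has fields 1 - 0" message: it suggests the temporary BED file passed to bedtools was empty, which is what you would expect if none of the gene identifiers matched the gene coordinate table. The sketch below illustrates that failure mode with toy data and a hypothetical helper (genes_to_bed_rows is not the CELLECT function, and the coordinates are invented); it fails early with a readable message instead of handing an empty file to bedtools.

```python
def genes_to_bed_rows(gene_scores, gene_coords, windowsize=100_000):
    """Build sorted BED-like rows (chrom, start, end, gene, score).

    gene_scores: dict mapping gene identifier -> specificity score
    gene_coords: dict mapping gene identifier -> (chrom, start, end)
    Genes missing from gene_coords are skipped, as in a lookup-based join.
    """
    rows = []
    for gene, score in gene_scores.items():
        coords = gene_coords.get(gene)
        if coords is None:
            continue  # identifier not found in the coordinate table
        chrom, start, end = coords
        rows.append((chrom, max(0, start - windowsize), end + windowsize, gene, score))
    if not rows:
        raise ValueError(
            "no genes could be mapped to coordinates; "
            "check that the identifiers are of the expected type"
        )
    return sorted(rows)


# Toy coordinate table keyed by Ensembl-style IDs (coordinates are made up).
toy_coords = {"ENSG00000000001": ("chr1", 150_000, 160_000)}
```

With Ensembl-style keys the toy example yields one row; with gene symbols as keys nothing maps, and the function raises instead of producing an empty BED file.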