tariks / peakachu

Genome-wide contact analysis using sklearn
MIT License

Error with available model file #16

Closed maharshi14 closed 2 years ago

maharshi14 commented 2 years ago

Hello,

Thank you for creating this tool.

I'm attempting to run Peakachu on my H3K27ac HiChIP data, which has 65634708 intra-chromosomal pairs in total. For this I downloaded the H3K27ac 1.5% model file, down1.h3k27ac.pkl, which is ~4.4M in size.

My .cool file is of resolution 10kb, which I created from a .hic file (using HiC-Pro), converted using hic2cool and then normalised using cooler balance.

After running peakachu score_chromosome -r 10000 --balance -p data.10kb.cool -O output -m down1.h3k27ac.pkl, I keep getting the following error messages pertaining to the model file:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2898, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'down1.h3k27ac'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/cooler/util.py", line 167, in parse_region
    clen = chromsizes[chrom] if chromsizes is not None else None
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/pandas/core/series.py", line 882, in __getitem__
    return self._get_value(key)
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/pandas/core/series.py", line 990, in _get_value
    loc = self.index.get_loc(label)
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2900, in get_loc
    raise KeyError(key) from err
KeyError: 'down1.h3k27ac'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/3dgenome/bin/peakachu", line 4, in <module>
    __import__('pkg_resources').run_script('peakachu==1.1.4', 'peakachu')
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/pkg_resources/__init__.py", line 651, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1448, in run_script
    exec(code, namespace, namespace)
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/peakachu-1.1.4-py3.6.egg/EGG-INFO/scripts/peakachu", line 80, in <module>
    run()
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/peakachu-1.1.4-py3.6.egg/EGG-INFO/scripts/peakachu", line 76, in run
    args.func(args)
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/peakachu-1.1.4-py3.6.egg/peakachu/score_chromosome.py", line 41, in main
    X = scoreUtils.Chromosome(Lib.matrix(balance=args.balance, sparse=True).fetch(ccname).tocsr(),
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/cooler/core.py", line 573, in fetch
    i0, i1, j0, j1 = self._fetch(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/cooler/api.py", line 384, in _fetch
    region1 = parse_region(region, self._chromsizes)
  File "/home/ubuntu/miniconda3/envs/3dgenome/lib/python3.6/site-packages/cooler/util.py", line 169, in parse_region
    raise ValueError("Unknown sequence label: {}".format(chrom))
ValueError: Unknown sequence label: down1.h3k27ac

Any troubleshooting on this would be greatly appreciated. Thank you!

tariks commented 2 years ago

Thank you for posting this. Based on the last two lines with the ValueError, it looks like the fetch method was expecting a chromosome name and got a file name instead. Please inspect the cool file produced from your HiC-Pro output and see what the chromosome labels are. This may be an oversight on my part: the code expects chromosome labels to be compatible with specific string manipulations. If it's easy to rename the chromosomes to something like ['1','2',...] or ['chrom1','chrom2',...], then try that next.
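A minimal sketch of that check, assuming the `cooler` Python package is installed. `build_rename_map` and `inspect_and_rename` are hypothetical helper names for illustration, not part of Peakachu, and the thread does not state which labelling style Peakachu expects, so both directions are offered. Only a leading "chr" prefix is added or stripped; other differences (for example "chrM" versus "MT") are not handled.

```python
def build_rename_map(names, style="ucsc"):
    """Map each chromosome label to a 'chr'-prefixed form (style="ucsc")
    or to a bare form without the prefix (style="ensembl")."""
    mapping = {}
    for name in names:
        # Strip an existing 'chr' prefix (case-insensitive) to get the bare label.
        bare = name[3:] if name.lower().startswith("chr") else name
        mapping[name] = "chr" + bare if style == "ucsc" else bare
    return mapping


def inspect_and_rename(cool_path, style="ucsc"):
    """Print the labels stored in a .cool file and rename them if needed.

    Assumption: cool_path points to a single-resolution .cool file.
    cooler.rename_chroms rewrites the file in place, so work on a copy.
    """
    import cooler  # imported lazily so build_rename_map works without cooler

    clr = cooler.Cooler(cool_path)
    print("current labels:", clr.chromnames)

    # Keep only the labels that would actually change.
    full_map = build_rename_map(clr.chromnames, style)
    changes = {old: new for old, new in full_map.items() if old != new}
    if changes:
        cooler.rename_chroms(clr, changes)
    return cooler.Cooler(cool_path).chromnames
```

For example, `build_rename_map(["chr1", "chrX"], style="ensembl")` gives `{"chr1": "1", "chrX": "X"}`; calling `inspect_and_rename("copy_of_data.10kb.cool", style="ensembl")` would apply the same mapping to a copy of the file (the file name here is only a placeholder).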

I hope this helps!

XiaoTaoWang commented 2 years ago

Hi, I suggest running "score_genome" instead of "score_chromosome" when you use the pre-trained models for prediction. In your case, the command should be:

peakachu score_genome -r 10000 -p data.10kb.cool --balance -O output -m down1.h3k27ac.pkl

Keep in mind that although the model is named "down1.h3k27ac.pkl", it was trained on GM12878 Hi-C data using H3K27ac HiChIP interactions as the positive training set. You can try it, but we cannot guarantee the prediction quality when you apply it to predict chromatin loops on HiChIP contact maps.

Xiaotao

maharshi14 commented 2 years ago

Hi again,

The issue was indeed the chromosome annotations ('chr1', 'chr2', and so on). Reprocessing my HiC-Pro output did the trick. Thank you!