output dtype error in bpnet negatives

adamyhe commented 3 months ago

Hi Jacob,

I'm getting a rather strange error when using bpnet negatives to extract negative control regions for training ChromBPNet bias models:

Traceback (most recent call last):
  File "/home/ayh8/miniconda3/envs/bpnet/bin/bpnet", line 347, in <module>
    matched_loci = extract_matching_loci(
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ayh8/miniconda3/envs/bpnet/lib/python3.12/site-packages/tangermeme/match.py", line 375, in extract_matching_loci
    matched_gc = X_matched.mean(axis=-1)[:, [1, 2]].sum(axis=1).numpy()
                 ^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: mean(): could not infer output dtype. Input dtype must be either a floating point or complex dtype. Got: Char

I'm still on numpy 1.26.4, so I don't think the new 2.0.0 release is causing the issue. I've filtered the input BED files down to autosomes only (I've attached a minimal example with the first 10 windows), and I'm using the cleaned male.hg19.fasta reference that ENCODE used (https://www.encodeproject.org/files/male.hg19/).

I've also tried running this through the tangermeme API and get a similar error. I'm on bpnetlite version 0.8.1 and tangermeme 0.2.1. Any insights into this bug or possible workarounds would be greatly appreciated!

Adam

jmschrei commented 3 months ago

Char is PyTorch's name for an 8-bit integer dtype, i.e., torch.int8, and mean() refuses integer inputs because it can't infer a floating-point output dtype. int8 is the default return type of tangermeme's one-hot encoding function, so it's possible that switching over caused the issue somehow. I'll try to look into this over the weekend.
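To make the diagnosis concrete, here is a minimal sketch using only PyTorch that reproduces the error on a toy int8 one-hot tensor and shows that casting to float before taking the mean avoids it. The (n_loci, 4, length) shape and the A, C, G, T channel order (so that indices [1, 2] are C and G, matching the GC line in the traceback) are assumptions made for illustration.

```python
import torch

# Toy batch of one-hot sequences, shape (n_loci, 4, length), stored as int8,
# which PyTorch reports as "Char". Assumed channel order: A, C, G, T.
X = torch.zeros(2, 4, 6, dtype=torch.int8)
X[0, 1, :3] = 1  # locus 0: C at positions 0-2
X[0, 2, 3:] = 1  # locus 0: G at positions 3-5
X[1, 0, :] = 1   # locus 1: all A

try:
    X.mean(dim=-1)  # integer input: mean() cannot infer an output dtype
except RuntimeError as err:
    print(err)  # same error text as in the traceback above

# Casting to a floating-point dtype first lets the GC computation run.
gc = X.float().mean(dim=-1)[:, [1, 2]].sum(dim=1)
print(gc)  # tensor([1., 0.])
```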

adamyhe commented 3 months ago

Specifying dtype=torch.float fixes the crashing issue for me (and hopefully doesn't introduce additional problems).
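The dtype=torch.float workaround can be sketched at encoding time: if sequences are one-hot encoded as floats from the start, the GC computation from the traceback runs unchanged. The one_hot function below is a hypothetical stand-in written for illustration, not tangermeme's encoder, and ALPHABET assumes the same A, C, G, T channel order.

```python
import torch

ALPHABET = "ACGT"  # assumed channel order; C and G are channels 1 and 2


def one_hot(seq: str, dtype: torch.dtype = torch.int8) -> torch.Tensor:
    """Hypothetical encoder returning a (4, len(seq)) tensor of the given dtype.

    Characters outside ALPHABET (e.g. N) are left as all-zero columns.
    """
    X = torch.zeros(len(ALPHABET), len(seq), dtype=dtype)
    for i, base in enumerate(seq.upper()):
        j = ALPHABET.find(base)
        if j >= 0:
            X[j, i] = 1
    return X


seqs = ["GGCCAT", "AATTNA"]

# Encoding directly as float means reductions such as mean() just work.
X = torch.stack([one_hot(s, dtype=torch.float) for s in seqs])
gc = X.mean(dim=-1)[:, [1, 2]].sum(dim=1)
print(gc)
```

The trade-off is memory: float32 uses four bytes per entry versus one for int8, which may be why int8 is the default. Casting only at the point of reduction (e.g. X.float().mean(...)) is an alternative that keeps the stored one-hot tensors compact.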

https://github.com/jmschrei/tangermeme/pull/9

jmschrei / bpnet-lite

output dtype error in bpnet negatives #8