Closed Inforeon closed 3 weeks ago
Hey @Inforeon . Thank you for the PR and looking into this issue! The issue happens because the mapping from 450k cpgs to cpgs in hg38 is not perfect so the file has some missing rows. So your logic is correct, but I think the code should be moved. So if it's ok by you move the code as I wrote in the comments and then I will accept it! Thank you
Sure no problem. I've moved the relevant code to the load_full_ref function, at the end, under an if-statement checking for hg38, and tested it again. I've also added/changed a couple comments to reflect the code change.
Also, I'm checking genome.genome, instead of args.genome, as args.genome can be 'default' while still using hg38
Hi, I was using wgbstools for some analysis and had some problems using the beta_to_450k functionality, which I've debugged on my end. I'm making a pull request in case any one else would find this useful
environment: running python 3.11.8 in a conda environment on commit
e7bf6c6fced4f12f615b12adc97a37d495d37799
, with reference genome hg38The issue Trying to run
wgbstools beta_to_450k --genome hg38 <FILE>.beta
produced the following error:error running beta_to_450k
```text multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "LOCATION/wgbstools/lib/python3.11/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, **kwds)) ^^^^^^^^^^^^^^^^^^^ File "LOCATION/wgbstools/lib/python3.11/multiprocessing/pool.py", line 51, in starmapstar return list(itertools.starmap(args[0], args[1])) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "LOCATION/wgbs_tools/src/python/beta_to_450k.py", line 18, in single_beta values = beta2vec(load_beta_data(beta_path)[indices - 1], min_cov=cov_thresh) ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^ IndexError: arrays used as indices must be of integer (or boolean) type """ The above exception was the direct cause of the following exception: Traceback (most recent call last): File "BIN_PATH/bin/wgbstools", line 97, inThis is fixed by casting the indices to type int
The error then changed to
Second error
```text LOCATION/wgbs_tools/src/python/beta_to_450k.py:95: RuntimeWarning: invalid value encountered in cast indices = np.array(df['cpg']).astype(int) multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "LOCATION/wgbstools/lib/python3.11/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, **kwds)) ^^^^^^^^^^^^^^^^^^^ File "LOCATION/wgbstools/lib/python3.11/multiprocessing/pool.py", line 51, in starmapstar return list(itertools.starmap(args[0], args[1])) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "LOCATION/wgbs_tools/src/python/beta_to_450k.py", line 18, in single_beta values = beta2vec(load_beta_data(beta_path)[indices - 1], min_cov=cov_thresh) ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^ IndexError: index 9223372036854775807 is out of bounds for axis 0 with size 29401795 """ The above exception was the direct cause of the following exception: Traceback (most recent call last): File "BIN_PATH/bin/wgbstools", line 97, inNote, that 9223372036854775807 was the sys.maxsize on my system, so presumably this error is the result of casting null or inf values to int. Without the int typecast, however, the first error reoccurs
With these changes though, beta_to_450k functioned as expected, and the results of downstream analysis also came out as expected