vaquerizaslab / chess

Comparison of Hi-C Experiments using Structural Similarity.

chess extract fails with `ValueError: Image must contain only positive values` #6

Closed cgirardot closed 4 years ago

cgirardot commented 4 years ago

Dear authors,

I am trying out chess on my Drosophila data. I can reproduce the workflow up to the extract feature step, which fails after 90 minutes of runtime with the error ValueError: Image must contain only positive values (full stderr with the error stack below). The matrices I passed are the same as in the first step and are in cool format, generated with the HiCExplorer suite (with ICE correction).

Also, I checked: my cool matrices have no negative values but contain a few hundred NaN values (masked bins, I assume).
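A check of this kind can be sketched as follows. This is only an illustration: `summarize_values` and `ValueSummary` are hypothetical names, and the toy list stands in for the corrected values that would be read from the cool file with whatever library is at hand.

```python
import math
from typing import Iterable, NamedTuple


class ValueSummary(NamedTuple):
    total: int
    nan: int
    negative: int


def summarize_values(values: Iterable[float]) -> ValueSummary:
    """Count NaN and negative entries in a stream of matrix values."""
    total = nan = negative = 0
    for value in values:
        total += 1
        if math.isnan(value):
            # NaN compares False against everything, so test it explicitly.
            nan += 1
        elif value < 0:
            negative += 1
    return ValueSummary(total, nan, negative)


# Toy values standing in for the corrected contact values of a few bins.
example = [0.0, 1.5, float("nan"), 3.2, float("nan"), 0.7]
summary = summarize_values(example)
print(summary)
```

NaN is tested before the sign because a plain `value < 0` comparison is always False for NaN, so masked bins would otherwise go uncounted.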

Could you please help ?

2020-10-21 18:47:53,749 INFO Running '/g/funcgen/gbcs/public/software/conda/envs/chess-hic-0.3.2/bin/chess extract /g/furlong/project/69_Gaby_Hi-C_Mutants/analysis/HiC-chess/search_250kb_win_50kb_step/features/chr2R_WT_23h_vs_KO_23h_chess_results_filtered.bed /g/furlong/project/69_Gaby_Hi-C_Mutants/data/HiC/HiC_Bridge_Merged/hicmatrices/binned/10K/corrected/WT_23h_raw_10K_mrged.corrected.cool /g/furlong/project/69_Gaby_Hi-C_Mutants/data/HiC/HiC_Bridge_Merged/hicmatrices/binned/10K/corrected/KO_23h_raw_10K_mrged.corrected.cool /g/furlong/project/69_Gaby_Hi-C_Mutants/analysis/HiC-chess/search_250kb_win_50kb_step/features'
2020-10-21 18:47:53,750 INFO Loading reference contact data
2020-10-21 18:48:13,070 INFO Note: NumExpr detected 48 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2020-10-21 20:18:50,367 INFO Loading region pairs
2020-10-21 20:18:50,385 INFO Applying image filtering to identify specific structures
Traceback (most recent call last):
  File "/g/funcgen/gbcs/public/software/conda/envs/chess-hic-0.3.2/bin/chess", line 548, in <module>
    Chess()
  File "/g/funcgen/gbcs/public/software/conda/envs/chess-hic-0.3.2/bin/chess", line 76, in __init__
    getattr(self, args.command)([sys.argv[0]] + sys.argv[option_ix:])
  File "/g/funcgen/gbcs/public/software/conda/envs/chess-hic-0.3.2/bin/chess", line 521, in extract
    extract_structures(
  File "/g/funcgen/gbcs/public/software/conda/envs/chess-hic-0.3.2/lib/python3.8/site-packages/chess/get_structures.py", line 123, in extract_structures
    denoise_positive = restoration.denoise_bilateral(
  File "/g/funcgen/gbcs/public/software/conda/envs/chess-hic-0.3.2/lib/python3.8/site-packages/skimage/restoration/_denoise.py", line 205, in denoise_bilateral
    raise ValueError("Image must contain only positive values")
ValueError: Image must contain only positive values

nickmachnik commented 4 years ago

Hi, thank you for reporting this. Unfortunately I don't know where this error is coming from. We will investigate.

nickmachnik commented 4 years ago

Hi, we have a potential fix; unfortunately I cannot reproduce the error with our test data, but we have a good idea what the problem might be. I pushed a patch to the repo. Did you install from source? If yes, could you pull and see whether the error still occurs?

Another side note: to speed things up, you could convert your data to Juicer or FAN-C format (fanc from-cooler path/to/your/data.file path/to/output/data.file).
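For scripted pipelines, the conversion could be wrapped roughly like this. It is a sketch, not part of CHESS or FAN-C: the helper names and the file names `WT.cool` / `WT.fanc` are hypothetical, and only the `fanc from-cooler <input> <output>` form quoted above is taken from this thread.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def from_cooler_command(cool_path: str, output_path: str) -> List[str]:
    """Build the argument list for `fanc from-cooler <input> <output>`."""
    return ["fanc", "from-cooler", cool_path, output_path]


def convert(cool_path: str, output_path: str) -> Path:
    """Run the conversion and confirm that an output file was written."""
    if shutil.which("fanc") is None:
        raise RuntimeError("fanc executable not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(from_cooler_command(cool_path, output_path), check=True)
    output = Path(output_path)
    if not output.exists():
        raise RuntimeError(f"conversion finished but {output} is missing")
    return output


# Hypothetical file names; only the command is printed here, nothing is run.
command = from_cooler_command("WT.cool", "WT.fanc")
print(" ".join(command))
```

Checking for the output file after the run is deliberate: a clean exit status alone does not prove that the converted matrix was written.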

cgirardot commented 4 years ago

Hi, I used pip so far in a conda env initialized with python 3.8.2. I just tried the src install but it fails. Also it seems fanc needs python < 3.8.0a. Could you tell me what python version you are using pls?

kaukrise commented 4 years ago

When used with bioconda, FAN-C currently only works in Python version 3.7, due to a dependency problem in other versions. Please see this thread for details: https://github.com/bioconda/bioconda-recipes/pull/23911

When installed via pip in a regular Python 3.8 environment outside conda this works fine.

nickmachnik commented 4 years ago

Have you tried the install from src in a clean virtualenv? What error message do you get?

I can release the patch on PyPI, just wanted to make sure it really fixes the issue.

cgirardot commented 4 years ago

trying again

>conda create ... python=3.7
> conda activate ...
> git clone https://github.com/vaquerizaslab/chess
> pip install chess
ERROR: Could not find a version that satisfies the requirement chess (from versions: none)
ERROR: No matching distribution found for chess

so I tried the other way, as in your docs

> cd chess
> python setup.py install

and it errors with

...skipping a lot ...
Searching for PyYAML>=5.1
Reading https://pypi.org/simple/PyYAML/
Downloading https://files.pythonhosted.org/packages/64/c2/b80047c7ac2478f9501676c988a5411ed5572f35d1beff9cae07d321512c/PyYAML-5.3.1.tar.gz#sha256=b8eac752c5e14d3eca0e6dd9199cd627518cb5ec06add0de9d32baeee6fe645d
Best match: PyYAML 5.3.1
Processing PyYAML-5.3.1.tar.gz
Writing /tmp/easy_install-23p0bz2i/PyYAML-5.3.1/setup.cfg
Running PyYAML-5.3.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-23p0bz2i/PyYAML-5.3.1/egg-dist-tmp-n1_1_vtd
In file included from ext/_yaml.c:596:0:
ext/_yaml.h:2:18: fatal error: yaml.h: No such file or directory
 #include <yaml.h>
                  ^
compilation terminated.
Error compiling module, falling back to pure Python
zip_safe flag not set; analyzing archive contents...
Moving PyYAML-5.3.1-py3.7-linux-x86_64.egg to /g/funcgen/gbcs/public/software/conda/envs/chess-hic-src/lib/python3.7/site-packages
Adding PyYAML 5.3.1 to easy-install.pth file

Installed /g/funcgen/gbcs/public/software/conda/envs/chess-hic-src/lib/python3.7/site-packages/PyYAML-5.3.1-py3.7-linux-x86_64.egg
Searching for pyBigWig
Reading https://pypi.org/simple/pyBigWig/
Downloading https://files.pythonhosted.org/packages/b0/e2/cf945d541a10bb9c675f986d5bf0b0268544721054d17cc6260cfcfb3685/pyBigWig-0.3.17.tar.gz#sha256=41f64f802689ed72e15296a21a4b7abd3904780b2e4f8146fd29098fc836fd94
Best match: pyBigWig 0.3.17
Processing pyBigWig-0.3.17.tar.gz
Writing /tmp/easy_install-3wbyya93/pyBigWig-0.3.17/setup.cfg
Running pyBigWig-0.3.17/setup.py -q bdist_egg --dist-dir /tmp/easy_install-3wbyya93/pyBigWig-0.3.17/egg-dist-tmp-30n8kjtx
/g/funcgen/gbcs/public/software/conda/envs/chess-hic-src/lib/python3.7/distutils/dist.py:274: UserWarning: Unknown distribution option: 'classifier'
  warnings.warn(msg)
pyBigWig.c: In function ‘PyString_AsString’:
pyBigWig.c:746:5: warning: return discards ‘const’ qualifier from pointer target type [enabled by default]
     return PyUnicode_AsUTF8(obj);
     ^
zip_safe flag not set; analyzing archive contents...
__pycache__.pyBigWig.cpython-37: module references __file__
pyBigWigTest.__pycache__.test.cpython-37: module references __file__
No eggs found in /tmp/easy_install-3wbyya93/pyBigWig-0.3.17/egg-dist-tmp-30n8kjtx (setup script problem?)
error: The 'pyBigWig' distribution was not found and is required by genomic-regions, fanc

cgirardot commented 4 years ago

it would indeed help if you make a pip release (this installs easily)

nickmachnik commented 4 years ago

I can reproduce your installation error with python setup.py install. Could you please try this:

git clone https://github.com/vaquerizaslab/chess
cd chess
pip install .

this works for me in a clean python 3.7.0 virtualenv.

cgirardot commented 4 years ago

This worked. I'll try chess extract now, but I am waiting for fanc from-cooler to finish. Is it expected to be this slow?

36% (16790367 of 46639907) |###################################### | Elapsed Time: 0:35:55 ETA: 1:03:41

kaukrise commented 4 years ago

The initial conversion is (relatively) slow, but the CHESS run will be a lot faster then.

cgirardot commented 4 years ago

ok thx, will let you know if feature extraction worked asap

cgirardot commented 4 years ago

I am actually having an issue running fanc from-cooler mat.cool mat.fanc with fanc 0.9.5 (now trying on a 50K-bin matrix to speed up the tests):

2020-10-22 17:37:13,593 INFO FAN-C version: 0.9.5
100% (3486687 of 3486687) |#############################################################################################################| Elapsed Time: 0:07:28 Time:  0:07:28
Buffers 100% (6 of 6) |#################################################################################################################| Elapsed Time: 0:00:00 Time:  0:00:00
Expected 100% (3486687 of 3486687) |####################################################################################################| Elapsed Time: 0:00:37 Time:  0:00:37
Expected 100% (3486687 of 3486687) |####################################################################################################| Elapsed Time: 0:00:39 Time:  0:00:39
2020-10-22 17:48:08,426 INFO All done.
Closing remaining open files:T70NM4...done

but then I have no mat.fanc !

Am I missing something obvious here ?

kaukrise commented 4 years ago

Whoops, my bad, sorry about that! I have fixed the file output and uploaded a new version to PyPI (0.9.6). It might take a few minutes to become available, but it fixes the missing output file.

cgirardot commented 4 years ago

Good morning everyone !

The fix seems to work 👍

2020-10-22 17:15:56,526 INFO CHESS version: 0.3.4
2020-10-22 17:15:56,526 INFO FAN-C version: 0.9.5
2020-10-22 17:15:56,530 INFO Loading reference contact data
2020-10-22 20:01:39,343 INFO Loading region pairs
2020-10-22 20:01:39,348 INFO Applying image filtering to identify specific structures
2020-10-22 20:01:40,095 INFO Results collected
2020-10-22 20:01:48,740 INFO Finished

I'll be happy to install the next chess version. Could you also please make sure to bump the dependency to the latest fanc release, 0.9.6?
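Until the dependency is bumped, a script could guard against the older release with a check along these lines. This is a minimal sketch that assumes plain dotted numeric versions such as 0.9.5 or 0.9.6; `parse_version` and `meets_minimum` are hypothetical helpers, not part of chess or fanc.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.9.6' into (0, 9, 6); assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str = "0.9.6") -> bool:
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


for candidate in ("0.9.5", "0.9.6", "0.10.0"):
    print(candidate, meets_minimum(candidate))
```

Comparing integer tuples rather than raw strings matters for a version like 0.10.0, which sorts before 0.9.6 as text but after it as numbers.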

cgirardot commented 4 years ago

@kaukrise, Last question about the fanc from-cooler :

kaukrise commented 4 years ago

Good morning!

  • what extensions do you use for these fanc files ? .fanc ?

You can use whatever you like, the file type recognition is not based on the extension. We usually use .hic, but if you feel that is too confusing with Juicer matrices (we started development on FAN-C before Juicer was published), just stick with .fanc!

  • can I also use these .fanc files for the chess sim step ? Would this also speed up things ?

One of the main speed-limiting steps in sim is the calculation of observed/expected (O/E) matrices. For txt and Cooler files, these have to be calculated from scratch each time you run sim. Last time I checked there was no provision in the .cool file format to store O/E values. FAN-C and Juicer Hi-C files have precomputed O/E matrices, so you should see a big speed-up when using one of those formats.
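To make the O/E step concrete, here is a simplified sketch of the idea, not the code FAN-C or CHESS actually runs. It assumes a single square, symmetric contact matrix, takes the expected value at each genomic distance to be the mean of the corresponding diagonal, and ignores masked bins; all names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def expected_by_distance(matrix: Matrix) -> List[float]:
    """Mean contact value for each bin distance d (the d-th diagonal)."""
    n = len(matrix)
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    return expected


def observed_over_expected(matrix: Matrix) -> Matrix:
    """Divide every entry by the expected value at its bin distance."""
    expected = expected_by_distance(matrix)
    n = len(matrix)
    oe = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            e = expected[abs(i - j)]
            # Leave the entry at 0.0 when nothing is expected at this distance.
            oe[i][j] = matrix[i][j] / e if e > 0 else 0.0
    return oe


# Toy 3x3 symmetric contact matrix.
observed = [
    [4.0, 2.0, 1.0],
    [2.0, 6.0, 3.0],
    [1.0, 3.0, 8.0],
]
expected = expected_by_distance(observed)
oe = observed_over_expected(observed)
print(expected)
print(oe[0])
```

Even in this toy form, every entry of the matrix is visited at least twice, which gives a feel for why recomputing O/E on each sim run is costly for genome-scale matrices.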

cgirardot commented 4 years ago

@nickmachnik have you already pushed the new chess version ?

nickmachnik commented 4 years ago

0.3.4 has the fix and is available from source. I just pushed the release to PyPI, too.

cgirardot commented 4 years ago

thank you