Open kwchurch opened 2 years ago
Hi @kwchurch, thank you for the detailed dev log! I slightly edited the format to improve readability. At first glance, it looks to me like an issue of incompatible dtypes. More specifically, the `csr` used by PecanPy uses `uint32` for both the `indices` and `indptr` fields, rather than `int32` as used by `scipy.sparse.csr`. Similarly, PecanPy uses `float32` instead of `float64` for the `data` field in the `csr` object.
I think the most straightforward way to resolve the type issue is to enforce the desired types (i.e., `float32` for `data`; `uint32` for `indices` and `indptr`) at loading time: https://github.com/krishnanlab/PecanPy/blob/49d60630b4589eeab992eef2da9c2eaf6b19fab8/src/pecanpy/graph.py#L432-L438
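For illustration, enforcing the types at loading time could look roughly like the sketch below. It is not the code in `graph.py`; the helper name `enforce_csr_dtypes` is hypothetical, and it only assumes the three CSR arrays are available as NumPy arrays.

```python
import numpy as np


def enforce_csr_dtypes(indptr, indices, data):
    """Cast CSR arrays to uint32 index arrays and float32 data (hypothetical helper)."""
    # copy=False skips the copy when an array already has the target dtype.
    indptr = np.asarray(indptr).astype(np.uint32, copy=False)
    indices = np.asarray(indices).astype(np.uint32, copy=False)
    data = np.asarray(data).astype(np.float32, copy=False)
    return indptr, indices, data


# scipy-style dtypes (int32 / float64) for a 3-node path graph 0-1-2.
indptr = np.array([0, 1, 3, 4], dtype=np.int32)
indices = np.array([1, 0, 2, 1], dtype=np.int32)
data = np.array([1.0, 1.0, 1.0, 1.0], dtype=np.float64)

indptr, indices, data = enforce_csr_dtypes(indptr, indices, data)
print(indptr.dtype, indices.dtype, data.dtype)
```

The cast changes only the dtypes, not the values, as long as every index fits in 32 unsigned bits.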
I will first try to reproduce the error here using the example script you provided, and then see if my proposed solution actually fixes the issue.
As we also discussed, I will add the option for implicitly assigning node IDs if they are not found in the `.csr.npz` file. I will make it so that it requires a "soft confirmation" from the user that the implicit assignment is desired by printing a warning message about the implicit assignment, unless a specific flag (e.g., `--implicit_node_ids`) is set.
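A rough sketch of the "soft confirmation" idea, not the actual PecanPy implementation: the function name `resolve_node_ids`, the `implicit_ids` parameter, and the `"IDs"` key are assumptions made for this example.

```python
import warnings


def resolve_node_ids(arrays, num_nodes, implicit_ids=False):
    """Return node IDs from `arrays`, or implicit IDs "0".."N-1" if none are stored.

    `arrays` is any mapping of array names to values (e.g. a loaded .npz file).
    The "IDs" key is an assumption of this sketch.
    """
    if "IDs" in arrays:
        return [str(i) for i in arrays["IDs"]]
    if not implicit_ids:
        # Soft confirmation: warn unless the user explicitly opted in.
        warnings.warn(
            "No node IDs found; assigning implicit IDs 0..N-1. "
            "Set implicit_ids=True to silence this warning."
        )
    return [str(i) for i in range(num_nodes)]


# No "IDs" entry: implicit IDs are used, with a warning unless opted in.
print(resolve_node_ids({"data": [1.0]}, num_nodes=3, implicit_ids=True))
```

The warning makes an accidental missing-ID file visible without turning it into a hard failure.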
Hi @kwchurch, I've created a new branch (see #124) implementing my suggestions above (explicit dtype setting and implicit node IDs setting). The scipy csr karate test case works fine on my end.
In the meantime, if you would like to give the new changes a try and let me know if this resolves your issue, that would be great. You can run it as before using
pecanpy --input demo/karate.bool.npz --output demo/karate.int.emb --mode SparseOTF
which will warn you about the implicit node IDs setting. To suppress that, you can set the `--implicit_ids` flag:
pecanpy --input demo/karate.bool.npz --output demo/karate.int.emb --mode SparseOTF --implicit_ids
ok
do you think it could check the datatypes and make the necessary conversions automatically?
@kwchurch yes it is doing that now https://github.com/krishnanlab/PecanPy/blob/a12f27c608bb5b72651481b80380bffdf42053ab/src/pecanpy/graph.py#L443-L445
great
let me know when you have something ready to try out
@kwchurch it is ready to be tried out, but it is not on the `main` branch. You'll need to check out the `scipy-csr` branch, and you will find the new changes there.
Hi @kwchurch, I have completed some more testing and merged the new feature (implicit IDs) back to the main branch (see 2d58132807089e8f5fbd5095be342149a039bf18). Let me know if you get a chance to test and see if this works in your case.
I have some graphs with nodes that have no edges
Is that a problem?
init pecanpy: p = 1, q = 1, workers = 16, verbose = True, extend = True, gamma = 0, random_state = None
/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/rw/sparse_rw.py:30: RuntimeWarning: Mean of empty slice.
data[indptr[i] : indptr[i + 1]].mean()
/home/k.church/venv/gft/lib/python3.8/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/home/k.church/venv/gft/lib/python3.8/site-packages/numpy/core/_methods.py:262: RuntimeWarning: Degrees of freedom <= 0 for slice
ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/home/k.church/venv/gft/lib/python3.8/site-packages/numpy/core/_methods.py:222: RuntimeWarning: invalid value encountered in true_divide
arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/home/k.church/venv/gft/lib/python3.8/site-packages/numpy/core/_methods.py:254: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
File "/var/spool/slurm/d/job27656002/slurm_script", line 8, in
sys.exit(main())
File "/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/cli.py", line 333, in main
walks = simulate_walks(args, g)
File "/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/wrappers.py", line 18, in wrapper
result = func(*args, **kwargs)
File "/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/cli.py", line 320, in simulate_walks
return g.simulate_walks(args.num_walks, args.walk_length)
File "/home/k.church/venv/gft/lib/python3.8/site-packages/pecanpy/pecanpy.py", line 153, in simulate_walks
walk_idx_mat = self._random_walks(
File "/home/k.church/venv/gft/lib/python3.8/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
error_rewrite(e, 'typing')
File "/home/k.church/venv/gft/lib/python3.8/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(
imul(array(bool, 1d, C), array(float64, 1d, C))
There are 8 candidate implementations:
 - Of which 4 did not match due to:
Overload of function 'imul': File:
With argument(s): '(array(bool, 1d, C), array(float64, 1d, C))':
 No match.
 - Of which 2 did not match due to:
Overload in function 'NumpyRulesInplaceArrayOperator.generic': File: numba/core/typing/npydecl.py: Line 244.
With argument(s): '(array(bool, 1d, C), array(float64, 1d, C))':
 Rejected as the implementation raised a specific error:
AttributeError: 'NoneType' object has no attribute 'args'
raised from /home/k.church/venv/gft/lib/python3.8/site-packages/numba/core/typing/npydecl.py:255
 - Of which 2 did not match due to:
Operator Overload in function 'imul': File: unknown: Line unknown.
With argument(s): '(array(bool, 1d, C), array(float64, 1d, C))':
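On the question about nodes with no edges: the `Mean of empty slice` warnings in the log are what NumPy emits when a row's slice `data[indptr[i] : indptr[i + 1]]` is empty, and the result is NaN. Below is a minimal sketch of guarding against that. The helper name `row_means` and the choice of 0.0 as the value for isolated nodes are assumptions for illustration, not PecanPy's behavior.

```python
import numpy as np


def row_means(indptr, data):
    """Per-row mean of CSR edge weights, with 0.0 for rows that have no edges."""
    num_rows = len(indptr) - 1
    means = np.zeros(num_rows, dtype=np.float64)
    for i in range(num_rows):
        start, end = indptr[i], indptr[i + 1]
        if end > start:  # skip isolated nodes instead of averaging an empty slice
            means[i] = data[start:end].mean()
    return means


# Node 1 is isolated: indptr[1] == indptr[2], so its slice of `data` is empty.
indptr = np.array([0, 2, 2, 3], dtype=np.uint32)
data = np.array([1.0, 3.0, 5.0], dtype=np.float32)
means = row_means(indptr, data)
print(means)
```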
I have a large csr_matrix in npz format. I'd like to use that as input as is, but it doesn't have an IDs field.
added this to graph.py (but it doesn't work)
Created edg2npz.py with this:
called it with
Unfortunately, I can't use this kind of csr_matrix...
I can write out my matrix to text and then run pecanpy on that, but my matrix is very large and it will take a long time to write it out and read it back. My matrix has N = 300M nodes and E=2B nonzero edges.
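One way to avoid the text round trip might be to rewrite the arrays of the existing `.npz` file directly, adding an ID array. The sketch below assumes the input has the `indptr`, `indices`, and `data` arrays that `scipy.sparse.save_npz` writes, and that the target file wants an additional `IDs` array. The `IDs` key, the dtypes, and the helper name `add_ids_to_csr_npz` are assumptions here, not a confirmed description of PecanPy's file format.

```python
import io

import numpy as np


def add_ids_to_csr_npz(src, dst):
    """Copy CSR arrays from `src` to `dst`, adding string IDs "0".."N-1"."""
    with np.load(src) as raw:
        indptr = raw["indptr"].astype(np.uint32)
        indices = raw["indices"].astype(np.uint32)
        data = raw["data"].astype(np.float32)
    # One ID per row; the number of rows is len(indptr) - 1.
    ids = np.arange(len(indptr) - 1).astype(str)
    np.savez(dst, IDs=ids, data=data, indptr=indptr, indices=indices)


# Demo with in-memory buffers standing in for files on disk.
src = io.BytesIO()
np.savez(
    src,
    indptr=np.array([0, 1, 3, 4], dtype=np.int32),
    indices=np.array([1, 0, 2, 1], dtype=np.int32),
    data=np.array([True, True, True, True]),
)
src.seek(0)
dst = io.BytesIO()
add_ids_to_csr_npz(src, dst)
dst.seek(0)
with np.load(dst) as out:
    result = {key: out[key] for key in out.files}
print(sorted(result))
```

For a graph with hundreds of millions of nodes, the string ID array can itself be sizable, which is one argument for assigning IDs implicitly at load time instead.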
There are 6 candidate implementations:
Overload in function 'NumpyRulesInplaceArrayOperator.generic': File: numba/core/typing/npydecl.py: Line 244.
With argument(s): '(array(bool, 1d, C), int64)': Rejected as the implementation raised a specific error:
AttributeError: 'NoneType' object has no attribute 'args' raised from /home/k.church/venv/gft/lib/python3.8/site-packages/numba/core/typing/npydecl.py:255
Operator Overload in function 'itruediv': File: unknown: Line unknown.
With argument(s): '(array(bool, 1d, C), int64)': No match for registered cases:
Overload of function 'itruediv': File: numba/core/typing/npdatetime.py: Line 94.
With argument(s): '(array(bool, 1d, C), int64)': No match.
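The `imul` and `itruediv` failures above both involve an `array(bool, 1d, C)` operand, which suggests the matrix stores boolean edge weights and the walk code updates them in place. Plain NumPy rejects the same pattern, so the sketch below shows the failure and the cast that avoids it. This is only an illustration of the dtype problem, not the numba code path itself.

```python
import numpy as np

data = np.array([True, True, False])  # boolean edge weights
weights = np.array([0.5, 2.0, 1.0])   # float64 factors

# In-place multiply cannot store a float64 result in a bool array.
try:
    data *= weights
    inplace_failed = False
except TypeError:
    inplace_failed = True

# Casting once up front makes both in-place operations valid.
data = data.astype(np.float32)
data *= weights
data /= 2
print(inplace_failed, data.dtype, data)
```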