Closed tblazey closed 1 year ago
@tblazey Thanks for the PR. Solution to #105 seems good to me.
@sravan953 I had a quick look into find_or_insert()
function. It looks like it is creating a string from the shape array to use it as a dictionary key. Is my understanding correct?
If that is the case, it is very slow and error-prone. Adding threshold=np.inf
will just mask the issue.
A better hack would be just replacing np.array2string()
call with new_data.tobytes()
. This will be 1) much faster as it doesn't do any formatting, 2) precise, as it will make sure two shapes are identical byte by byte.
An even more proper solution would be calculating a hash from the array, and using it as the key.
Further info: https://stackoverflow.com/questions/39674863/python-alternative-for-using-numpy-array-as-key-in-dictionary
PS: I tried this, it is really fast and generates the expected output as mentioned in #106.
Suggested change is:
Replace these 3 lines:
data_string = np.array2string(
new_data, threshold=np.inf, formatter={"float_kind": lambda x: "%.6g" % x}
)
data_string = data_string.replace("[", "")
data_string = data_string.replace("]", "")
with
data_string = new_data.tobytes()
If it is not clear, I can create a new PR.
@bilal-tasdelen I was not aware of performance issues, thank you for bringing it to my notice.
@tblazey Thank you for the PR! Always super happy to see the community participating.
I request you to submit 2 separate PRs for each of the bugs if you don't mind? For the PR fixing the bug in event_lib.py
, could you please implement bilal's suggested fix, referring to the code example he posted in the previous comment?
Hi,
I wasn't sure how to split the original request, so I submitted separate pull requests (#110 and #111). Hopefully that is ok.
Thanks,
-Tyler
Hi,
These are suggested fixes for the following issues #105 and #106. This is my first pull request, so apologies if I am making any mistakes.
Thanks!
-Tyler