mosaicml / streaming

A Data Streaming Library for Efficient Neural Network Training
https://streaming.docs.mosaicml.com
Apache License 2.0

Fix linting issues with numpy 2 #705

Closed snarayan21 closed 3 months ago

snarayan21 commented 3 months ago

Description of changes:

Seeing the error below on CI/CD linting:

/home/runner/work/streaming/streaming/streaming/base/spanner.py:59:35 - error: Argument of type "NDArray[signedinteger[Any]]" cannot be assigned to parameter "__x" of type "ReadableBuffer | str | SupportsInt | SupportsIndex | SupportsTrunc" in function "__new__"
    Type "NDArray[signedinteger[Any]]" cannot be assigned to type "ReadableBuffer | str | SupportsInt | SupportsIndex | SupportsTrunc"
      "NDArray[signedinteger[Any]]" is incompatible with "str"
      "NDArray[signedinteger[Any]]" is incompatible with "ReadOnlyBuffer"
      "NDArray[signedinteger[Any]]" is incompatible with "bytearray"
      "NDArray[signedinteger[Any]]" is incompatible with "memoryview"
      "NDArray[signedinteger[Any]]" is incompatible with "array[Any]"
      "NDArray[signedinteger[Any]]" is incompatible with "mmap"
      "NDArray[signedinteger[Any]]" is incompatible with "_CData" (reportGeneralTypeIssues)

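This seems to be caused by the numpy upgrade, which changed the typing involved in casting array values to ints: the type checker no longer accepts an NDArray as the argument to int(). We convert to a Python int explicitly using .item() to address the typing issue.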
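As a rough illustration of the `.item()` approach (a minimal sketch, not the actual code in spanner.py; the `offsets` array and variable names are hypothetical):

```python
import numpy as np

# Hypothetical data, not taken from spanner.py.
offsets = np.array([0, 10, 25, 40], dtype=np.int64)

# Searching for a one-element list yields a size-1 integer array,
# which a type checker sees as an NDArray rather than an int.
pos = np.searchsorted(offsets, [12], side="right") - 1

# .item() converts the single element to a built-in Python int,
# so no int(...) call on an NDArray is needed.
shard_index = pos.item()
```

Note that `.item()` only works without arguments when the array holds exactly one element; otherwise it raises a ValueError.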

The second error is:

/home/runner/work/streaming/streaming/streaming/base/shared/prefix.py:121:42 - error: "in1d" is not a known member of module (reportGeneralTypeIssues)

This happens because np.in1d is deprecated in numpy 2 in favor of np.isin. We simply replace the call with np.isin.
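For reference, a minimal sketch of the replacement on 1-D inputs (the arrays and names here are hypothetical, not the ones used in prefix.py):

```python
import numpy as np

# Hypothetical values for illustration only.
candidates = np.array([3, 5, 8, 13])
taken = np.array([5, 13, 21])

# Previously: mask = np.in1d(candidates, taken)
# np.isin tests each element of the first array for membership in the second.
mask = np.isin(candidates, taken)

# Keep only the candidates that are not already taken.
free = candidates[~mask]
```

For 1-D inputs the two functions return the same mask. The difference is that np.isin preserves the shape of its first argument, whereas np.in1d always returned a flattened result.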

Issue #, if available:

Merge Checklist:

Put an x without space in the boxes that apply. If you are unsure about any checklist, please don't hesitate to ask. We are here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

Tests