microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

Open PR to remove `lightgbm` stubs from `microsoft/python-type-stubs` at next release #5863

Closed Avasam closed 1 year ago

Avasam commented 1 year ago

Description

Since both projects are under the Microsoft organization, it made sense to open a reminder here.

Open a PR to delete https://github.com/microsoft/python-type-stubs/tree/main/lightgbm once a typed version of LightGBM is released on PyPI. This could be added as a checklist item to #5153 if that's the version that will include type hints and a py.typed marker.

Motivation

https://github.com/microsoft/python-type-stubs is bundled with Pylance. The maintainers' stated long-term goal is to upstream everything either to the base repository or to typeshed. Once a version of LightGBM with type hints is released on PyPI, https://github.com/microsoft/python-type-stubs/tree/main/lightgbm needs to be deleted, or Pylance users won't be able to make proper use of the new type hints, and results in the IDE will diverge from those of the pyright CLI.
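(For context: PEP 561's `py.typed` marker file is what tells type checkers to trust a package's inline hints instead of looking for separate stubs. As a minimal sketch, with a helper name of my own invention, you can check whether an installed package ships the marker:)

```python
from importlib import resources

def has_py_typed(package: str) -> bool:
    """Return True if the installed package ships a PEP 561 `py.typed` marker."""
    try:
        return resources.files(package).joinpath("py.typed").is_file()
    except ModuleNotFoundError:
        return False
```

Once LightGBM ships the marker, `has_py_typed("lightgbm")` would return True for that install.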

jameslamb commented 1 year ago

😱 I've never seen this! Thanks very much for bringing it to our attention.

We've been methodically working on adding type hints here in this repo for 2+ years (#3756). 😭

@shiyu1994 did you know about that project? Is it a part of VS Code?

@bschnurr since you're the author of https://github.com/microsoft/python-type-stubs/pull/257, could you help us understand the purpose of that PR and what you'd like to see happen here in LightGBM?

bschnurr commented 1 year ago

I see now there are return type hints. https://github.com/microsoft/LightGBM/blob/d0dfceec377d34cddd6722f870d073f7aa64ca2d/python-package/lightgbm/basic.py#L2755

I added the stubs to work around slow return-type inference for the function self.get_data in the lightgbm.basic module. Logs:

(39760) [BG(1)]                                           Re ["concat" (lightgbm.basic) [2470:33]] (3ms) [f:0, t:1, p:0, i:0, b:0]
(39760) [BG(1)]                                         Re ["self.data.getformat" (lightgbm.basic) [2455:33]] (568ms) [f:0, t:1, p:0, i:0, b:0]
(39760) [BG(1)]                                       Re ["self.data.iloc[self.used_indic <shortened> " (lightgbm.basic) [2323:33]] (10856ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10856ms)
(39760) [BG(1)]                                     Re ["self.get_data" (lightgbm.basic) [1805:25]] (10882ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10882ms)
(39760) [BG(1)]                                   Re ["self.set_group" (lightgbm.basic) [1807:25]] (10882ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10882ms)
(39760) [BG(1)]                                 Re ["self.get_label" (lightgbm.basic) [1808:24]] (10884ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10884ms)
(39760) [BG(1)]                               Re ["self.reference._predictor" (lightgbm.basic) [1810:96]] (10884ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10884ms)
(39760) [BG(1)]                             Re ["self.get_data" (lightgbm.basic) [1811:25]] (10884ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10884ms)
(39760) [BG(1)]                           Re ["self._set_init_score_by_predic <shortened> " (lightgbm.basic) [1812:25]] (10884ms) [f:0, t:1, p:0, i:0, b:0]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (10884ms)
(39760) [BG(1)]                         Re ["train_set.construct" (lightgbm.basic) [2605:13]] (11435ms) [f:1, t:1, p:2, i:3, b:1]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (11435ms)
(39760) [BG(1)]                       Re ["params.update" (lightgbm.basic) [2607:13]] (11440ms) [f:1, t:1, p:2, i:3, b:1]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (11440ms)
(39760) [BG(1)]                     Re ["predictor.predict" (lightgbm.basic) [3538:16]] (11630ms) [f:2, t:2, p:2, i:3, b:3]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (11630ms)
(39760) [BG(1)]                   Re ["self._Booster.predict" (lightgbm.sklearn) [803:16]] (11632ms) [f:2, t:2, p:2, i:3, b:3]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (11632ms)
(39760) [BG(1)]                 Re ["super().predict" (lightgbm.sklearn) [997:18]] (11633ms) [f:2, t:2, p:2, i:3, b:3]
[Info  - 11:32:00 AM] (39760) [BG(1)] Long operation: Re (11633ms)
(39760) [BG(1)]               Re ["lgb_model.predict_proba" (detector) [349:37]] (11998ms) [f:5, t:13, p:25, i:5, b:36]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: Re (11998ms)
(39760) [BG(1)]             Re ["gc.collect" (detector) [353:5]] (11999ms) [f:5, t:13, p:25, i:5, b:36]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: Re (11999ms)
(39760) [BG(1)]           Re ["load_npz" (detector) [355:12]] (11999ms) [f:5, t:13, p:25, i:5, b:36]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: Re (11999ms)
(39760) [BG(1)]         Re ["csr_matrix" (detector) [356:12]] (11999ms) [f:5, t:13, p:25, i:5, b:36]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: Re (11999ms)
(39760) [BG(1)]       Re ["lgb_model.predict_proba" (detector) [357:24]] (12088ms) [f:6, t:17, p:31, i:57, b:46]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: Re (12088ms)
(39760) [BG(1)]     Re ["gc.collect" (detector) [362:5]] (12088ms) [f:6, t:17, p:31, i:57, b:46]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: Re (12088ms)
(39760) [BG(1)]   getDeclarationsForNameNode ["format" (detector) [314:23]] (12090ms) [f:6, t:17, p:31, i:57, b:46]
[Info  - 11:32:01 AM] (39760) [BG(1)] Long operation: getDeclarationsForNameNode (12090ms)
(39760) [BG(1)]   getDeclarationsForNameNode ...

Source code example

import pandas as pd
import numpy as np
import lightgbm as lgb
#import xgboost as xgb
from scipy.sparse import vstack, csr_matrix, save_npz, load_npz
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
import gc

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import seaborn as sns
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import Ridge
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import median_absolute_error
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_validate
from sklearn.model_selection import RepeatedKFold

gc.enable()

dtypes = {
        'MachineIdentifier':                                    'category',
        'ProductName':                                          'category',
        'EngineVersion':                                        'category',
        'AppVersion':                                           'category',
        'AvSigVersion':                                         'category',
        'IsBeta':                                               'int8',
        'RtpStateBitfield':                                     'float16',
        'IsSxsPassiveMode':                                     'int8',
        'DefaultBrowsersIdentifier':                            'float16',
        'AVProductStatesIdentifier':                            'float32',
        'AVProductsInstalled':                                  'float16',
        'AVProductsEnabled':                                    'float16',
        'HasTpm':                                               'int8',
        'CountryIdentifier':                                    'int16',
        'CityIdentifier':                                       'float32',
        'OrganizationIdentifier':                               'float16',
        'GeoNameIdentifier':                                    'float16',
        'LocaleEnglishNameIdentifier':                          'int8',
        'Platform':                                             'category',
        'Processor':                                            'category',
        'OsVer':                                                'category',
        'OsBuild':                                              'int16',
        'OsSuite':                                              'int16',
        'OsPlatformSubRelease':                                 'category',
        'OsBuildLab':                                           'category',
        'SkuEdition':                                           'category',
        'IsProtected':                                          'float16',
        'AutoSampleOptIn':                                      'int8',
        'PuaMode':                                              'category',
        'SMode':                                                'float16',
        'IeVerIdentifier':                                      'float16',
        'SmartScreen':                                          'category',
        'Firewall':                                             'float16',
        'UacLuaenable':                                         'float32',
        'Census_MDC2FormFactor':                                'category',
        'Census_DeviceFamily':                                  'category',
        'Census_OEMNameIdentifier':                             'float16',
        'Census_OEMModelIdentifier':                            'float32',
        'Census_ProcessorCoreCount':                            'float16',
        'Census_ProcessorManufacturerIdentifier':               'float16',
        'Census_ProcessorModelIdentifier':                      'float16',
        'Census_ProcessorClass':                                'category',
        'Census_PrimaryDiskTotalCapacity':                      'float32',
        'Census_PrimaryDiskTypeName':                           'category',
        'Census_SystemVolumeTotalCapacity':                     'float32',
        'Census_HasOpticalDiskDrive':                           'int8',
        'Census_TotalPhysicalRAM':                              'float32',
        'Census_ChassisTypeName':                               'category',
        'Census_InternalPrimaryDiagonalDisplaySizeInInches':    'float16',
        'Census_InternalPrimaryDisplayResolutionHorizontal':    'float16',
        'Census_InternalPrimaryDisplayResolutionVertical':      'float16',
        'Census_PowerPlatformRoleName':                         'category',
        'Census_InternalBatteryType':                           'category',
        'Census_InternalBatteryNumberOfCharges':                'float32',
        'Census_OSVersion':                                     'category',
        'Census_OSArchitecture':                                'category',
        'Census_OSBranch':                                      'category',
        'Census_OSBuildNumber':                                 'int16',
        'Census_OSBuildRevision':                               'int32',
        'Census_OSEdition':                                     'category',
        'Census_OSSkuName':                                     'category',
        'Census_OSInstallTypeName':                             'category',
        'Census_OSInstallLanguageIdentifier':                   'float16',
        'Census_OSUILocaleIdentifier':                          'int16',
        'Census_OSWUAutoUpdateOptionsName':                     'category',
        'Census_IsPortableOperatingSystem':                     'int8',
        'Census_GenuineStateName':                              'category',
        'Census_ActivationChannel':                             'category',
        'Census_IsFlightingInternal':                           'float16',
        'Census_IsFlightsDisabled':                             'float16',
        'Census_FlightRing':                                    'category',
        'Census_ThresholdOptIn':                                'float16',
        'Census_FirmwareManufacturerIdentifier':                'float16',
        'Census_FirmwareVersionIdentifier':                     'float32',
        'Census_IsSecureBootEnabled':                           'int8',
        'Census_IsWIMBootEnabled':                              'float16',
        'Census_IsVirtualDevice':                               'float16',
        'Census_IsTouchEnabled':                                'int8',
        'Census_IsPenCapable':                                  'int8',
        'Census_IsAlwaysOnAlwaysConnectedCapable':              'float16',
        'Wdft_IsGamer':                                         'float16',
        'Wdft_RegionIdentifier':                                'float16',
        'HasDetections':                                        'int8'
        }

# scikit-learn examples
survey = fetch_openml(data_id=534, as_frame=True)
X = survey.data[survey.feature_names]
X.describe(include="all")
X.head()
y = survey.target.values.ravel()
survey.target.head()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42
)
train_dataset = X_train.copy()
train_dataset.insert(0, "WAGE", y_train)
_ = sns.pairplot(train_dataset, kind='reg', diag_kind='kde')
survey.data.info()

categorical_columns = ['RACE', 'OCCUPATION', 'SECTOR',
                       'MARR', 'UNION', 'SEX', 'SOUTH']
numerical_columns = ['EDUCATION', 'EXPERIENCE', 'AGE']

preprocessor = make_column_transformer(
    (OneHotEncoder(drop='if_binary'), categorical_columns),
    remainder='passthrough'
)

import scipy.special  # needed for exp10 below; missing from the imports above

model = make_pipeline(
    preprocessor,
    TransformedTargetRegressor(
        regressor=Ridge(alpha=1e-10),
        func=np.log10,
        inverse_func=scipy.special.exp10
    )
)
_ = model.fit(X_train, y_train)

y_pred = model.predict(X_train)  # was missing: y_pred must exist before scoring the training set
mae = median_absolute_error(y_train, y_pred)
string_score = f'MAE on training set: {mae:.2f} $/hour'
y_pred = model.predict(X_test)
mae = median_absolute_error(y_test, y_pred)
string_score += f'\nMAE on testing set: {mae:.2f} $/hour'
fig, ax = plt.subplots(figsize=(5, 5))
plt.scatter(y_test, y_pred)
ax.plot([0, 1], [0, 1], transform=ax.transAxes, ls="--", c="red")
plt.text(3, 20, string_score)
plt.title('Ridge model, small regularization')
plt.ylabel('Model predictions')
plt.xlabel('Truths')
plt.xlim([0, 27])
_ = plt.ylim([0, 27])

feature_names = (model.named_steps['columntransformer']
                      .named_transformers_['onehotencoder']
                      .get_feature_names(input_features=categorical_columns))
feature_names = np.concatenate(
    [feature_names, numerical_columns])

coefs = pd.DataFrame(
    model.named_steps['transformedtargetregressor'].regressor_.coef_,
    columns=['Coefficients'], index=feature_names
)

coefs.plot(kind='barh', figsize=(9, 7))
plt.title('Ridge model, small regularization')
plt.axvline(x=0, color='.5')
plt.subplots_adjust(left=.3)

X_train_preprocessed = pd.DataFrame(
    model.named_steps['columntransformer'].transform(X_train),
    columns=feature_names
)

X_train_preprocessed.std(axis=0).plot(kind='barh', figsize=(9, 7))
plt.title('Features std. dev.')
plt.subplots_adjust(left=.3)

coefs = pd.DataFrame(
    model.named_steps['transformedtargetregressor'].regressor_.coef_ *
    X_train_preprocessed.std(axis=0),
    columns=['Coefficient importance'], index=feature_names
)
coefs.plot(kind='barh', figsize=(9, 7))
plt.title('Ridge model, small regularization')
plt.axvline(x=0, color='.5')
plt.subplots_adjust(left=.3)

cv_model = cross_validate(
    model, X, y, cv=RepeatedKFold(n_splits=5, n_repeats=5),
    return_estimator=True, n_jobs=-1
)
coefs = pd.DataFrame(
    [est.named_steps['transformedtargetregressor'].regressor_.coef_ *
     X_train_preprocessed.std(axis=0)
     for est in cv_model['estimator']],
    columns=feature_names
)
plt.figure(figsize=(9, 7))
sns.swarmplot(data=coefs, orient='h', color='k', alpha=0.5)
sns.boxplot(data=coefs, orient='h', color='cyan', saturation=0.5)
plt.axvline(x=0, color='.5')
plt.xlabel('Coefficient importance')
plt.title('Coefficient importance and its variability')
plt.subplots_adjust(left=.3)

# end of scikit learn example

print('Download Train and Test Data.\n')
train = pd.read_csv('../input/train.csv', dtype=dtypes, low_memory=True)
train['MachineIdentifier'] = train.index.astype('uint32')
test  = pd.read_csv('../input/test.csv',  dtype=dtypes, low_memory=True)
test['MachineIdentifier']  = test.index.astype('uint32')

gc.collect()

print('Transform all features to category.\n')
for usecol in train.columns.tolist()[1:-1]:

    train[usecol] = train[usecol].astype('str')
    test[usecol] = test[usecol].astype('str')

    #Fit LabelEncoder
    le = LabelEncoder().fit(
            np.unique(train[usecol].unique().tolist()+
                      test[usecol].unique().tolist()))

    #At the end 0 will be used for dropped values
    train[usecol] = le.transform(train[usecol])+1
    test[usecol]  = le.transform(test[usecol])+1

    agg_tr = (train
              .groupby([usecol])
              .aggregate({'MachineIdentifier':'count'})
              .reset_index()
              .rename({'MachineIdentifier':'Train'}, axis=1))
    agg_te = (test
              .groupby([usecol])
              .aggregate({'MachineIdentifier':'count'})
              .reset_index()
              .rename({'MachineIdentifier':'Test'}, axis=1))

    agg = pd.merge(agg_tr, agg_te, on=usecol, how='outer').replace(np.nan, 0)
    #Select values with more than 1000 observations
    agg = agg[(agg['Train'] > 1000)].reset_index(drop=True)
    agg['Total'] = agg['Train'] + agg['Test']
    #Drop unbalanced values
    agg = agg[(agg['Train'] / agg['Total'] > 0.2) & (agg['Train'] / agg['Total'] < 0.8)]
    agg[usecol+'Copy'] = agg[usecol]

    train[usecol] = (pd.merge(train[[usecol]], 
                              agg[[usecol, usecol+'Copy']], 
                              on=usecol, how='left')[usecol+'Copy']
                     .replace(np.nan, 0).astype('int').astype('category'))

    test[usecol]  = (pd.merge(test[[usecol]], 
                              agg[[usecol, usecol+'Copy']], 
                              on=usecol, how='left')[usecol+'Copy']
                     .replace(np.nan, 0).astype('int').astype('category'))

    del le, agg_tr, agg_te, agg, usecol
    gc.collect()

y_train = np.array(train['HasDetections'])
train_ids = train.index
test_ids  = test.index

del train['HasDetections'], train['MachineIdentifier'], test['MachineIdentifier']
gc.collect()

print("If you don't want to use a sparse matrix, choose Kernel Version 2 for a simpler solution.\n")

print('--------------------------------------------------------------------------------------------------------')
print('Transform Data to Sparse Matrix.')
print('Sparse matrices can be used to fit many models, e.g. XGBoost, LightGBM, Random Forest, K-Means, etc.')
print('To concatenate Sparse Matrices by column use hstack()')
print('Read more about Sparse Matrix https://docs.scipy.org/doc/scipy/reference/sparse.html')
print('Good Luck!')
print('--------------------------------------------------------------------------------------------------------')

#Fit OneHotEncoder
ohe = OneHotEncoder(categories='auto', sparse=True, dtype='uint8').fit(train)

#Transform data using small groups to reduce memory usage
m = 100000
train = vstack([ohe.transform(train[i*m:(i+1)*m]) for i in range(train.shape[0] // m + 1)])
test  = vstack([ohe.transform(test[i*m:(i+1)*m])  for i in range(test.shape[0] // m +  1)])
save_npz('train.npz', train, compressed=True)
save_npz('test.npz',  test,  compressed=True)

del ohe, train, test
gc.collect()

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
skf.get_n_splits(train_ids, y_train)

lgb_test_result  = np.zeros(test_ids.shape[0])
lgb_train_result = np.zeros(train_ids.shape[0])
#xgb_test_result  = np.zeros(test_ids.shape[0])
#xgb_train_result = np.zeros(train_ids.shape[0])
counter = 0

print('\nLightGBM\n')

for train_index, test_index in skf.split(train_ids, y_train):

    print('Fold {}\n'.format(counter + 1))

    train = load_npz('train.npz')
    X_fit = vstack([train[train_index[i*m:(i+1)*m]] for i in range(train_index.shape[0] // m + 1)])
    X_val = vstack([train[test_index[i*m:(i+1)*m]]  for i in range(test_index.shape[0] //  m + 1)])
    X_fit, X_val = csr_matrix(X_fit, dtype='float32'), csr_matrix(X_val, dtype='float32')
    y_fit, y_val = y_train[train_index], y_train[test_index]

    del train
    gc.collect()

    lgb_model = lgb.LGBMClassifier(max_depth=-1,
                                   n_estimators=30000,
                                   learning_rate=0.05,
                                   num_leaves=2**12-1,
                                   colsample_bytree=0.28,
                                   objective='binary', 
                                   n_jobs=-1)

    #xgb_model = xgb.XGBClassifier(max_depth=6,
    #                              n_estimators=30000,
    #                              colsample_bytree=0.2,
    #                              learning_rate=0.1,
    #                              objective='binary:logistic', 
    #                              n_jobs=-1)

    lgb_model.fit(X_fit, y_fit, eval_metric='auc', 
                  eval_set=[(X_val, y_val)], 
                  verbose=100, early_stopping_rounds=100)

    #xgb_model.fit(X_fit, y_fit, eval_metric='auc', 
    #              eval_set=[(X_val, y_val)], 
    #              verbose=1000, early_stopping_rounds=300)

    lgb_train_result[test_index] += lgb_model.predict_proba(X_val)[:,1]
    #xgb_train_result[test_index] += xgb_model.predict_proba(X_val)[:,1]

    del X_fit, X_val, y_fit, y_val, train_index, test_index
    gc.collect()

    test = load_npz('test.npz')
    test = csr_matrix(test, dtype='float32')
    lgb_test_result += lgb_model.predict_proba(test)[:,1]
    #xgb_test_result += xgb_model.predict_proba(test)[:,1]
    counter += 1

    del test
    gc.collect()

    #Stop fitting to prevent time limit error
    #if counter == 3 : break

print('\nLightGBM VAL AUC Score: {}'.format(roc_auc_score(y_train, lgb_train_result)))
#print('\nXGBoost VAL AUC Score: {}'.format(roc_auc_score(y_train, xgb_train_result)))

submission = pd.read_csv('../input/sample_submission.csv')
submission['HasDetections'] = lgb_test_result / counter
submission.to_csv('lgb_submission.csv', index=False)
#submission['HasDetections'] = xgb_test_result / counter
#submission.to_csv('xgb_submission.csv', index=False)
#submission['HasDetections'] = 0.5 * lgb_test_result / counter  + 0.5 * xgb_test_result / counter 
#submission.to_csv('lgb_xgb_submission.csv', index=False)

print('\nDone.')

import pytz
from datetime import datetime

# assuming now contains a timezone aware datetime
pactz = pytz.timezone('America/Los_Angeles')
loc_dt = pactz.localize(datetime(2019, 10, 27, 6, 0, 0))
utcnow = pytz.utc
print(pytz.all_timezones)
dt = datetime(2019, 10, 31, 23, 30)
print(pactz.utcoffset(dt, is_dst=True))

def do_plotly():
    import plotly.graph_objs as go
    fig = go.Figure()
    fig.add_scatter()  # call the method; the bare attribute access was a no-op
bschnurr commented 1 year ago

I'll remove the bundled stubs when the next version of LightGBM, with type annotations, is released.

jameslamb commented 1 year ago

Ok sure, thanks! Sorry, I probably would have been watching the microsoft/python-type-stubs repo if I'd known about it.

If you're interested in improving LightGBM's typing (or anything else), we'd also welcome any contributions you'd like to make here and would be happy to help with the process.

bschnurr commented 1 year ago

My stubs were generated using Pylance/pyright by adding `# pyright: reportMissingTypeStubs=true`. https://microsoft.github.io/pyright/#/type-stubs?id=generating-type-stubs

bschnurr commented 1 year ago

You can also use pyright to verify the type completeness of your public API (`--verifytypes`). https://microsoft.github.io/pyright/#/command-line?id=pyright-command-line-options
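For anyone following along, the two pyright features mentioned in this thread can be invoked roughly like this (a sketch; assumes pyright is on your PATH, e.g. via `npm install -g pyright` or the `pyright` PyPI wrapper):

```shell
# Generate draft stubs for an installed package into ./typings/
pyright --createstub lightgbm

# Report how much of the package's public API is typed
pyright --verifytypes lightgbm
```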

Avasam commented 1 year ago

@bschnurr With the release of 4.0.0, LightGBM's types seem complete enough to obsolete https://github.com/microsoft/python-type-stubs/tree/main/stubs/lightgbm-stubs. Not only have a lot of variables been removed or renamed (i.e. a handful are no longer public); at a glance, the only areas where the inline type hints aren't on par are some non-public method parameters typed as Any (they're not part of the public API anyway, so that's fine) and the lack of a generic ndarray type in https://github.com/microsoft/LightGBM/blob/master/python-package/lightgbm/basic.py compared to https://github.com/microsoft/python-type-stubs/blob/main/stubs/lightgbm-stubs/basic.pyi. That feels acceptable to me.

@jameslamb Are you aware of any major issue with the state of type hints in the currently published version?
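(As an aside on the ndarray point above: the generic form lives in numpy.typing, available since numpy 1.21. A minimal illustrative sketch, where the function itself is hypothetical:)

```python
import numpy as np
from numpy.typing import NDArray

def to_float32(values: NDArray[np.float64]) -> NDArray[np.float32]:
    # Parameterizing ndarray by dtype gives type checkers more to work with
    # than a bare `np.ndarray` annotation.
    return values.astype(np.float32)
```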

jameslamb commented 1 year ago

Sorry for the delayed response. It appears that @bschnurr just went ahead and removed those stubs about 2 weeks ago: https://github.com/microsoft/python-type-stubs/pull/294

So I guess this discussion about whether or not they should be removed can be closed.

the lack of ndarray generic type

We'd welcome a pull request fixing this if you're interested in contributing!

Are you aware of any major issue with the state of type hints in the currently published version?

I'm not aware of any in the public API that are incorrect. There are certainly some cases where the type hints could be more specific (e.g. where they're using implicit Any, or the numpy topic you mentioned). We'd welcome contributions on #3756 and #3867 if it's something you're interested in improving.