maximtrp / scikit-posthocs

Multiple Pairwise Comparisons (Post Hoc) Tests in Python
https://scikit-posthocs.rtfd.io
MIT License

function outliers_gesd has a bug when outliers > 1 #25

Closed: zhoul14 closed this issue 4 years ago

zhoul14 commented 5 years ago

Describe the bug: outliers_gesd has a bug. As each outlier is removed, the abs_d NumPy array gets smaller.

The relevant lines in '_outliers.py' are:

        # Masked values
        lms = ms[-1] if len(ms) > 0 else []
        ms.append(lms + [np.argmax(abs_d)])

After the first iteration, abs_d no longer has the same size as data, so np.argmax(abs_d) is an index into the reduced array, not the true index of the outlier in the original data array.
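To see why the index goes wrong, here is a small NumPy-only sketch (not the library's code) that repeatedly removes the point farthest from the mean. That is the same point the GESD statistic picks, since dividing |x - mean| by the standard deviation does not change the argmax. The function name removal_order and the toy data are hypothetical. With track_indices=False it mimics the reported behaviour (the argmax indexes the shrinking array); with track_indices=True it carries a parallel array of original positions.

```python
import numpy as np


def removal_order(data, n, track_indices):
    """Remove the most extreme point n times; return the indices that were recorded.

    Hypothetical helper for illustration only.
    track_indices=False mimics the bug: the argmax indexes the shrinking array.
    track_indices=True looks up the original position of the removed value.
    """
    data_proc = np.asarray(data, dtype=float)
    positions = np.arange(data_proc.size)  # original index of each remaining value
    removed = []
    for _ in range(n):
        # |x - mean| has the same argmax as the GESD statistic |x - mean| / std
        abs_d = np.abs(data_proc - data_proc.mean())
        k = int(np.argmax(abs_d))  # index into data_proc, not into data
        removed.append(int(positions[k]) if track_indices else k)
        data_proc = np.delete(data_proc, k)
        positions = np.delete(positions, k)
    return removed


data = [1.0, 10.0, 2.0, 9.0, 1.5]
print(removal_order(data, 2, track_indices=False))  # [1, 2]
print(removal_order(data, 2, track_indices=True))   # [1, 3]
```

On the second pass the untracked version reports index 2 (value 2.0), although the value actually removed is 9.0, which sits at index 3 of the original data.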

maximtrp commented 5 years ago

@zhoul14 Please give a minimal working example (dataset and code you are getting errors with).

zhoul14 commented 5 years ago

Minimal working example:


import numpy as np
from scikit_posthocs import outliers_gesd


def show_outliers():
    a=[0.0758620672966,0.0973375093743,0.0823115474934,0.0778925139413,0.0819465429785,0.0796656581338,0.0785154747212,0.076680191776,0.0768629885181,0.0788061349247,0.0779293561256,0.0790246794075,0.0790269516254,0.0777317636504,0.0757962833886,0.0775519895172,0.0778238646216,0.0780491246774,0.0794335135563,0.0759235310095,0.0833858503532,0.0768906781928,0.080384079316,0.0766948770573,0.0800948111699,0.082430089598,0.0740328629397,0.0789000913681,0.0779414300755,0.0790291666155,0.078657255494,0.0802241085545,0.0739809571224,0.076056244802,0.0761171781977,0.0728913220625,0.0744615341682,0.0725346394267,0.0743544717507,0.0709101944836,0.0752602311179,0.0785679418376,0.0725486099901,0.077237473364,0.0714355286539,0.0734376552041,0.0743690764601,0.0778208712667,0.0748293049788,0.0733981127026,0.0744273593654,0.0728530104913,0.0701381212573,0.0707073842833,0.068084805764,0.0734187555235,0.0726690312918,0.0651494324964,0.0673265119361,0.064044664499,0.0521803145495,0.0523705612868,0.0532560177287,0.0508246024279,0.0455519004764,0.045351524404,0.0456084475144,0.0695967875796,0.0694068269889,0.0683948681249,0.0663860249727,0.0656443483654,0.0684094562818,0.0693893045484,0.069333656528,0.0652543864573,0.0665306292816,0.0640296728562,0.0632387311432,0.0625893952856,0.0651539956474,0.0666448362681,0.0666099544756,0.0685935665265,0.0665430130162,0.0635760584895,0.0643642459146,0.0650119206937,0.0462430980888,0.0654321468831,0.066845471534]
    from matplotlib import pyplot as plt
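    plt.plot(a)
    # With the default settings, outliers_gesd returns the data with the
    # detected outliers removed, i.e. the filtered series
    filtered = outliers_gesd(np.array(a), outliers=10)
    # Mark every point that was removed (flagged as an outlier) with a red star
    for i, x_i in enumerate(a):
        if x_i not in filtered:
            plt.plot(i, x_i, 'r*')
    plt.show()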
zhoul14 commented 5 years ago

[image: plot produced by show_outliers()]

zhoul14 commented 5 years ago

When I change the line:

ms.append(lms + [np.argmax(abs_d)])

to

ms.append(lms + np.where(data == data_proc[np.argmax(abs_d)])[0].tolist())

or

ms.append(lms + [data.tolist().index(data_proc[np.argmax(abs_d)])])
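Both proposed fixes look the removed value up in the original array, which works while the values are unique. With repeated values the lookup becomes ambiguous: list.index always returns the first match, and np.where returns every match at once. The hypothetical helper below compares the list.index fix with an alternative that carries the original positions alongside data_proc; it is an illustrative sketch, not necessarily what was committed to scikit-posthocs.

```python
import numpy as np


def lookup_vs_tracking(data, n):
    """Hypothetical helper: record removed indices two ways over n removals."""
    data = np.asarray(data, dtype=float)
    data_proc = data.copy()
    positions = np.arange(data.size)  # original index of each remaining value
    by_index_lookup, by_tracking = [], []
    for _ in range(n):
        k = int(np.argmax(np.abs(data_proc - data_proc.mean())))
        value = data_proc[k]
        # Proposed fix: first position in the original data holding this value
        by_index_lookup.append(data.tolist().index(value))
        # Alternative: read the original position from the parallel index array
        by_tracking.append(int(positions[k]))
        data_proc = np.delete(data_proc, k)
        positions = np.delete(positions, k)
    return by_index_lookup, by_tracking


print(lookup_vs_tracking([0.0, 0.1, 0.2, 8.0, 8.0], 2))  # ([3, 3], [3, 4])
```

Both copies of 8.0 are removed, but the list.index lookup records index 3 twice and never index 4. The np.where variant would instead return [3, 4] on the very first pass, so the masked list for that step already holds two indices although only one value has been removed.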
zhoul14 commented 5 years ago

Then the detected outlier points change: [image]

maximtrp commented 4 years ago

Sorry for the delayed answer, and thank you! You are right. I've corrected the faulty code.