smenon8 / AnimalWildlifeEstimator

Animal Wildlife Estimator Using Social Media (A.W.E.S.O.M.E.) is an ongoing project and stems mainly from Sreejith Menon's MS thesis
https://smenon8.github.io/AnimalWildlifeEstimator/
BSD 3-Clause "New" or "Revised" License

Repeating attribute during classifier training #9

Closed smenon8 closed 8 years ago

smenon8 commented 8 years ago

Commit #4d04ef9 is a temporary fix for the issue where the adult attribute was repeated multiple times in train_x.

To recreate the issue:

import pandas as pd
from sklearn.model_selection import train_test_split

def trainTestSplitter(gidAttribDict, allAttribs, trainTestSplit):
    # After the transpose: one row per GID, one column per attribute
    df = pd.DataFrame(gidAttribDict).transpose()
    df = df[allAttribs + ["TARGET"]]  # rearrange the column order

    attributes = df.columns[:len(allAttribs)]  # all attribute columns

    # list(set(...)) drops repeated names; use df[attributes] instead to recreate the issue
    dataFeatures = df[list(set(attributes))]
    targetVar = df["TARGET"]

    return train_test_split(dataFeatures, targetVar, test_size=trainTestSplit, random_state=0)

With the repeat in place, the training data has 88 attributes while the testing data has only 86. The mismatch is caused by the repeated adult attribute.
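For anyone following along, here is a minimal sketch of how a repeated name in the attribute list turns into a repeated column once the DataFrame is reindexed. The toy data, the zebra attribute, and the variable names are made up for illustration and are not taken from the project.

```python
import pandas as pd

# Hypothetical toy data: three GIDs, two attributes plus the target.
gid_attrib = {
    "1": {"adult": 1, "zebra": 0, "TARGET": 1},
    "2": {"adult": 0, "zebra": 1, "TARGET": 0},
    "3": {"adult": 1, "zebra": 1, "TARGET": 1},
}
all_attribs = ["adult", "zebra", "adult"]  # "adult" listed twice by mistake

df = pd.DataFrame(gid_attrib).transpose()
df = df[all_attribs + ["TARGET"]]  # selecting a label twice yields two columns

print(list(df.columns))  # ['adult', 'zebra', 'adult', 'TARGET']

# Report every label that occurs more than once.
dup = df.columns[df.columns.duplicated()].tolist()
print(dup)  # ['adult']
```

A check like df.columns.duplicated().any() before splitting would probably have flagged this early.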

This was verified.

smenon8 commented 8 years ago

This issue is mainly due to set() in dataFeatures = df[list(set(attributes))]
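One side effect of set() worth noting: it drops repeated names but does not keep the original column order, and the order of strings in a set can change between interpreter runs. A possible alternative, sketched here with a hypothetical helper and made-up attribute names, is dict.fromkeys, which keeps the first occurrence of each name in order.

```python
def dedupe_keep_order(names):
    """Return names with repeats removed, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(names))


# Hypothetical attribute list with "adult" repeated.
attributes = ["adult", "zebra", "adult", "giraffe"]
unique_attributes = dedupe_keep_order(attributes)
print(unique_attributes)  # ['adult', 'zebra', 'giraffe']
```

In the function above this would look like dataFeatures = df[dedupe_keep_order(attributes)], assuming df itself no longer carries the repeated column.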