hexiangnan / neural_factorization_machine

TensorFlow Implementation of Neural Factorization Machine

Load Data Code Wrong Badly #13

Status: Open. shm007g opened this issue 4 years ago.

shm007g commented 4 years ago

https://github.com/hexiangnan/neural_factorization_machine/blob/master/LoadData.py#L47

In the read_features() function, you just initialize a dict and record each feature name along with the first user that has this feature. Nonsense! Data is lost.
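To make the complaint concrete, here is a minimal sketch of what a dict built with an `if item not in features` check ends up holding: one entry per distinct feature token, while repeated occurrences on later lines are not stored again. The two input lines and the helper names `build_index` and `count_tokens` are hypothetical, not taken from the repository.

```python
from collections import Counter

# Hypothetical libFM-style lines: a label, then index:value tokens.
LINES = [
    "-1.0 1:1 2:1 3:1",
    "1.0 1:1 4:1 3:1",
]

def build_index(lines):
    """Assign each distinct feature token the next free integer index."""
    features = {}
    for line in lines:
        for item in line.split()[1:]:  # skip the label
            if item not in features:
                features[item] = len(features)
    return features

def count_tokens(lines):
    """Count how many times each feature token occurs across all lines."""
    return Counter(item for line in lines for item in line.split()[1:])

index = build_index(LINES)
counts = count_tokens(LINES)
print(index)                 # {'1:1': 0, '2:1': 1, '3:1': 2, '4:1': 3}
print(sum(counts.values()))  # 6 token occurrences in total
print(len(index))            # 4 distinct tokens stored in the dict
```

The difference between the two counts (here 6 - 4 = 2) is the number of repeated occurrences, which is exactly what the `else` branch in the rewritten read_features() below prints once per repeat.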

Here is a preview of the ml-tag.test.libfm file:

-1.0 51798:1 2473:1 37583:1
-1.0 66335:1 61344:1 29842:1
-1.0 89085:1 60033:1 47050:1
1.0 61293:1 8073:1 3903:1
-1.0 81335:1 56575:1 50067:1
-1.0 65166:1 48181:1 12510:1
-1.0 75300:1 26027:1 38510:1
1.0 10219:1 2122:1 383:1
1.0 80855:1 80856:1 24728:1
1.0 67033:1 721:1 19495:1
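Each line in this libFM-format file is a label followed by space-separated `feature_index:value` pairs. A minimal parsing sketch, assuming that format (the function name `parse_libfm_line` is hypothetical, not part of the repository):

```python
def parse_libfm_line(line):
    """Split one libFM line into (label, [(feature_index, value), ...])."""
    parts = line.split()
    label = float(parts[0])
    pairs = []
    for token in parts[1:]:
        idx, val = token.split(":")  # e.g. '51798:1' -> ('51798', '1')
        pairs.append((int(idx), float(val)))
    return label, pairs

label, pairs = parse_libfm_line("-1.0 51798:1 2473:1 37583:1")
print(label)  # -1.0
print(pairs)  # [(51798, 1.0), (2473, 1.0), (37583, 1.0)]
```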

I rewrote the code and tested it on the file above, and clearly a lot of data is lost! Badly wrong.

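def read_features(file):  # read a libFM-format feature file
    features = {}  # maps each feature token (e.g. '51798:1') to an integer index
    i = len(features)  # next free index; starts at 0
    with open(file) as f:
        for line in f:
            items = line.strip().split(' ')
            for item in items[1:]:  # skip the label: ['51798:1', '2473:1', '37583:1']
                if item not in features:
                    features[item] = i  # first occurrence: assign the next index
                    i = i + 1
                else:
                    # token was already seen earlier in the file; report it
                    print('nfm load code error', i, item)
    return features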
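One way to try the rewritten function is to write the ten preview lines to a temporary file and call it. This sketch restates read_features so that the block runs on its own; the temporary-file setup is an assumption for illustration only. In the preview alone, all 30 feature tokens are distinct, so the `else` branch prints nothing and each token gets the indices 0 to 29 in file order.

```python
import os
import tempfile

# The ten preview lines of ml-tag.test.libfm shown above.
PREVIEW = """\
-1.0 51798:1 2473:1 37583:1
-1.0 66335:1 61344:1 29842:1
-1.0 89085:1 60033:1 47050:1
1.0 61293:1 8073:1 3903:1
-1.0 81335:1 56575:1 50067:1
-1.0 65166:1 48181:1 12510:1
-1.0 75300:1 26027:1 38510:1
1.0 10219:1 2122:1 383:1
1.0 80855:1 80856:1 24728:1
1.0 67033:1 721:1 19495:1
"""

def read_features(file):
    features = {}
    i = len(features)
    with open(file) as f:
        for line in f:
            items = line.strip().split(' ')
            for item in items[1:]:
                if item not in features:
                    features[item] = i
                    i = i + 1
                else:
                    print('nfm load code error', i, item)
    return features

# Write the preview to a temporary file, read it back, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".libfm", delete=False) as tmp:
    tmp.write(PREVIEW)
    path = tmp.name
try:
    features = read_features(path)
finally:
    os.remove(path)

print(len(features))                             # 30
print(features['51798:1'], features['19495:1'])  # 0 29
```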