laxmimerit / Human-Activity-Recognition-Using-Accelerometer-Data-and-CNN

Human Activity Recognition Using Accelerometer Data and CNN

Example to predict data without training #1

Open gsion1 opened 3 years ago

gsion1 commented 3 years ago

Hi, thanks for this tutorial. Could you add an example showing how to run a prediction on a small batch of 80 (x, y, z) samples with the saved model, please? Thanks a lot, Guillaume
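A minimal sketch of what such an example could look like, assuming the window layout used by `get_frames` in the code posted further down in this thread (the x, y, z series are stacked and then reshaped without a transpose). The helper name `prepare_window` and its `mean` and `std` parameters are hypothetical; `mean` and `std` stand in for the statistics of the scaler fitted on the training data. The model call is left in comments because it needs the saved `model.h5`.

```python
import numpy as np

FRAME_SIZE = 80   # 4 s at 20 Hz, as in the tutorial
N_FEATURES = 3    # x, y, z

def prepare_window(samples, mean, std):
    """Turn 80 raw (x, y, z) samples into one model input of shape (1, 80, 3, 1).

    `prepare_window`, `mean` and `std` are hypothetical names; `mean` and `std`
    stand in for the statistics of the scaler fitted on the training data.
    """
    samples = np.asarray(samples, dtype=float)
    if samples.shape != (FRAME_SIZE, N_FEATURES):
        raise ValueError(f"expected shape {(FRAME_SIZE, N_FEATURES)}, got {samples.shape}")
    scaled = (samples - mean) / std
    # Mirror get_frames: stack the x, y, z series, then reshape (no transpose),
    # so the window is laid out the same way as the training frames.
    x, y, z = scaled[:, 0], scaled[:, 1], scaled[:, 2]
    frame = np.asarray([x, y, z]).reshape(FRAME_SIZE, N_FEATURES)
    return frame.reshape(1, FRAME_SIZE, N_FEATURES, 1)

# Usage sketch (needs the saved model, so it is left commented out):
# import tensorflow as tf
# model = tf.keras.models.load_model('model.h5')
# probs = model.predict(prepare_window(raw_window, mean, std))
# predicted_class = int(np.argmax(probs, axis=-1)[0])
```

The integer in `predicted_class` is the code assigned by the `LabelEncoder` used at training time, so it has to be mapped back to an activity name with that same encoder.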

gsion1 commented 3 years ago

I did that, but the confusion matrix shows that there is an error somewhere, maybe in the confusion matrix itself rather than in the prediction. [Screenshot from 2021-03-11 13-04-28]

With the original tutorial code, the confusion matrix is the one below (few errors). [Screenshot from 2021-03-11 13-06-14]

Many lines could be removed, but here is the code:

#https://kgptalkie.com/human-activity-recognition-using-accelerometer-data/
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv2D, MaxPool2D
from tensorflow.keras.optimizers import Adam
#print(tf.__version__)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder

file = open('WISDM_ar_v1.1_raw.txt')
#file = open('jogging.txt')
lines = file.readlines()

processedList = []

for i, line in enumerate(lines):
    try:
        line = line.split(',')
        last = line[5].split(';')[0]
        last = last.strip()
        if last == '':
            break
        temp = [line[0], line[1], line[2], line[3], line[4], last]
        processedList.append(temp)
    except:
        print('Error at line number: ', i)

columns = ['user', 'activity', 'time', 'x', 'y', 'z']
data = pd.DataFrame(data = processedList, columns = columns)
data.head()

data.shape
data.info()
data.isnull().sum()
data['activity'].value_counts()
data['x'] = data['x'].astype('float')
data['y'] = data['y'].astype('float')
data['z'] = data['z'].astype('float')
data.info()

Fs = 20 #sampling rate in Hz
activities = data['activity'].value_counts().index

df = data.drop(['user', 'time'], axis = 1).copy()
df.head()
df['activity'].value_counts()
label = LabelEncoder()
df['label'] = label.fit_transform(df['activity'])
df.head()

X = df[['x', 'y', 'z']]
y = df['label']

scaler = StandardScaler()
X = scaler.fit_transform(X)

scaled_X = pd.DataFrame(data = X, columns = ['x', 'y', 'z'])
scaled_X['label'] = y.values

scaled_X.head()

import scipy.stats as stats
#divide the samples into 4 s frames
Fs = 20 #sampling rate is 20 Hz
frame_size = Fs*4 # 80 samples, 4 seconds
hop_size = Fs*2 # 40 samples, 2 seconds

def get_frames(df, frame_size, hop_size):

    N_FEATURES = 3

    frames = []
    labels = []
    for i in range(0, len(df) - frame_size, hop_size):
        x = df['x'].values[i: i + frame_size]
        y = df['y'].values[i: i + frame_size]
        z = df['z'].values[i: i + frame_size]

        # Retrieve the most often used label in this segment
        label = stats.mode(df['label'][i: i + frame_size])[0][0]
        frames.append([x, y, z])
        labels.append(label)

    # Bring the segments into a better shape
    frames = np.asarray(frames).reshape(-1, frame_size, N_FEATURES)
    labels = np.asarray(labels)

    return frames, labels

X, y = get_frames(scaled_X, frame_size, hop_size)
#still match with the right labels
X.shape, y.shape
X = X.reshape(X.shape[0],80,3,1)
X.shape

# load the saved model with tf.keras, matching the tensorflow.keras imports at the top
model = tf.keras.models.load_model('model.h5')

from mlxtend.plotting import plot_confusion_matrix
from sklearn.metrics import confusion_matrix

# predict once and reuse the class probabilities
probs = model.predict(X)
print(probs)
y_pred = np.argmax(probs, axis=-1)
print(y_pred)

label_list = ["Walking", "Jogging", "Upstairs", "Downstairs", "Sitting", "Standing"]

draw_mat = 1
if draw_mat:
    mat = confusion_matrix(y, y_pred)
    plot_confusion_matrix(conf_mat=mat, class_names=label_list, show_normed=True, figsize=(7,7))

laxmimerit commented 3 years ago

If this result is on the training data, the model seems to be overfitting; otherwise it's a good model.

guillaume55 commented 3 years ago

I'm not sure I understand what you are saying. The model used in the second script is the model saved by the original script, which does the training and testing.

Do you have an idea to solve the issue?

I've tried with 10, 15 and 50 epochs, but the results are not satisfactory for my use case in any of them.

superuser28 commented 3 years ago

Admittedly, I did not check the details of the dataset from the tutorial, but my question would be: did you record your test data under the same circumstances as the dataset from the tutorial (i.e. sensor position, etc.)? You want the trained model to classify your own test data, right? If your data was recorded in a different manner, then I guess you have to retrain the model on your own data first.

laxmimerit commented 3 years ago

This only shows the model training process. You have to train on your own dataset to get correct results.

gsion1 commented 3 years ago

Admittedly, I did not check the details of the dataset from the tutorial, but my question would be: did you record your test data under the same circumstances as the dataset from the tutorial (i.e. sensor position, etc.)? You want the trained model to classify your own test data, right? If your data was recorded in a different manner, then I guess you have to retrain the model on your own data first.

I did not use my own files but the ones from the tutorial, so I don't think this can be the source of the issue.

superuser28 commented 3 years ago

I did not use my own files but the ones from the tutorial, so I don't think this can be the source of the issue.

Ah okay, so you took the whole dataset from the course and then ran the model on it. We should expect the model to do very well, but it did not. I did not check for mistakes in your code, but the confusion matrix does look odd. The classes are very unbalanced, which makes sense since you took the whole dataset. However, "Sitting" was initially the smallest class, with 3555 data points if I recall correctly, whereas here "Downstairs" and "Upstairs" are considerably smaller. Maybe the data somehow got mixed up?
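One way to check whether the data got mixed up is to compare the number of windows per class with the row sums of the confusion matrix, since each row sum equals the number of true samples of that class. A small sketch with made-up labels; `class_support` and `confusion` are hypothetical helpers written in plain NumPy, not functions from the tutorial.

```python
import numpy as np

def class_support(y_true, n_classes):
    """Count how many samples carry each integer label (hypothetical helper)."""
    return np.bincount(np.asarray(y_true), minlength=n_classes)

def confusion(y_true, y_pred, n_classes):
    """Plain NumPy confusion matrix: rows are true labels, columns are predictions."""
    mat = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(mat, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return mat

# Made-up labels for illustration only
y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 0])

mat = confusion(y_true, y_pred, n_classes=3)
support = class_support(y_true, n_classes=3)

# Each row of the matrix sums to the number of true samples of that class
print("row sums:", mat.sum(axis=1))
print("support: ", support)
```

If the row sums match the per-class counts but the names attached to the rows do not match the expected class sizes, the data itself is probably fine and the naming of the rows is the more likely culprit.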

gsion1 commented 3 years ago

This is what I think too, but I'm not able to spot the issue. The labels may also be mixed up, but I can't find the error.

Because the tutorial and my code use the same model, the confusion matrix and the predictions should be far better. The issue might arise when displaying the results.
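One thing worth checking, given that the labels might be mixed up only at display time: `LabelEncoder` assigns integer codes in sorted (alphabetical) order of the class names, whereas `label_list` in the script above is written in a different order. The sketch below uses `np.unique` as a stand-in for the encoder, since it sorts the names the same way, to show where the two orders disagree. This is a possible explanation under that assumption, not a confirmed diagnosis.

```python
import numpy as np

activities = np.array(["Walking", "Jogging", "Upstairs",
                       "Downstairs", "Sitting", "Standing"])

# np.unique sorts the names; the position in `classes` is the integer code
classes = np.unique(activities)

# Hard-coded display order from the script
label_list = ["Walking", "Jogging", "Upstairs", "Downstairs", "Sitting", "Standing"]

for code, name in enumerate(classes):
    print(code, "encoded as", name, "| displayed as", label_list[code])
```

In the script itself, passing `class_names=label.classes_` to `plot_confusion_matrix` instead of the hard-coded `label_list` would keep the displayed names aligned with the encoded labels.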