sdobber / FluxArchitectures.jl

Complex neural network examples for Flux.jl
MIT License

Implementing TPALSTM using own data #49

Closed: marthinkondjeni closed this issue 1 year ago

marthinkondjeni commented 1 year ago

I am trying to implement TPALSTM using my own data [link](https://www.kaggle.com/datasets/shenba/time-series-datasets/discussion?select=sales-of-shampoo-over-a-three-ye.csv). How can I go about it?

sdobber commented 1 year ago

I don't have access to the dataset you mention, but you can follow along with this example, which uses a publicly available dataset instead:

using FluxArchitectures
using DataFrames
using CSVFiles

src = "https://storage.googleapis.com/mledu-datasets/california_housing_train.csv"
data = DataFrame(load(src))
target = :median_house_value
select!(data, Cols(target, :))  # make median_house_value the first column

poollength = 30
datalength = 5000
horizon = 10
features, labels = prepare_data(data, poollength, datalength, horizon; normalise=false)

This creates features and labels in the correct format for using the models.

Note that prepare_data expects the column that is supposed to be predicted to be the first column of the data, hence the column reordering with select!. Currently, there is an issue with normalising the data when using this function (see #50), which is why normalise=false is set here. I'm about to fix that.
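The reordering step can be checked in isolation on a toy table. This is only a sketch: the column names `month_index` and `sales` and the four values are made up for illustration, and nothing beyond `select!`, `Cols` and `names` from DataFrames.jl is used.

```julia
using DataFrames

# Toy data; column names and values are placeholders for illustration only.
df = DataFrame(month_index = 1:4, sales = [266.0, 145.9, 183.1, 119.3])

# The column name may be a String (handy when it contains spaces) or a Symbol.
target = "sales"

# Cols(target, :) selects the target first, then every remaining column once.
select!(df, Cols(target, :))

println(names(df))
```

After the call, the target column should be listed first, which is the layout prepare_data is described as expecting.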

marthinkondjeni commented 1 year ago

Sales of shampoo over a three year period | Month
-- | --
266.0 | "1-Jan"
145.9 | "1-Feb"
183.1 | "1-Mar"
119.3 | "1-Apr"
180.3 | "1-May"
168.5 | "1-Jun"
231.8 | "1-Jul"
224.5 | "1-Aug"
192.8 | "1-Sep"
122.9 | "1-Oct"
... | ...
646.9 | "3-Dec"

Here is some sample data on sales of shampoo over a three-year period. The aim is to predict the sales using the TPA-LSTM architecture. Here is the code I used, but I am getting errors when training the model.

using FluxArchitectures
using DataFrames
using CSVFiles

src = "sales-of-shampoo-over-a-three-ye.csv"
data = DataFrame(load(src))
target = "Sales of shampoo over a three year period"
select!(data, Cols(target, :))  # make the sales column the first column

poollength = 30
datalength = 7
horizon = 10
features, labels = prepare_data(data, poollength, datalength, horizon; normalise=false)

inputsize = size(features, 1)
hiddensize = 10
layers = 2
filternum = 32
filtersize = 1
model = TPALSTM(inputsize, hiddensize, poollength, layers, filternum, filtersize)
function loss(x, y)
    Flux.ChainRulesCore.ignore_derivatives() do
        Flux.reset!(model)
    end
    return Flux.mse(model(x), permutedims(y))
end

Flux.train!(loss, Flux.params(model), Iterators.repeated((features, labels), 10),
    Adam(0.02))

But I am getting this error:

MethodError: no method matching *(::Float32, ::String)
Closest candidates are:
*(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:591
*(::T, !Matched::T) where T<:Union{Float16, Float32, Float64} at float.jl:385
*(!Matched::Union{AbstractChar, AbstractString}, ::Union{AbstractChar, AbstractString}...) at strings/basic.jl:260

...
sdobber commented 1 year ago

The Month column is in string format, which cannot be processed by the models in this repository. You need to do some feature engineering to convert it to a numerical value. You will have to experiment yourself to figure out what works best, e.g. converting the month to a number from 1 to 12, or replacing each date with its number of days from the start of the year.
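As one possible starting point, the conversion could look like the sketch below. It assumes that labels such as "1-Jan" mean "year 1, January" (an interpretation of the sample data, not something the dataset documents), and `month_features` is a hypothetical helper written for this example, not part of FluxArchitectures.

```julia
# Hypothetical helper: turn a label like "1-Jan" into numeric features.
# Assumes the format "<year index>-<three-letter English month>".
function month_features(label::AbstractString)
    month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                   "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    # Drop surrounding double quotes, if the CSV reader kept them.
    cleaned = strip(label, '"')
    year_str, month_str = split(cleaned, '-')
    month = findfirst(==(month_str), month_names)
    month === nothing && throw(ArgumentError("unknown month in label: $label"))
    year = parse(Int, year_str)
    # Float32 matches the element type Flux models use by default.
    return (month = Float32(month),
            months_elapsed = Float32(12 * (year - 1) + month))
end

labels = ["1-Jan", "1-Feb", "2-Mar", "3-Dec"]
months = [month_features(l).month for l in labels]
elapsed = [month_features(l).months_elapsed for l in labels]

println(months)
println(elapsed)
```

Either vector could then replace the string column before calling prepare_data, for example `data.Month = [month_features(m).month for m in data.Month]`. Which encoding helps the model more is something to test on the data.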

marthinkondjeni commented 1 year ago

Thank you so much @sdobber