saeedkhaki92 / CNN-RNN-Yield-Prediction

This repository contains codes for the paper entitled "A CNN-RNN Framework for Crop Yield Prediction"

Predicting Future Yield with unknown future X_train values #12

Open mehathab96 opened 1 year ago

mehathab96 commented 1 year ago

Hello,

My name is Mehathab and I am an aspiring data scientist. I read your paper and found it very interesting. I have built a new model using LSTM, GRU, and CNN layers for the dataset, but I cannot work out how to predict future yield without the future x_train values.

Also, what sequencing do you use when creating the dataset? In my case, the model uses the last 10 data points to predict yield.

The code I am using is:

import numpy as np

def split_sequences(sequences, n_steps_in, n_steps_out):
    # sequences: 2D array, column 0 is the yield (target), the remaining columns are features
    X, y = [], []
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out - 1
        # check if we are beyond the dataset
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern;
        # the output starts at the last input row (end_ix - 1)
        seq_x, seq_y = sequences[i:end_ix, 1:], sequences[end_ix - 1:out_end_ix, 0]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)
where n_steps_in = 10 and n_steps_out = 5.
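
To make the windowing concrete, here is a minimal sketch that runs the same function on a small synthetic array and prints the resulting shapes. The 20-row array and its three feature columns are made up for illustration; column 0 simply holds the row index so the target alignment is easy to read.

```python
import numpy as np

def split_sequences(sequences, n_steps_in, n_steps_out):
    # Same logic as the function posted above.
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out - 1
        if out_end_ix > len(sequences):
            break
        X.append(sequences[i:end_ix, 1:])
        y.append(sequences[end_ix - 1:out_end_ix, 0])
    return np.array(X), np.array(y)

# Synthetic data (an assumption, not the real dataset):
# column 0 = row index standing in for yield, columns 1..3 = random features.
n_rows = 20
rng = np.random.default_rng(0)
data = np.column_stack([np.arange(n_rows, dtype=float),
                        rng.normal(size=(n_rows, 3))])

X, y = split_sequences(data, n_steps_in=10, n_steps_out=5)
print(X.shape)  # (7, 10, 3): 7 windows, 10 time steps, 3 features
print(y.shape)  # (7, 5): 5 target values per window
print(y[0])     # [ 9. 10. 11. 12. 13.]
```

Note that the first window uses rows 0 to 9 as input and its first target is row 9, so the first predicted value belongs to the same time step as the last input row rather than to a future step. Whether that is intended depends on whether the features for the target step are available at prediction time.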

Since I am new to forecasting problems, can you help me with this? Is this correct?

rongtongxueya commented 1 year ago

Same question. Personally, I think it is necessary to predict a year that has labeled data, so that the accuracy of the prediction can be judged. If you have an answer, please let me know.

mehathab96 commented 1 year ago

I think what could be done is to use the last 'n' data points to predict the future yield, for example using data from 2014-2018 to predict the yield in 2019. This can be achieved with time-lagged shift sequencing (the assumption is that the yield produced in 2019 depends directly on the data from 2014-2018, including the yield).
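
A minimal sketch of that time-lagged idea, assuming one row per year with the yield in column 0 and features in the other columns. The function name make_lagged_windows, the year range, and the synthetic values are all hypothetical and not taken from the repository.

```python
import numpy as np

def make_lagged_windows(years, table, n_lags):
    """Pair the previous n_lags years (yield included) with the next year's yield.

    table: rows ordered by year; column 0 = yield, other columns = features.
    """
    X, y, target_years = [], [], []
    for t in range(n_lags, len(table)):
        X.append(table[t - n_lags:t, :])  # e.g. 2014-2018 rows
        y.append(table[t, 0])             # e.g. 2019 yield
        target_years.append(years[t])
    return np.array(X), np.array(y), target_years

# Synthetic table for 2010..2019 (an assumption for illustration only).
years = list(range(2010, 2020))
rng = np.random.default_rng(0)
table = np.column_stack([np.arange(len(years), dtype=float),
                         rng.normal(size=(len(years), 2))])

X, y, target_years = make_lagged_windows(years, table, n_lags=5)
print(X.shape, y.shape)   # (5, 5, 3) (5,)
print(target_years[-1])   # 2019, predicted from the 2014-2018 rows

# Input for the first unlabelled year (2020): the last 5 known years.
# No future feature values are needed, only past ones.
X_future = table[-5:][np.newaxis, ...]
print(X_future.shape)     # (1, 5, 3)
```

With this layout every input window contains only past years, so a forecast for the next year needs nothing beyond data that is already observed.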

But for model creation and validation we need labelled data. Once the model has been tuned and validated on that data, we can use it to predict future values without having the actual future values.
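
As a rough sketch of that workflow: hold out the most recent labelled years for validation, fit on the earlier ones, then forecast the first unlabelled year from the latest known window. The linear least-squares model below is only a stand-in for the LSTM/GRU/CNN model, and all data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_lags, n_feat = 30, 5, 3
table = rng.normal(size=(n_years, n_feat))  # column 0 plays the role of yield

# Flattened windows of the previous n_lags years -> next year's yield.
X = np.array([table[t - n_lags:t].ravel() for t in range(n_lags, n_years)])
y = table[n_lags:, 0]

# Chronological split: the last n_val labelled years are held out, never shuffled.
n_val = 5
X_train, y_train = X[:-n_val], y[:-n_val]
X_val, y_val = X[-n_val:], y[-n_val:]

# Stand-in model (an assumption): ordinary least squares with an intercept.
A_train = np.column_stack([X_train, np.ones(len(X_train))])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Validation uses labelled years the model has not seen.
A_val = np.column_stack([X_val, np.ones(len(X_val))])
val_rmse = float(np.sqrt(np.mean((A_val @ coef - y_val) ** 2)))
print("validation RMSE:", val_rmse)

# Forecast for the first unlabelled year from the last n_lags known years.
x_future = np.append(table[-n_lags:].ravel(), 1.0)
forecast = float(x_future @ coef)
print("forecast:", forecast)
```

The forecast itself cannot be scored until the real yield is known, which is why the validation error on held-out labelled years serves as the estimate of accuracy.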

If you find a better approach or solution, please comment.