stakahashy / fingan

The implementation of "Modeling financial time-series with generative adversarial networks"

How do you create the generated_samples? #12

Closed ArjunWhabi closed 10 months ago

ArjunWhabi commented 2 years ago

Hello

I was reading your research paper and then came across this GitHub library.

I understand that the train() function in main.py trains the generator and the discriminator, and that the model weights are then saved by `generator.save_weights('weights/%s/generator%i_%i.h5' % (timestamp, epoch, index))`.

But after saving the trained model, how did you arrive at the .npy files in the generated_samples folder?

stakahashy commented 1 year ago

@ArjunWhabi The numpy samples in the generated_samples folder are saved during training, as in main.py. You need to write another script to load the saved weights and run inference.
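Roughly, such an inference script could look like the following. This is a minimal sketch rather than code from this repository: `build_generator` here is a stand-in for however main.py constructs the generator, and the latent dimension and weight path are placeholders you must match to your own training run.

```python
# Minimal inference sketch (not code from this repo): rebuild the generator
# with the same architecture used in main.py, load saved weights, and sample.
import numpy as np
from tensorflow.keras import layers, models

latent_dim = 100  # assumption: must match the value used during training

def build_generator():
    # Stand-in architecture; replace with the exact generator from main.py,
    # otherwise load_weights() will fail on a layer mismatch.
    return models.Sequential([
        layers.Dense(1024, activation='relu', input_shape=(latent_dim,)),
        layers.Dense(8192),
    ])

generator = build_generator()
# Placeholder path following 'weights/%s/generator%i_%i.h5' % (timestamp, epoch, index).
generator.load_weights('weights/TIMESTAMP/generator100_0.h5')

noise = np.random.normal(0.0, 1.0, (16, latent_dim))  # 16 latent vectors
samples = generator.predict(noise)                    # -> array of shape (16, 8192)
np.save('generated_inference.npy', samples)
```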

@W-Mrt I added the link to the dataset in README.md.

ArjunWhabi commented 1 year ago

@stakahashy Out of all the generated numpy samples, how do you know which one is the best fit, i.e., which generated dataset has features most similar to the actual data?

stakahashy commented 1 year ago

@ArjunWhabi When I was writing the paper, I checked the statistical properties (stylized facts) of the generated samples saved as images.
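For example, a quick numerical version of that check could look like the sketch below. It is not the exact code used for the paper, and the sample file name is a placeholder.

```python
# Sketch of a stylized-facts check on one generated sample (not the paper's code).
import numpy as np
from scipy.stats import kurtosis

sample = np.load('generated_samples/sample.npy').ravel()  # placeholder file name

# Fat tails: real log returns typically show excess kurtosis well above 0.
print('excess kurtosis:', kurtosis(sample))

def autocorr(x, lag):
    """Biased sample autocorrelation at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Returns themselves should be nearly uncorrelated, while |returns| should
# stay positively correlated over many lags (volatility clustering).
for lag in (1, 5, 10, 20):
    print(lag, autocorr(sample, lag), autocorr(np.abs(sample), lag))
```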

ArjunWhabi commented 1 year ago

Hello @stakahashy, do you think it would be valid to run a Kolmogorov–Smirnov test for each generated synthetic sample against every stock in the S&P 500, average the p-values over the symbols for each synthetic sample, and choose the sample with the lowest p-value? Concretely, I mean something like the sketch below.
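This is only a sketch of the selection procedure I have in mind; `returns_by_symbol` is a hypothetical dict mapping each symbol to its array of log returns.

```python
# Sketch: average the two-sample KS p-value of one synthetic sample against
# the returns of every S&P 500 symbol. 'returns_by_symbol' is hypothetical.
import numpy as np
from scipy.stats import ks_2samp

def average_ks_pvalue(synthetic, returns_by_symbol):
    pvals = [ks_2samp(synthetic, real).pvalue
             for real in returns_by_symbol.values()]
    return float(np.mean(pvals))
```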

Also, the present model generates a time series with one column, which is just the synthetic log returns. Do you think the GAN models would be robust enough to generate a time series with 5 columns, something like: log return of Close, log return of Open, log return of High, log return of Low, and log of Volume?

How do you think we could adjust the model to generate data like this? I noticed you tried something similar in your code in data.py:

```python
# self.data['U'] = np.log(self.data['High']/self.data['Open'])
# self.data['D'] = np.log(self.data['Low']/self.data['Open'])
# self.data['C'] = np.log(self.data['Close']/self.data['Open'])
```

But you commented that code out, so I was wondering: what were your findings?

Sorry for all the questions; you seem very knowledgeable on the topic, so I thought I'd ask.

stakahashy commented 1 year ago

@ArjunWhabi I considered what the best tools would be to quantitatively measure the similarity between the observed and the synthesized financial time-series. To my understanding, the KS test is a statistical tool that makes minimal assumptions about the probability distribution, so it can be applied to the GAN-generated samples. However, I thought there was no solid theoretical foundation for applying the KS test to financial time-series, which have fat-tailed distributions and are auto-correlated, so I hesitated to present it as academic work. Instead, I qualitatively analyzed whether the generated time-series have a fat-tailed distribution by checking the log-log plot of the probability distribution.
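For reference, that kind of qualitative check can be done roughly as below. This is a sketch, not the exact code from the paper; it assumes a 1-D array of log returns in a placeholder .npy file.

```python
# Sketch of a log-log tail plot to eyeball fat tails (not the paper's code).
import numpy as np
import matplotlib.pyplot as plt

returns = np.load('generated_samples/sample.npy').ravel()  # placeholder file
abs_r = np.abs(returns[returns != 0])

# Histogram |returns| on logarithmic bins; an approximately straight line on
# the log-log plot suggests a power-law-like (fat) tail.
bins = np.logspace(np.log10(abs_r.min()), np.log10(abs_r.max()), 50)
density, edges = np.histogram(abs_r, bins=bins, density=True)
centers = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centers

plt.loglog(centers, density, 'o')
plt.xlabel('|log return|')
plt.ylabel('probability density')
plt.show()
```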

I am not sure about the second question because I wrote that code more than 5 years ago. Probably I was working on the volatility prediction task and expected those features to be beneficial for better prediction. At least for financial time-series synthesis, the statistical properties of the log price return have been investigated in detail, whereas the values of U or D have not been studied.

ArjunWhabi commented 1 year ago

@stakahashy You seem very knowledgeable on the subject, so I would like to ask for your advice. In your paper you use GANs to generate synthetic samples of one data point (Close) over 8192 time intervals, i.e., samples of shape (8192,). In my case I would like to generate synthetic samples of Close, Open, High, Low, and Volume over 8192 time intervals. How do you suggest I go about it? Will simply changing the shapes in the GAN network do the task, as in the naive sketch below, or do you have any other suggestions?
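To make the question concrete, this is the naive shape change I have in mind; the layer sizes are made up and this is not the repository's actual architecture.

```python
# Naive sketch of "changing the shapes": a generator that outputs (8192, 5)
# channels (Close/Open/High/Low returns plus log volume) instead of (8192,).
# Made-up layer sizes; not this repo's architecture.
from tensorflow.keras import layers, models

latent_dim = 100  # placeholder

generator = models.Sequential([
    layers.Dense(1024, activation='relu', input_shape=(latent_dim,)),
    layers.Dense(8192 * 5),
    layers.Reshape((8192, 5)),  # one column per feature
])
generator.summary()
```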