llSourcell / ethereum_future

This is the Code for "Ethereum Future Prices" by Siraj Raval on Youtube

What data set should this model be used with? #1

Open ntrpnr opened 6 years ago

ntrpnr commented 6 years ago

In the referenced video, we are told that you can find a dataset on Kaggle. The datasets that can be found there do not contain all the columns that this notebook requires.

So where can we find a dataset that can be used together with this model?

zhivko commented 6 years ago

I also tried with a Kaggle dataset - I am getting:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-18-c25ebc6ecaf6> in <module>()
----> 1 X_train, Y_train, X_test, Y_test, Y_daybefore, unnormalized_bases, window_size = load_data("Bitcoin Data.csv", 50)
      2 print (X_train.shape)
      3 print (Y_train.shape)
      4 print (X_test.shape)
      5 print (Y_test.shape)

<ipython-input-8-c56ba76693f8> in load_data(filename, sequence_length)
     36     #Normalizing data by going through each window
     37     #Every value in the window is divided by the first value in the window, and then 1 is subtracted
---> 38     d0 = np.array(result)
     39     dr = np.zeros_like(d0)
     40     dr[:,1:,:] = d0[:,1:,:] / d0[:,0:1,:] - 1

MemoryError: 

Please post the URL of the data source where you got the CSV from.
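
For reference, the MemoryError above comes from materializing every 50-step window of a minute-level CSV at once in d0 = np.array(result). A minimal preprocessing sketch (not part of the repo) that downsamples a Kaggle minute-level dump to daily candles and casts to float32 before the notebook ever sees it; the column names and file names below are assumptions and may need adjusting for the CSV you actually downloaded:

# A minimal preprocessing sketch (not from the repo): shrink a minute-level
# Kaggle dump before windowing so that np.array(result) fits in memory.
# Column names ('Timestamp', 'Open', 'High', 'Low', 'Close') are assumptions --
# check them against your CSV.
import numpy as np
import pandas as pd

def downsample_csv(in_path, out_path, rule="1D"):
    df = pd.read_csv(in_path)
    # Kaggle's Bitcoin dumps typically use a Unix-seconds Timestamp column.
    df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")
    df = df.set_index("Timestamp")
    daily = pd.DataFrame({
        "Open": df["Open"].resample(rule).first(),
        "High": df["High"].resample(rule).max(),
        "Low": df["Low"].resample(rule).min(),
        "Close": df["Close"].resample(rule).last(),
    }).dropna()
    # float32 halves the memory used by the (days x window x features) array later.
    daily.astype(np.float32).to_csv(out_path)

# Hypothetical usage (file names are placeholders):
# downsample_csv("bitcoin_1min_data.csv", "Bitcoin Data.csv")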

darevolution commented 6 years ago

@llSourcell Could you please share the link to the data source used to train this model?

murchie85 commented 6 years ago

Same thing all the time with these code links in the video - he says 'it's easy to find any data'. But there are tons of CSV files on Kaggle, and most of them don't work (no doubt for a reason). Siraj needs to specify a little more than 'check out this code, it's easy to use'. FYI, he should also have mentioned it's for Python 2, not Python 3.

triestpa commented 6 years ago

All of the mentioned fields can be retrieved from (or computed using data from) a bevy of free data sources.

We'll have to do a bit more legwork to get the data formatted correctly, but reconfiguring the existing code for a custom dataset can be valuable for fully understanding how the network configuration and preprocessing work.

I can't find any CSV online either that 100% matches the specified schema, but hey, sometimes building/cleaning your own dataset can be half the fun.
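
In that spirit, a sketch of deriving extra feature columns locally from a plain OHLCV CSV. The derived columns here (daily_return, ma_7, volatility_7) are illustrative examples only, not the notebook's exact schema, and the Open/High/Low/Close column names are assumptions about your CSV:

# Sketch: compute additional feature columns from a basic OHLCV CSV.
# The derived column names are illustrative, not the notebook's schema.
import pandas as pd

def add_derived_features(csv_path):
    df = pd.read_csv(csv_path)                           # assumes Open/High/Low/Close columns
    df["daily_return"] = df["Close"].pct_change()        # day-over-day percentage change
    df["ma_7"] = df["Close"].rolling(7).mean()           # 7-day moving average
    df["volatility_7"] = df["daily_return"].rolling(7).std()  # 7-day rolling volatility
    return df.dropna().reset_index(drop=True)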

simonhughes22 commented 6 years ago

I'd like to see the data too. Maybe we can collaborate to build that dataset, unless the author can provide the code to do so. I think most of the value in this approach comes from the dataset rather than the modeling techniques, although RNNs are powerful for time series prediction. Right now I am much more interested in the data.

esemve commented 6 years ago

Please post a valid CSV example, because it doesn't work... :( I will create my own dataset, but what is the correct schema?

Shaitender-Intg commented 6 years ago

Please share the dataset used to train the model, or post the correct data schema for it.

triestpa commented 6 years ago

The tutorial dataset schema is specified in the Step 1 notebook cell -

The columns of data and their definitions are as follows:

I imagine that the model can still be trained effectively on different schemas too - but you may have to adjust the shape of the tensor depending on the number of features.

Check the code near this comment for reference -

#Convert the data to a 3D array (a x b x c) 
#Where a is the number of days, b is the window size, and c is the number of features in the data file
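
To make the shape adjustment concrete, here is a minimal sketch of the windowing step that produces that (a x b x c) tensor, using the same "divide by the window's first value and subtract 1" normalization quoted in the traceback earlier in this thread. It is written from scratch as an illustration, not copied from the repo:

# Sketch of the (a x b x c) windowing described above: a = number of windows,
# b = window size, c = number of feature columns.
import numpy as np

def make_windows(values, window_size=50):
    # values: 2D array of shape (num_rows, num_features)
    windows = [values[i:i + window_size] for i in range(len(values) - window_size)]
    d0 = np.array(windows, dtype=np.float32)        # unnormalized, shape (a, b, c)
    dr = np.zeros_like(d0)
    dr[:, 1:, :] = d0[:, 1:, :] / d0[:, 0:1, :] - 1  # normalize each window by its first value
    return dr, d0

# Example: 1000 rows, 4 features (e.g. open/high/low/close)
# dr, d0 = make_windows(np.random.rand(1000, 4).astype(np.float32))
# dr.shape -> (950, 50, 4)
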
issxjl2015 commented 6 years ago

Where is the dataset?

zhivko commented 6 years ago

URL of the dataset, please.

zhaosongyi commented 6 years ago

Please provide the datasets, thanks.

davecerr commented 6 years ago

Yeah, as above... can you please provide the dataset? Thank you.

simonhughes22 commented 6 years ago

Note you can use any dataset quite easily with his code; it is mostly generic. The main thing you have to be aware of is the index of the BTC price column. It seems to be 20 in his code - look at where he gets the labels for y_train. So if you change that index to match the index in your own data, the rest of the code should work with whatever dataset you want.
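
A sketch of what that label extraction might look like with the price column index pulled out as a parameter. Only the idea (column index 20 holds the BTC price in the original data) comes from this thread; the function and variable names below are illustrative, not copied from the repo:

# Sketch of parameterizing the label column mentioned above.
PRICE_COL = 20   # change this to the index of the price column in YOUR data

def split_labels(windows, price_col=PRICE_COL, train_split=0.9):
    # windows: normalized array of shape (num_windows, window_size, num_features)
    split = int(round(train_split * windows.shape[0]))
    train, test = windows[:split], windows[split:]
    X_train, Y_train = train[:, :-1], train[:, -1, price_col]   # label = last step's price
    X_test,  Y_test  = test[:, :-1],  test[:, -1, price_col]
    return X_train, Y_train, X_test, Y_test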

esemve commented 6 years ago

TypeError                               Traceback (most recent call last)

in ()
----> 1 model = initialize_model(window_size, 0.2, 'linear', 'mse', 'adam')
      2 print(model.summary())

in initialize_model(window_size, dropout_value, activation_function, loss_function, optimizer)
     18
     19     #First recurrent layer with dropout
---> 20     model.add(Bidirectional(LSTM(window_size, return_sequences=True), input_shape=(window_size, X_train.shape[-1]),))
     21     model.add(Dropout(dropout_value))
     22

/usr/local/lib/python3.5/dist-packages/keras/models.py in add(self, layer)
    278         else:
    279             input_dtype = None
--> 280         layer.create_input_layer(batch_input_shape, input_dtype)
    281
    282         if len(layer.inbound_nodes) != 1:

/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py in create_input_layer(self, batch_input_shape, input_dtype, name)
    368         # and create the node connecting the current layer
    369         # to the input layer we just created.
--> 370         self(x)
    371
    372     def assert_input_compatibility(self, input):

/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py in __call__(self, x, mask)
    485                              '`layer.build(batch_input_shape)`')
    486         if len(input_shapes) == 1:
--> 487             self.build(input_shapes[0])
    488         else:
    489             self.build(input_shapes)

/usr/local/lib/python3.5/dist-packages/keras/layers/wrappers.py in build(self, input_shape)
    228
    229     def build(self, input_shape):
--> 230         self.forward_layer.build(input_shape)
    231         self.backward_layer.build(input_shape)
    232

/usr/local/lib/python3.5/dist-packages/keras/layers/recurrent.py in build(self, input_shape)
    708                   self.W_o, self.U_o, self.b_o]
    709
--> 710         self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
    711         self.U = K.concatenate([self.U_i, self.U_f, self.U_c, self.U_o])
    712         self.b = K.concatenate([self.b_i, self.b_f, self.b_c, self.b_o])

/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py in concatenate(tensors, axis)
    716         return tf.sparse_concat(axis, tensors)
    717     else:
--> 718         return tf.concat(axis, [to_dense(x) for x in tensors])
    719
    720

/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py in concat(values, axis, name)
   1045     ops.convert_to_tensor(axis,
   1046                           name="concat_dim",
-> 1047                           dtype=dtypes.int32).get_shape(
   1048                           ).assert_is_compatible_with(tensor_shape.scalar())
   1049     return identity(values[0], name=scope)

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, preferred_dtype)
    649                       name=name,
    650                       preferred_dtype=preferred_dtype,
--> 651                       as_ref=False)
    652
    653

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype)
    714
    715     if ret is None:
--> 716         ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    717
    718     if ret is NotImplemented:

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
    174                                          as_ref=False):
    175   _ = as_ref
--> 176   return constant(v, dtype=dtype, name=name)
    177
    178

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name, verify_shape)
    163   tensor_value = attr_value_pb2.AttrValue()
    164   tensor_value.tensor.CopyFrom(
--> 165       tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
    166   dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
    167   const_tensor = g.create_op(

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape)
    365       nparray = np.empty(shape, dtype=np_dt)
    366     else:
--> 367       _AssertCompatible(values, dtype)
    368       nparray = np.array(values, dtype=np_dt)
    369       # check to them.

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py in _AssertCompatible(values, dtype)
    300   else:
    301     raise TypeError("Expected %s, got %s of type '%s' instead." %
--> 302                     (dtype.name, repr(mismatch), type(mismatch).__name__))
    303
    304

TypeError: Expected int32, got of type 'Variable' instead
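
This TypeError is not a data problem: the old Keras 1.x backend calls tf.concat(axis, tensors), while TensorFlow 1.0+ expects tf.concat(values, axis), so the list of weight Variables lands in the int32 axis argument. Upgrading Keras (or pinning a matching TensorFlow) usually resolves it. Below is a minimal sketch, assuming a current tf.keras install, of the same first two layers; window_size and n_features are placeholders for your own values, and this is not the repo's exact initialize_model:

# Sketch of the Bidirectional LSTM + Dropout stack from the traceback on a
# current Keras/TensorFlow pairing, where the concat signature mismatch goes away.
import tensorflow as tf

def build_first_layers(window_size, n_features, dropout_value=0.2):
    inputs = tf.keras.Input(shape=(window_size, n_features))
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(window_size, return_sequences=True))(inputs)
    x = tf.keras.layers.Dropout(dropout_value)(x)
    return tf.keras.Model(inputs, x)
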
TingALin commented 6 years ago

@simonhughes22 I agree with you. I tested the code with a 4-feature dataset from Kaggle and it works. However, aside from some code questions, I am wondering how many days ahead this code can predict, and how you can see the predictions. I would appreciate your take on this.

calvinchankf commented 6 years ago

Hi @TingALin, which dataset did you use? This one: https://www.kaggle.com/mczielinski/bitcoin-historical-data/data ?

TingALin commented 6 years ago

@calvinchankf Yes, but with only open, close, high, low as the features for testing.

Sa6a commented 6 years ago

I used this dataset, which includes all the features Raval was talking about. But my statistics are as follows:

Precision: 0.5376884422110553
Recall: 0.5783783783783784
F1 score: 0.5572916666666665
Mean Squared Error: 0.217757766929
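
For anyone reproducing numbers like these, here is a sketch of how direction-based precision/recall/F1 plus MSE are typically computed for this kind of model, assuming the predictions, targets, and the Y_daybefore series returned by load_data are all normalized the same way. It mirrors the usual evaluation but is not necessarily the notebook's exact cell:

# Sketch: score "price went up vs. the previous day" as the positive class,
# plus MSE on the (normalized) prices themselves.
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, mean_squared_error

def direction_stats(y_pred, y_true, y_daybefore):
    pred_up = (np.ravel(y_pred) > np.ravel(y_daybefore)).astype(int)   # predicted direction
    true_up = (np.ravel(y_true) > np.ravel(y_daybefore)).astype(int)   # actual direction
    return {
        "Precision": precision_score(true_up, pred_up),
        "Recall": recall_score(true_up, pred_up),
        "F1 score": f1_score(true_up, pred_up),
        "Mean Squared Error": mean_squared_error(np.ravel(y_true), np.ravel(y_pred)),
    }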

TingALin commented 6 years ago

@calvinchankf By the way, do you know how to print out the predicted price? I don't see the predicted price in the code.

calvinchankf commented 6 years ago

@TingALin No, I gave up trying this sample because I think using future features (the bi-directional part) is kind of unrealistic for predicting future prices, so I ended up studying other samples.
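
On TingALin's question about seeing actual predicted prices: since each window is normalized by dividing by its first value and subtracting 1 (per the load_data snippet quoted near the top of this thread), a prediction can be mapped back to a price by inverting that transform with the window's unnormalized base. A sketch, assuming y_predict and unnormalized_bases correspond to the notebook's test-set outputs:

# Sketch: invert the "divide by the window's first value and subtract 1"
# normalization to recover prices from normalized predictions.
import numpy as np

def to_prices(y_predict, unnormalized_bases):
    y_predict = np.asarray(y_predict).reshape(-1)
    bases = np.asarray(unnormalized_bases).reshape(-1)
    return (y_predict + 1) * bases          # price = (normalized + 1) * window base

# Example (names assumed to match the notebook):
# predicted_prices = to_prices(model.predict(X_test), unnormalized_bases)
# print(predicted_prices[:5])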