llSourcell / ethereum_future

This is the Code for "Ethereum Future Prices" by Siraj Raval on Youtube

What data set should this model be used with? #1

Open ntrpnr opened 6 years ago

ntrpnr commented 6 years ago

In the referenced video, we are told that you can find a dataset on Kaggle. The datasets that can be found there do not contain all the columns that this notebook requires.

So where can we find a dataset that can be used together with this model?

zhivko commented 6 years ago

I also tried with the Kaggle dataset and I am getting:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-18-c25ebc6ecaf6> in <module>()
----> 1 X_train, Y_train, X_test, Y_test, Y_daybefore, unnormalized_bases, window_size = load_data("Bitcoin Data.csv", 50)
      2 print (X_train.shape)
      3 print (Y_train.shape)
      4 print (X_test.shape)
      5 print (Y_test.shape)

<ipython-input-8-c56ba76693f8> in load_data(filename, sequence_length)
     36     #Normalizing data by going through each window
     37     #Every value in the window is divided by the first value in the window, and then 1 is subtracted
---> 38     d0 = np.array(result)
     39     dr = np.zeros_like(d0)
     40     dr[:,1:,:] = d0[:,1:,:] / d0[:,0:1,:] - 1

MemoryError: 

Please post the URL of the data source where you got the CSV from.
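
In the meantime, one workaround for the MemoryError (a sketch, not part of the original notebook: the row cutoff and the small-file name are illustrative assumptions) is to shrink the minute-level CSV before calling load_data, so that np.array(result) fits in memory:

import pandas as pd

# Shrink the CSV before handing it to load_data (illustrative workaround).
df = pd.read_csv("Bitcoin Data.csv")
df = df.tail(200000)  # keep only the most recent rows (illustrative cutoff)
for col in df.select_dtypes(include=["number"]).columns:
    df[col] = df[col].astype("float32")  # halve the memory per value
df.to_csv("Bitcoin Data small.csv", index=False)

# Then: load_data("Bitcoin Data small.csv", 50)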

darevolution commented 6 years ago

@llSourcell Could you please share the link to the data source used to train this model?

murchie85 commented 6 years ago

It's the same thing all the time with the code links in these videos: we're told "it's easy to find any data", but there are tons of CSV files on Kaggle and most of them don't work with this notebook (no doubt for a reason). Siraj needs to specify a little more than "check out this code, it's easy to use". FYI, it should also have been mentioned that this is for Python 2, not Python 3.

triestpa commented 6 years ago

All of the mentioned fields can be retrieved from (or computed using data from) a bevy of free data sources.

We'll have to do a bit more legwork to get the data formatted correctly, but reconfiguring the existing code for a custom dataset can be valuable for fully understanding how the network configuration and preprocessing work.

I can't find any CSV online either that 100% matches the specified schema, but hey, sometimes building/cleaning your own dataset can be half the fun.
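
For example, a rough starting point could look like the sketch below. The input column names and the derived features are assumptions, not the notebook's exact schema:

import pandas as pd

# Build a custom dataset from a plain OHLCV export (e.g. a Kaggle Bitcoin CSV).
# "bitcoin_ohlcv.csv" and the derived columns are illustrative assumptions.
df = pd.read_csv("bitcoin_ohlcv.csv")

df["Return"] = df["Close"].pct_change()             # period-over-period return
df["MA_7"] = df["Close"].rolling(7).mean()          # 7-period moving average
df["Volatility_7"] = df["Return"].rolling(7).std()  # 7-period volatility

df.dropna().to_csv("Bitcoin Data.csv", index=False)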

simonhughes22 commented 6 years ago

I'd like to see the data too. Maybe we can collaborate to build that dataset, unless the author can provide the code to do so. I think most of the value in this approach is in the dataset rather than the modeling techniques, although RNNs are powerful for time series prediction. Right now I am much more interested in the data.

esemve commented 6 years ago

Please post a valid CSV example, because it doesn't work... :( I will create my own dataset, but what is the correct schema?

Shaitender-Intg commented 6 years ago

Please share the dataset used for training the model, or post the correct data schema for it.

triestpa commented 6 years ago

The tutorial dataset schema is specified in the Step 1 notebook cell, in the section that begins with "The columns of data and their definitions are as follows:" (the full column list and definitions are in that cell).
I imagine that the model can still be trained effectively on different schemas too - but you may have to adjust the shape of the tensor depending on the number of features.

Check the code near this comment for reference -

#Convert the data to a 3D array (a x b x c) 
#Where a is the number of days, b is the window size, and c is the number of features in the data file
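
A minimal sketch of that windowing step, assuming the CSV has already been read into a 2D array of shape (days, features); it is an illustration, not a line-for-line copy of load_data:

import numpy as np

def to_windows(data_2d, window_size):
    # Slice a (days, features) array into overlapping windows, producing a
    # 3D array of shape (a, b, c): a = number of windows, b = window size,
    # c = number of features in the data file.
    windows = [data_2d[i:i + window_size]
               for i in range(len(data_2d) - window_size + 1)]
    return np.array(windows)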

issxjl2015 commented 6 years ago

Where is the dataset?

zhivko commented 6 years ago

URL of the dataset, please.

zhaosongyi commented 6 years ago

Please provide the datasets, thanks.

davecerr commented 6 years ago

Yeah, as above... can you please provide the dataset? Thank you.

simonhughes22 commented 6 years ago

Note that you can use almost any dataset with his code; it is mostly generic. The main thing you have to be aware of is the index of the BTC price column. It seems to be 20 in his code - look at where he gets the labels for Y_train. If you change that index to match the column in your own data, the rest of the code should work with whatever dataset you want.
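
A sketch of what that looks like, assuming the 3D windowed array from load_data (with training_data as its training slice, a hypothetical name) and that the BTC price sits in column 20 as in the original notebook:

# Change PRICE_INDEX to match the price column in your own CSV.
PRICE_INDEX = 20
Y_train = training_data[:, -1, PRICE_INDEX]  # label = price at the last step of each window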

esemve commented 6 years ago

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
----> 1 model = initialize_model(window_size, 0.2, 'linear', 'mse', 'adam')
      2 print(model.summary())

 in initialize_model(window_size, dropout_value, activation_function, loss_function, optimizer)
     18
     19     #First recurrent layer with dropout
---> 20     model.add(Bidirectional(LSTM(window_size, return_sequences=True), input_shape=(window_size, X_train.shape[-1]),))
     21     model.add(Dropout(dropout_value))
     22

/usr/local/lib/python3.5/dist-packages/keras/models.py in add(self, layer)
    278         else:
    279             input_dtype = None
--> 280         layer.create_input_layer(batch_input_shape, input_dtype)
    281
    282         if len(layer.inbound_nodes) != 1:

/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py in create_input_layer(self, batch_input_shape, input_dtype, name)
    368         # and create the node connecting the current layer
    369         # to the input layer we just created.
--> 370         self(x)
    371
    372     def assert_input_compatibility(self, input):

/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py in __call__(self, x, mask)
    485                                 '`layer.build(batch_input_shape)`')
    486             if len(input_shapes) == 1:
--> 487                 self.build(input_shapes[0])
    488             else:
    489                 self.build(input_shapes)

/usr/local/lib/python3.5/dist-packages/keras/layers/wrappers.py in build(self, input_shape)
    228
    229     def build(self, input_shape):
--> 230         self.forward_layer.build(input_shape)
    231         self.backward_layer.build(input_shape)
    232

/usr/local/lib/python3.5/dist-packages/keras/layers/recurrent.py in build(self, input_shape)
    708                           self.W_o, self.U_o, self.b_o]
    709
--> 710             self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
    711             self.U = K.concatenate([self.U_i, self.U_f, self.U_c, self.U_o])
    712             self.b = K.concatenate([self.b_i, self.b_f, self.b_c, self.b_o])

/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py in concatenate(tensors, axis)
    716         return tf.sparse_concat(axis, tensors)
    717     else:
--> 718         return tf.concat(axis, [to_dense(x) for x in tensors])
    719
    720

/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py in concat(values, axis, name)
   1045     ops.convert_to_tensor(axis,
   1046                           name="concat_dim",
-> 1047                           dtype=dtypes.int32).get_shape(
   1048                               ).assert_is_compatible_with(tensor_shape.scalar())
   1049     return identity(values[0], name=scope)

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, preferred_dtype)
    649                                          name=name,
    650                                          preferred_dtype=preferred_dtype,
--> 651                                          as_ref=False)
    652
    653

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype)
    714
    715     if ret is None:
--> 716       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    717
    718     if ret is NotImplemented:

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
    174                                          as_ref=False):
    175   _ = as_ref
--> 176   return constant(v, dtype=dtype, name=name)
    177
    178

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name, verify_shape)
    163   tensor_value = attr_value_pb2.AttrValue()
    164   tensor_value.tensor.CopyFrom(
--> 165       tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
    166   dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
    167   const_tensor = g.create_op(

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape)
    365       nparray = np.empty(shape, dtype=np_dt)
    366     else:
--> 367       _AssertCompatible(values, dtype)
    368       nparray = np.array(values, dtype=np_dt)
    369       # check to them.

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py in _AssertCompatible(values, dtype)
    300   else:
    301     raise TypeError("Expected %s, got %s of type '%s' instead." %
--> 302                     (dtype.name, repr(mismatch), type(mismatch).__name__))
    303
    304

TypeError: Expected int32, got of type 'Variable' instead
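
For what it's worth, this traceback usually points to a Keras/TensorFlow version mismatch rather than a data problem: the installed Keras 1.x backend calls tf.concat(axis, values), the pre-TF-1.0 argument order, so on TensorFlow >= 1.0 a weight Variable ends up in the axis slot, hence "Expected int32, got ... of type 'Variable'". Aligning the versions (an older TensorFlow that matches Keras 1.x, or a newer Keras) should make it go away. A quick check:

import keras
import tensorflow as tf

# If Keras reports 1.x and TensorFlow reports >= 1.0, the versions are
# mismatched for this code path.
print(keras.__version__, tf.__version__)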
TingALin commented 6 years ago

@simonhughes22 I agree with you. I tested the code with a 4-feature dataset from Kaggle and it works. However, aside from some code questions, I am wondering how many days ahead this code can predict, and how you can see those predictions. I would appreciate your take on this.

calvinchankf commented 6 years ago

Hi @TingALin, which dataset did you use? This one: https://www.kaggle.com/mczielinski/bitcoin-historical-data/data ?

TingALin commented 6 years ago

@calvinchankf Yes, but only with open, close, high, and low as the features for testing.
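
For reference, a sketch of that pruning step; the file name and column names below are assumptions about the Kaggle export, so adjust them to your download:

import pandas as pd

# Keep only the four OHLC columns from the Kaggle bitcoin-historical-data
# export. File name and column names are assumptions; adjust to your file.
df = pd.read_csv("bitstampUSD_1-min_data.csv")
df = df[["Open", "High", "Low", "Close"]].dropna()
df.to_csv("Bitcoin Data.csv", index=False)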

Sa6a commented 6 years ago

I used this dataset, which includes all the features Raval was talking about. But my statistics are as follows:
Precision: 0.5376884422110553
Recall: 0.5783783783783784
F1 score: 0.5572916666666665
Mean Squared Error: 0.217757766929
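
For anyone wondering how numbers like these can be computed, a sketch (assuming y_true and y_pred are the normalized test targets and predictions, and Y_daybefore comes from load_data; the variable names are assumptions and may differ from the notebook's):

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score, mean_squared_error

# Directional metrics: did the model predict the right up/down move relative
# to the previous day?
delta_pred = (y_pred.ravel() - Y_daybefore.ravel()) > 0
delta_true = (y_true.ravel() - Y_daybefore.ravel()) > 0

print("Precision:", precision_score(delta_true, delta_pred))
print("Recall:", recall_score(delta_true, delta_pred))
print("F1 score:", f1_score(delta_true, delta_pred))
print("Mean Squared Error:", mean_squared_error(y_true.ravel(), y_pred.ravel()))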

TingALin commented 6 years ago

@calvinchankf Do you know how to print out the predicted price, by the way? I don't see the predicted price in the code.
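
For what it's worth, load_data normalizes each window by its first value (v / v0 - 1, per the comment in that function), so recovering actual prices means inverting that. A sketch, assuming a trained model plus the X_test and unnormalized_bases returned by load_data, and that the shapes line up as one prediction per window:

# Undo the v / v0 - 1 normalization to get back to prices.
y_normalized = model.predict(X_test)
y_price = (y_normalized.ravel() + 1) * unnormalized_bases.ravel()
print(y_price[-5:])  # last few predicted prices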

calvinchankf commented 6 years ago

@TingALin No, I gave up on this sample because I think using future features (bi-directional layers) is kind of unrealistic for predicting future prices, so I ended up studying other samples.