Closed YsylviaUC closed 1 year ago
Hi @YsylviaUC ! Thank you for your interest in our work.
I just pushed a commit that sets dstore_size
automatically, based on your training set size, if you don't pass any value to this flag.
You will need to know this number when you load the saved datastore later.
You can find the saved size in the datastore's file name. For example, if the saved file is called dstore_gpt2_116988150_768_vals.npy, the size is 116988150.
This number will also be printed when you save the datastore, as:
09/14/2022 11:01:00 - INFO - __main__ - [train] Total eval tokens: 116988150
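As a minimal sketch, the size can also be recovered programmatically from the file name, assuming the dstore_<model>_<size>_<dim>_vals.npy naming pattern shown above (the helper name parse_dstore_size is hypothetical, not part of the repo):

```python
import re

def parse_dstore_size(filename: str) -> int:
    # Assumes the naming pattern "dstore_<model>_<size>_<dim>_vals.npy",
    # e.g. "dstore_gpt2_116988150_768_vals.npy" -> 116988150.
    match = re.match(r"dstore_\w+?_(\d+)_(\d+)_vals\.npy", filename)
    if match is None:
        raise ValueError(f"unexpected datastore file name: {filename}")
    return int(match.group(1))

print(parse_dstore_size("dstore_gpt2_116988150_768_vals.npy"))  # → 116988150
```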
Let me know if you have any questions or problems, Best, Uri
Hello, how long does it take to build the FAISS index with such big training vectors (>100 GB)?
It depends on the number of CPU cores; if more are available, the code will use them.
I think a few hours.
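Since FAISS parallelizes index building with OpenMP (assuming a standard FAISS build, which uses OpenMP by default), the thread count can be controlled through the usual environment variable before launching the training script; the value 16 below is just an illustrative example:

```shell
# Cap (or raise) the number of OpenMP threads FAISS will use
# when building the index. 16 is an arbitrary example value.
export OMP_NUM_THREADS=16
```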
Hi, I'm wondering how to set knn_args.dstore_size if I use my own data to construct the datastore?