Open luzimu opened 3 years ago
Here's what I used:
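import multiprocessing

import pandas as pd
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# cat_type (the review category name) is assumed to be set beforehand.
data_file_name = "reviews_" + cat_type + ".json.gz"
model_file = "reviews_" + cat_type + ".d2v"
raw_df = pd.read_json(data_file_name, lines=True)
print("Data loaded")
seed = 2333
# One document per review, tagged with the product's asin.
documents = []
for index, row in raw_df.iterrows():
    documents.append(TaggedDocument(row.reviewText, [row.asin]))
model = Doc2Vec(documents, seed=seed, vector_size=300, window=7, min_count=3, workers=multiprocessing.cpu_count())
model.save(model_file)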
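One thing to watch for: gensim's TaggedDocument expects its words argument to be a list of tokens, so if reviewText holds raw strings it may help to tokenize them first. Below is a minimal sketch of that step, not necessarily what was used for the paper. The names tokenize and to_tagged_pairs and the regex are hypothetical, and the sample rows are made up.

```python
import re

# Hypothetical tokenizer: lowercase letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase a raw review string and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def to_tagged_pairs(rows):
    """Turn (asin, reviewText) pairs into (tokens, [asin]) pairs.

    Each output pair lines up with the (words, tags) arguments of
    TaggedDocument. Rows whose text is missing or not a string are skipped.
    """
    pairs = []
    for asin, text in rows:
        if not isinstance(text, str):
            continue
        pairs.append((tokenize(text), [asin]))
    return pairs


# Made-up sample rows, for illustration only.
rows = [("B0001", "Great camera, works well!"), ("B0002", None)]
pairs = to_tagged_pairs(rows)
print(pairs)
```

Each pair can then be passed as TaggedDocument(words, tags) before building the Doc2Vec model. gensim.utils.simple_preprocess is a ready-made alternative to the hand-written tokenizer.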
Thank you for your generous reply, which helped me a lot. However, I have hit another error, "No module named 'sampler_sampling'", when running run_irgnn.py. I would really appreciate any insight on this.
You could directly use the NeighborSampler class from PyG:
from torch_geometric.loader import NeighborSampler
I have fixed the code accordingly :)
I have a similar question: how can I use IRGNN to obtain the embedding of each node and associate it with its respective asin identifier? Specifically, I need a DataFrame with the asin code as the index and one column containing the embeddings (obtained after the model's training step). Do you have any advice on how to achieve that? Any help is greatly appreciated.
Thanks
In datasets/process.py:
line 107: I converted the asins (node labels) to integers (new inner IDs) with the function nx.convert_node_labels_to_integers;
line 65: I sorted the nodes by the assigned integers. Therefore, the embeddings obtained from IRGNN are also in the order of the assigned integers.
I think you can fetch the embeddings by first building the mapping between the asins and their new inner IDs, and then looking up the embeddings accordingly.
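The mapping step could be sketched roughly as below. This assumes nx.convert_node_labels_to_integers was called with its default ordering, in which case nodes are numbered in the graph's iteration order, so list(G.nodes()) taken before the conversion gives the asin for each inner ID. I have not checked this against the exact call in datasets/process.py, so treat it as an assumption. The function name build_asin_embedding_table and the toy data are hypothetical.

```python
def build_asin_embedding_table(asins_in_node_order, embeddings):
    """Map each asin to the embedding row at its inner integer ID.

    asins_in_node_order[i] must be the asin that received inner ID i,
    and embeddings[i] must be the model output for inner ID i.
    """
    if len(asins_in_node_order) != len(embeddings):
        raise ValueError("expected one embedding row per asin")
    # asin -> inner ID, assuming IDs were assigned in list order from 0.
    asin_to_id = {asin: i for i, asin in enumerate(asins_in_node_order)}
    # asin -> embedding row, looked up through the inner ID.
    table = {asin: list(embeddings[i]) for asin, i in asin_to_id.items()}
    return asin_to_id, table


# Toy data standing in for the real asins and the trained embeddings.
asins = ["B000A", "B000B", "B000C"]
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
asin_to_id, table = build_asin_embedding_table(asins, embeddings)
print(asin_to_id["B000B"], table["B000B"])
```

To get the DataFrame, pd.Series(table).to_frame("embedding") should give one row per asin, with the asin as the index. If the node order is uncertain, nx.convert_node_labels_to_integers also accepts a label_attribute argument that stores each node's original label on the relabeled graph.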
Hi, your paper is great, but I have 2 questions. 1) Can your code be used in Google Colab? 2) How long did you train on your data? Thanks
Your great work helped me a lot. However, we have run into some problems using gensim.models.doc2vec to extract node features. Could you share more details on how you used gensim.models.doc2vec? I would really appreciate any insight on this.