wwliu555 / IRGNN_TNNLS_2021


About extracting node features #3

Open luzimu opened 3 years ago

luzimu commented 3 years ago

Your great work has helped me a lot, but I have run into some problems using gensim.models.doc2vec to extract node features. Could you share more details on how you used gensim.models.doc2vec? I would really appreciate any insight on this.

wwliu555 commented 3 years ago

Here's what I used:

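import multiprocessing

import pandas as pd
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# cat_type (the category suffix used in the file names) is defined elsewhere.
data_file_name = "reviews_" + cat_type + ".json.gz"
model_file = "reviews_" + cat_type + ".d2v"
raw_df = pd.read_json(data_file_name, lines=True)
print("Data loaded")

seed = 2333

# One tagged document per review, tagged with the product's asin.
# Note: gensim expects `words` to be a list of tokens; reviewText is passed here as a raw string.
documents = []
for index, row in raw_df.iterrows():
    documents.append(TaggedDocument(row.reviewText, [row.asin]))

model = Doc2Vec(documents, seed=seed, vector_size=300, window=7, min_count=3, workers=multiprocessing.cpu_count())
model.save(model_file)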
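
Since gensim expects the `words` field of a TaggedDocument to be a list of tokens, one possible way to prepare the reviews is sketched below. This is only an illustration, not the code used for the paper: the helper names `tokenize` and `iter_tagged_reviews`, the regular expression, and the sample rows are all hypothetical, and the tokenizer is deliberately simple.

```python
import re
from typing import Iterable, Iterator, List, Mapping, Tuple

# Hypothetical, deliberately simple tokenizer: lowercase alphanumerics and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def iter_tagged_reviews(
    rows: Iterable[Mapping[str, object]],
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (tokens, [asin]) pairs, skipping rows without usable review text."""
    for row in rows:
        text = row.get("reviewText")
        asin = row.get("asin")
        # Missing reviews show up as None or NaN after loading, so require a real string.
        if not isinstance(text, str) or asin is None:
            continue
        tokens = tokenize(text)
        if tokens:
            yield tokens, [str(asin)]


# Made-up sample rows, shaped like records from the reviews file.
rows = [
    {"asin": "B000TEST01", "reviewText": "Great cable, works well."},
    {"asin": "B000TEST02", "reviewText": None},
]
pairs = list(iter_tagged_reviews(rows))
```

Each pair can then be wrapped as `TaggedDocument(words=tokens, tags=tags)` before training, and the rows can come from `raw_df.to_dict("records")`. After training, gensim 4.x exposes the vector for a tag as `model.dv[asin]` (`model.docvecs[asin]` in 3.x).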

luzimu commented 3 years ago

Thank you for your generous reply, which helped me a lot. However, I have hit another error when running run_irgnn.py: "No module named 'sampler_sampling'". I would really appreciate any insight on this.

wwliu555 commented 3 years ago

You can use the NeighborSampler class from PyG directly:

from torch_geometric.loader import NeighborSampler

I have fixed the code accordingly :)
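
If the import above fails on an older PyG install, a small lookup helper can try several module paths in turn. The sketch below is an assumption-laden illustration: `first_available` and `PYG_SAMPLER_MODULES` are hypothetical names, and the second module path reflects where NeighborSampler is assumed to have lived in PyG 1.x releases.

```python
import importlib
from typing import Any, Optional, Sequence

# Candidate module paths, newest first. The second entry is an assumption
# about older PyG (1.x) releases and may not apply to your installation.
PYG_SAMPLER_MODULES = ("torch_geometric.loader", "torch_geometric.data")


def first_available(name: str, modules: Sequence[str]) -> Optional[Any]:
    """Return the attribute `name` from the first importable module that has it."""
    for module_name in modules:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or its parent package) is not installed; try the next one.
            continue
        if hasattr(module, name):
            return getattr(module, name)
    return None
```

A call such as `NeighborSampler = first_available("NeighborSampler", PYG_SAMPLER_MODULES)` returns the class if either path provides it, and `None` otherwise, so the caller can raise a clearer error than a bare ModuleNotFoundError.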

silva-vinicius commented 2 years ago

I have a similar question: how can I use IRGNN to obtain the embedding of each node and associate it with its respective asin identifier? Specifically, I need a DataFrame with the asin code as the index and one column containing the embeddings (obtained after the training step of the model). Do you have any advice on how to achieve that? Any help is greatly appreciated.

Thanks

wwliu555 commented 2 years ago

In datasets/process.py:

line 107: I converted the asin values (node labels) to integers (new inner IDs) with the function nx.convert_node_labels_to_integers; line 65: I sorted the nodes by the assigned integers. Therefore, the embeddings obtained from IRGNN are also in the order of the assigned integers.

I think you can fetch the embeddings by first building the mapping between the asin values and their new inner IDs, and then looking up the embeddings accordingly.
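
The mapping step described above might look like the following sketch. It rests on two assumptions: that nx.convert_node_labels_to_integers was called with its default ordering, so that inner ID i corresponds to the i-th node of the original graph's G.nodes(), and that row i of the embedding matrix belongs to inner ID i. The function name and the sample values are hypothetical. (networkx can also record the old label on each node if the function is called with its `label_attribute` argument.)

```python
from typing import Dict, Hashable, Iterable, List, Sequence


def map_labels_to_embeddings(
    labels_in_graph_order: Iterable[Hashable],
    embeddings: Sequence[Sequence[float]],
) -> Dict[Hashable, List[float]]:
    """Pair the i-th original node label (asin) with the i-th embedding row."""
    labels = list(labels_in_graph_order)
    if len(labels) != len(embeddings):
        raise ValueError("number of labels and number of embedding rows differ")
    if len(set(labels)) != len(labels):
        raise ValueError("node labels must be unique")
    # Inner ID i is assumed to index both the label list and the embedding rows.
    return {label: list(embeddings[i]) for i, label in enumerate(labels)}


# Made-up values: asins in the original graph's node order, and one row per inner ID.
asins = ["B0003", "B0001", "B0002"]
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
table = map_labels_to_embeddings(asins, embeddings)
```

With pandas available, `pd.Series(table, name="embedding").rename_axis("asin").to_frame()` turns this dictionary into a DataFrame indexed by asin with a single embedding column. If the embeddings are a PyTorch tensor, converting them first with `.detach().cpu().numpy()` is likely needed.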

QuangLinh0301 commented 2 years ago

Hi, your paper is great, but I have two questions. 1) Can your code be used in Google Colab? 2) How long did training on your data take? Thanks