skojaku / degree-corrected-link-prediction-benchmark

Link prediction
MIT License

fraction parameter issue #4

Closed. rachithaiyappa closed this issue 1 year ago

rachithaiyappa commented 1 year ago

Having added a bunch of new networks in #3, I was testing the rest of the pipeline.

While processing the file that currently resides at data/derived/networks/raw/power/edge_table.csv, the rule generate_link_prediction_dataset throws an error:

Traceback (most recent call last):
  File "/data/sg/racball/link-prediction/.snakemake/scripts/tmpum_8udes.generate-link-prediction-dataset.py", line 46, in <module>
    model.fit(net)
  File "/data/sg/racball/link-prediction/libs/linkpred/linkpred/LinkPredictionDataset.py", line 46, in fit
    self.splitter.fit(net)
  File "/data/sg/racball/link-prediction/libs/linkpred/linkpred/LinkPredictionDataset.py", line 179, in fit
    raise Exception(
Exception: Cannot remove edges by keeping the connectedness. Decrease the `fraction` parameter
Error in rule generate_link_prediction_dataset:
    jobid: 17
    input: data/derived/networks/raw/power/edge_table.csv
    output: data/derived/datasets/power/train-net_negativeEdgeSampler~uniform_testEdgeFraction~0.5_sampleId~3.npz, data/derived/datasets/power/targetEdgeTable_negativeEdgeSampler~uniform_testEdgeFraction~0.5_sampleId~3.csv

I haven't looked into LinkPredictionDataset.py much. Any idea what is going on?

Decreasing the fraction parameter is possible, but is that what we want to do?

skojaku commented 1 year ago

There is no way around this error. Because the network is extremely sparse, it is impossible to remove the specified fraction of edges while keeping the network connected.
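A rough way to see the constraint: a connected network with n nodes needs at least n - 1 edges (a spanning tree), so at most m - (n - 1) of its m edges can be removed. The sketch below computes that bound from an edge list. Note that `max_removable_fraction` is a hypothetical helper written for illustration; it is not part of linkpred, and the check in LinkPredictionDataset.py may be implemented differently.

```python
def max_removable_fraction(edges):
    """Largest fraction of edges that can be removed without splitting
    any connected component (hypothetical helper, not part of linkpred).

    `edges` is an iterable of (u, v) pairs for an undirected network.
    """
    # Deduplicate undirected edges and drop self-loops.
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}

    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Count the edges of a spanning forest: these must be kept.
    forest_edges = 0
    for edge in unique_edges:
        u, v = edge
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            forest_edges += 1

    n_edges = len(unique_edges)
    if n_edges == 0:
        return 0.0
    # Every edge outside the spanning forest can be removed safely.
    return (n_edges - forest_edges) / n_edges


# A 4-node cycle has 4 edges and needs 3 to stay connected.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(max_removable_fraction(cycle))  # 0.25
```

If this value is below the requested test edge fraction (0.5 in the failing job), no split can satisfy both requirements, which is consistent with the exception above.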

Did you see the same error for other networks? You can keep Snakemake going even when errors arise by setting the -k option. Let's run the script and find out how many networks raise the error.

rachithaiyappa commented 1 year ago

I didn't check for many networks. I'm currently running it with the -k option. We'll see how prominent this is once it is done.

rachithaiyappa commented 1 year ago

Snakemake is done. It ran for 93 out of the 98 networks (it generated output inside the datasets, embedding, link-prediction, and results/auc-roc directories). However, result_auc_roc.csv inside results/auc-roc was not updated.

Not sure why; I need to check. My guess is that Snakemake didn't execute the plot_aucroc rule because it is separate from the rest, and because the rule all failed for 5 of the 98 networks due to the fraction parameter issue.

skojaku commented 1 year ago

So quick! Thank you so much! I don't know why, but we can force it to be updated by removing result_auc_roc.csv

skojaku commented 1 year ago

Oh, yeah. That's right, since not all rules were successfully executed. I think 93/98 networks is fairly good. Let's remove the 5 networks and go with a fraction of 0.5.

rachithaiyappa commented 1 year ago

Okay. The figures are ready.

They are under derived/results in the shared data directory. I just skimmed through them, and it seems that preferential attachment link prediction with biased sampling does always underperform uniform sampling :)

I will work on summarising them later. I have some ideas for that but do feel free to suggest what kind of summary plot you'd like to see.

skojaku commented 1 year ago

Cool! I'd like to take a look. It seems you need to chmod the files to grant access?

rachithaiyappa commented 1 year ago

Sorry. You should be able to see it now

rachithaiyappa commented 1 year ago

Closing this issue. We chose to ignore the few networks where the fraction parameter issue arises.