[JOSS Review] Comments on paper.md

From the JOSS checklist

Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?

The summary is hard to understand for a reader who does not know graph diffusion. I suggest making the purpose a bit clearer, e.g. by putting the explanation of what graph diffusion models are useful for ("simulating information spread") earlier and elaborating on it.

A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?

Here the basic problem (graph source localization) is nicely introduced in a way that non-experts can also understand. Figure 1 is rather useful for that. The relation to other work becomes fairly clear, as you use other people's software in a sort of unified framework. For me, however, it is not clear who the target audience is: is it developers of other algorithms for graph source localization who want to compare their solution to others, or is it users of such algorithms who want to find the most suitable algorithm for their work? What I am missing for both cases is some steps on how to do that, i.e. how to incorporate a new algorithm or a new dataset. With that, the "problems the software is designed to solve" are also a bit unclear: technically it solves the graph source localization problem, but it remains unclear what the exact use of GraphSL is that other software in this field does not cover.

State of the field: Do the authors describe how this software compares to other commonly-used packages?

As it includes other packages, this is straightforward. However, as explained earlier, the usefulness of this (and of the datasets) does not become clear.

Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?

The writing is generally of good quality. A few minor remarks:

line 27: "up-to-date state-of-the-art" seems redundant

In Methods and Benchmark Datasets, the past tense is used when describing other approaches; the present tense should be used here.

References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

This looks mostly fine. I would suggest adding a reference to a recent review paper or similar that explains graph source localization (techniques), to help the reader dive deeper into the field. I feel there are missing references for the data in the table.

More general

line 33: "Independent Cascade" and "Linear Threshold" are not explained. References might also help.
lines 33-36: the formulation here is not really clear. I would suggest elaborating on it further. This could go well with my earlier points.
Figure 2:
- The image has some sort of gradient to the left; the background should be either white or transparent.
- The image description does not feel very useful. Suggestion: "The hierarchical structure of the GraphSL library. In total 6 algorithms are implemented, which can be divided into two categories: Prescribed Methods that rely on ... and GNN-based Methods which ..." From the image and its description alone, the reader should be able to extract the most important information.
Table 1:
- The caption has to be improved.
- The relevant information is lost in too much detail (too many numbers displayed, Average Degree is redundant).
- Right-align the numbers and reduce the number of significant digits (two, maybe three).
- I would suggest having only Nodes and Average Degree in there (Average Degree needs a fixed number of digits; as is, I would drop all digits after the decimal separator).
Line 55: There needs to be some sort of explanation of the datasets, of what a seed-diffusion pair is, and of how to generate them.

Thank you for the review. Below is our revision:

Q1. The summary is hard to understand for a reader who does not know graph diffusion. I suggest making the purpose a bit clearer, e.g. by putting the explanation of what graph diffusion models are useful for ("simulating information spread") earlier and elaborating on it.

A1. We have revised the summary to explain graph diffusion, as shown below:

We introduce GraphSL, a new library for studying the graph source localization problem. Graph diffusion and graph source localization are inverse problems in nature: graph diffusion predicts information diffusions from information sources, while graph source localization predicts information sources from information diffusions. GraphSL facilitates the exploration of various graph diffusion models for simulating information diffusions and enables the evaluation of cutting-edge source localization approaches on established benchmark datasets. The source code of GraphSL is made available at the GitHub Repository. Bug reports and feedback can be directed to the GitHub issues page.

Q2. For me, however, it is not clear who the target audience is: is it developers of other algorithms for graph source localization who want to compare their solution to others, or is it users of such algorithms who want to find the most suitable algorithm for their work? What I am missing for both cases is some steps on how to do that, i.e. how to incorporate a new algorithm or a new dataset. With that, the "problems the software is designed to solve" are also a bit unclear: technically it solves the graph source localization problem, but it remains unclear what the exact use of GraphSL is that other software in this field does not cover.

A2. (a) The target audience is both developers and practical users. Developers can add datasets and algorithms as they wish. Instructions are given in the contact section of the README file, as shown below:

We welcome your contributions! If you’d like to contribute your datasets or algorithms, please submit a pull request consisting of an atomic commit and a brief message describing your contribution.

For a new dataset, please upload it to the data folder. The file should be a dictionary object saved by pickle. It contains a key "adj_mat" whose value is the graph adjacency matrix (a sparse numpy array in the CSR format).
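The dataset format described above could be produced roughly as follows. This is a minimal sketch, assuming that "sparse numpy array in the CSR format" refers to a scipy.sparse.csr_matrix; the toy graph and the file name are illustrative and not part of GraphSL.

```python
import os
import pickle
import tempfile

import numpy as np
from scipy.sparse import csr_matrix, issparse

# Toy undirected path graph 0-1-2-3 (illustrative only).
rows = np.array([0, 1, 1, 2, 2, 3])
cols = np.array([1, 0, 2, 1, 3, 2])
vals = np.ones(len(rows), dtype=np.float32)
adj_mat = csr_matrix((vals, (rows, cols)), shape=(4, 4))

# The dictionary carries the adjacency matrix under the "adj_mat" key.
dataset = {"adj_mat": adj_mat}

# Hypothetical file name; a real contribution would go into the data folder.
path = os.path.join(tempfile.mkdtemp(), "toy_dataset")
with open(path, "wb") as f:
    pickle.dump(dataset, f)

# Read the file back and check that the CSR format survived the round trip.
with open(path, "rb") as f:
    loaded = pickle.load(f)

assert issparse(loaded["adj_mat"]) and loaded["adj_mat"].format == "csr"
```

Checking the loaded object before opening a pull request is a cheap way to catch a dense array or a different sparse format.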

For a new algorithm, please determine whether it belongs to prescribed methods or GNN-based methods: if it belongs to the prescribed methods, add your algorithm as a new class in the GraphSL/Prescribed.py. Otherwise, please upload it as a folder under the GraphSL/GNN folder. Typically, the algorithm should include a "train" function and a "test" function, and the "test" function should return a Metric object.
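The shape of such a contribution might look like the sketch below. It is only an illustration under stated assumptions: the Metric class here is a hypothetical stand-in for GraphSL's own Metric object (whose fields may differ), and the DegreeBaseline class, its argument names, and the dense-array inputs are invented for this example.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Metric:
    """Hypothetical stand-in for GraphSL's Metric object; real fields may differ."""
    accuracy: float
    precision: float
    recall: float
    f1: float


class DegreeBaseline:
    """Toy method: flags the highest-degree diffused nodes as sources."""

    def train(self, adj, diffusion, seeds):
        # "Training" here only records how many sources to predict.
        self.num_sources = int(seeds.sum())
        return self

    def test(self, adj, diffusion, seeds):
        degree = adj.sum(axis=1)
        # Only nodes reached by the diffusion are candidate sources.
        score = np.where(diffusion > 0, degree, -np.inf)
        pred = np.zeros_like(seeds)
        pred[np.argsort(-score)[: self.num_sources]] = 1
        true_pos = int(((pred == 1) & (seeds == 1)).sum())
        precision = true_pos / max(int(pred.sum()), 1)
        recall = true_pos / max(int(seeds.sum()), 1)
        f1 = 2 * precision * recall / max(precision + recall, 1e-12)
        accuracy = float((pred == seeds).mean())
        return Metric(accuracy, precision, recall, f1)


# Star graph: node 0 is the hub and the true source; node 3 was not reached.
adj = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])
seeds = np.array([1, 0, 0, 0])
diffusion = np.array([1, 1, 1, 0])

metric = DegreeBaseline().train(adj, diffusion, seeds).test(adj, diffusion, seeds)
```

The point of the sketch is the interface: a "train" step, and a "test" step that returns one object bundling the evaluation scores.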

Feel free to email me (junxiang.wang@alumni.emory.edu) if you have any questions. Bug reports and feedback can be directed to the GitHub issues page.

Practical users can use the GraphSL library for their own purposes. We have created a Jupyter notebook, tutorial.ipynb, to introduce the library's usage.

(b) Other software in this field does not support various simulations of information diffusion, and it also lacks real-world benchmark datasets and state-of-the-art source localization approaches. We have added this point to the paper.

Q3. As it includes other packages, this is straightforward. However, as explained earlier, the usefulness of this (and of the datasets) does not become clear.

A3. We have created a Jupyter notebook, tutorial.ipynb, to introduce the library's usage.

Q4. Some typos: line 27: "up-to-date state-of-the-art" seems redundant

In Methods and Benchmark Datasets, the past tense is used when describing other approaches; the present tense should be used here.

A4. Thank you for pointing them out. We have fixed them in the paper.

Q5. This looks mostly fine. I would suggest adding a reference to a recent review paper or similar that explains graph source localization (techniques), to help the reader dive deeper into the field. I feel there are missing references for the data in the table.

A5. I have added the survey paper and the missing references for the data in the table.

Q6. More general comments.

A6. We have added the necessary references to explain "Independent Cascade" and "Linear Threshold". We have also elaborated the formulation and improved Figure 2 and Table 1. Explanations of the datasets can be found in the README file.
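For readers unfamiliar with the terms, the sketch below simulates the Independent Cascade model in its standard form: each newly activated node gets a single chance to activate each of its inactive neighbors with a fixed probability. It is an illustration only, not GraphSL's implementation; the function name, the toy graph, and the reading of a seed-diffusion pair as (source set, set of nodes finally activated) are assumptions made for this example.

```python
import random


def independent_cascade(neighbors, seeds, prob, rng):
    """Simulate one Independent Cascade run and return the set of active nodes.

    neighbors: dict mapping each node to a list of its neighbors.
    seeds: iterable of initially active (source) nodes.
    prob: activation probability applied to every edge.
    rng: a random.Random instance, passed in for reproducibility.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in neighbors[u]:
                # u gets exactly one attempt to activate each inactive neighbor.
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Toy undirected graph given as adjacency lists (illustrative only).
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
seeds = {0}
diffusion = independent_cascade(neighbors, seeds, prob=0.5, rng=random.Random(0))

# One seed-diffusion pair under the assumed reading: sources and reached nodes.
pair = (seeds, diffusion)
```

Source localization then runs in the opposite direction: given only the diffusion, recover the seeds.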

Please let us know if you have any concerns; we are happy to address them. Thanks.
