essepuntato commented 1 year ago

Dear @open-sci/pika-py,

Please find attached my comments on all your material. You have to address all of them and, once finalised, close this issue with a comment containing your reply to each of the points I have highlighted. There is no specific deadline to complete this task; thus, please take your time.

Please be aware that modifications to some documents may also require modifications to other documents. As a final note, please remember to keep your notebooks up-to-date.

After closing this issue, please remember to update your material.md file by specifying the references to the new version of all your documents.

As usual, for further doubts, do not hesitate to contact me in the Slack channel or just comment on this issue here.

General comments from the presentation

The following comments and questions should be addressed and may also result in the modification of some of the material prepared for the project:

Even if ERIH-PLUS is European, does it also contain international journals? Do you think that this may have biased the results somehow? Please justify this claim.
Please provide a clear view of the approximate time to run the two methodologies.
Have you measured how many journals in Meta are not covered by ERIH and are, to some extent, described as SSH journals in some other index? Please quantify, at a general level, how many of these journals you have lost in your analysis. Please add this aspect as a limitation of the work.
Why did you need two methodologies?
Did you observe any inconsistencies in the results returned by the methodologies?

DMP

The document (PDF) of the DMP says "Version 4" but, in the metadata, it is "Version 5". Please correct.

Why did you use the Horizon 2020 template instead of the Horizon Europe one?

Title: Pika.py Dataset

Page 3: In the PDF, footnotes 1 and 2 do not point to anything (V3 and V19). A proper citation should be used. In addition, there is no citation of the ERIH-PLUS dataset.
2.1.2 Where do the described data reside? Maybe, pointing to the original container of the data (Figshare DOI links, for Meta and COCI) would be more appropriate. The opencitations.net links refer just to a website.
2.1.3 Which data will be re-used? It would be better here to describe which of your data will be reused. For instance, are you interested in the authors in Meta? Are you looking at self-citations in COCI? What did you actually use?
3.1.1.2 Please provide URL/Location describing the used metadata schema. Meta is not a metadata schema. The OCDM is the data model of Meta. Are you using it? And in addition, are you using a metadata schema for your data?
3.1.1.9 Will you provide clear version numbers for your data? You said so, but then you are not doing it (e.g. you used Version 1 etc. instead of Version 1.0.0).
3.1.1.13 What services will you use to provide searchable metadata? And how? Can you elaborate a bit more?
3.1.2.4 Please provide URL/Name of used data repositories. Isn't the repository Zenodo? Why did you specify the Pika.py Code?
3.1.2.5 Is the storage sufficiently secure for the data and does the storage provide backup and recovery procedures? Are you sure about what you specified here?
3.1.4.4 What internationally recognised licence(s) will you use for the described data? The dataset is not released in ISC.
3.1.4.5 Do you have documented procedures for quality assurance of the described data? Semantic versioning is not a process for keeping quality assurance.

Title: Pika.py software

Some of the points raised above also apply here. In addition:

3.1.1.16 Provide information about used standardised formats. Why is CSV mentioned even if it is software?

Protocol

Nothing to add; well done.

Software

The README.md does not contain an appropriate introduction on how I can pass the input to the software itself. For instance, I want to run your software again with more updated sources. How can I do it? Where do I have to put them? This must be clear reading the README.

Finally, the item on Zenodo describing the software is not linked via GitHub. You did not use the GitHub+Zenodo approach for uploading it on Zenodo, as introduced during the lectures. This would guarantee that each new version of the software is also assigned a version DOI and automatically linked with the GitHub repository and the previous versions uploaded on Zenodo. Please use the appropriate method to upload the software.

Data

The description of the data on Zenodo should detail which data I should expect to see in the various directories and how these data are structured. I know that this information is probably present in the protocol, but the format used for data (column names, their meaning, etc.) should also be reported here.

Article

Abstract: please use the same structured abstract already developed in the paper.

Introduction: it should include at least the research questions, and it should also contain, at the very end, how the rest of the paper is actually structured (e.g. "The rest of the paper is structured as follows. In Section 2, ...").

Methodology: The citation to the software must be done appropriately, i.e. creating a bibliographic reference with authors, year, title, version, and then the ID. You did not cite (no bibliographic references have been specified) the actual datasets for Meta, COCI, and ERIH that you have used in your analysis.

Results: This section is pretty small and does not contain any information in tabular forms, something that is expected. Please extend its description a bit, adding more information. In addition, usually, the graphs are specified here and just briefly described. While you should discuss them (providing a possible explanation about the results) in the...

Discussion: ... which should contain a discussion of the results, not new ones. In addition, it should also contain possible limitations of your work, e.g. the fact that ERIH is not comprehensive of all the possible SSH journals, the fact that books have not been considered (and in the humanities, they may have an impact), etc.

Conclusions: they are missing and should also contain some sketches of future works.

References: if you use the references (I agree), please cite them properly in the text without using footnotes, but using APA style.

SaraVell1 commented 1 year ago

General comments from the presentation

Even if ERIH-PLUS is European, does it also contain international journals? Do you think that this may have biased the results somehow? Please justify this claim. → The “About” section of the ERIH PLUS website (https://kanalregister.hkdir.no/publiseringskanaler/erihplus/about/) states that the main target group of the index is researchers and research within a European framework, but that it also holds journals from other parts of the world to add value to the ERIH PLUS main target group and scope. We addressed this problem in the Discussion section by highlighting the ERIH-PLUS team’s decision and the fact that, in our view, the inclusion of international journals has certainly biased the results. We also added information about this limitation in the “Limitations of the work” subsection.

Please provide a clear view of the approximate time to run the two methodologies. → We added this information to the article, in the “Requirement & Problems” subsection of “Methods and Methodologies”.

Have you measured how many journals in Meta are not covered by ERIH and are, to some extent, described as SSH journals in some other index? Please quantify, at a general level, how many of these journals you have lost in your analysis. Please add this aspect as a limitation of the work. → We added a subsection called “Limitations of the work” to the “Discussion”, in which we address the fact that some Meta journals not classified by the ERIH-PLUS dataset as part of the SSH domain have been found in another index (Scimago). We have provided detailed information about the number of journals not covered by ERIH and the two examples found.

Why did you need two methodologies? → We explained the need for two methodologies in the first lines of the “Discussion” section. We offer the possibility to run a lighter methodology that does not create the additional files which the other methodology produces and which can be used for further research.

Did you observe any inconsistencies in the results returned by the methodologies? → We highlighted this information in the first lines of the “Discussion” section.

DMP

The document (PDF) of the DMP says "Version 4" but, in the metadata, it is "Version 5". Please correct. → We have released a new version with the correct metadata version. DOI: https://zenodo.org/record/8324973

Why did you use the Horizon 2020 template instead of the Horizon Europe one? → When we released the first version of the DMP, we did not know that a newer version of the template was available. We analysed the differences between the two templates, but we decided to keep the Horizon 2020 template, even though it is deprecated, because it seems more detailed and seems to provide more information about our data. Furthermore, since the suggestions we received were based on the already published DMP, we decided to stick to that template and release an improved version of it.

Title: Pika.py Dataset

Page 3: In the PDF, footnotes 1 and 2 do not point to anything (V3 and V19). A proper citation should be used. In addition, there is no citation of the ERIH-PLUS dataset. → We have changed the description of the Pika.py Dataset and better explained the versions utilized in our research.

2.1.2 Where do the described data reside? Maybe, pointing to the original container of the data (Figshare DOI links, for Meta and COCI) would be more appropriate. The opencitations.net links refer just to a website. → We have added the appropriate Figshare DOI links for Meta and COCI. For ERIH-PLUS, we did not find a different link.

2.1.3 Which data will be re-used? It would be better here to describe which of your data will be reused. For instance, are you interested in the authors in Meta? Are you looking at self-citations in COCI? What did you actually use? → We have addressed these questions under the citation of each dataset by specifying which data we use from it.

3.1.1.2 Please provide URL/Location describing the used metadata schema. Meta is not a metadata schema. The OCDM is the data model of Meta. Are you using it? And in addition, are you using a metadata schema for your data? → This section doesn’t exist anymore because we don’t use a metadata schema and the information describing the dataset can be found in the keywords of the publication on Zenodo (explanation in Section 3.1.1.1 of the new version).

3.1.1.9 Will you provide clear version numbers for your data? You said so, but then you are not doing it (e.g. you used Version 1 etc. instead of Version 1.0.0). → We corrected our description of the versioning system, specifying that we use the versioning system provided by the tools, e.g. Zenodo.

3.1.1.13 What services will you use to provide searchable metadata? And how? Can you elaborate a bit more? → We do not provide searchable metadata, because we rely on the automatic cataloguing of the publications based on the keywords provided.

3.1.2.4 Please provide URL/Name of used data repositories. Isn't the repository Zenodo? Why did you specify the Pika.py Code? → We misunderstood the question; the correct repository link is now provided.

3.1.2.5 Is the storage sufficiently secure for the data and does the storage provide backup and recovery procedures? Are you sure about what you specified here? → We explained more clearly the backup and recovery service provided by Zenodo, as described in https://about.zenodo.org/infrastructure/.

3.1.4.4 What internationally recognised licence(s) will you use for the described data? The dataset is not released in ISC. → We changed the license and inserted the correct one.

3.1.4.5 Do you have documented procedures for quality assurance of the described data? Semantic versioning is not a process for keeping quality assurance. → We changed our answer to “no”.

Title: Pika.py software

0.1.2 → We modified the description by adding some information in order to make it clearer.

2.1.2 → We added “GitHub” as repository for reused data.

2.1.3 → We added some information about the data reused.

3.1.1.2 → This section doesn’t exist anymore because we modified the previous one (3.1.1.1) providing all the correct information.

3.1.1.9 → We justified the choice of using the versioning system provided by GitHub through the use of tags.

3.1.1.12 → We have modified this section and we specified that we don’t provide searchable metadata.

3.1.1.16 Provide information about used standardised formats. Why is CSV mentioned even if it is software? → We have deleted the CSV format because the files that we produce are Python files.

3.1.2.4 → We added the links to the repositories.

3.1.2.5 → We addressed this issue by explaining how secure the tools we used are.

3.1.4.4 → We didn’t modify the license because we all decided it at course level.

3.1.4.5 → For the software, we did not modify this section.

Software

The README.md does not contain an appropriate introduction on how I can pass the input to the software itself. For instance, I want to run your software again with more updated sources. How can I do it? Where do I have to put them? This must be clear reading the README. → We modified the README.md by adding the information needed to run the software.

Finally, the item on Zenodo describing the software is not linked via GitHub. You did not use the GitHub+Zenodo approach for uploading it on Zenodo, as introduced during the lectures. This would guarantee that each new version of the software is also assigned a version DOI and automatically linked with the GitHub repository and the previous versions uploaded on Zenodo. Please use the appropriate method to upload the software. → We uploaded the software in the correct way by linking the GitHub repository to Zenodo and publishing it through new releases. DOI: https://zenodo.org/record/8326023

Data

The description of the data on Zenodo should detail which data I should expect to see in the various directories and how these data are structured. I know that this information is probably present in the protocol, but the format used for data (column names, their meaning, etc.) should also be reported here. → We provided a more accurate description of the data of both “DATA PRODUCED” (https://doi.org/10.5281/zenodo.7974816) and “DATA PREPROCESSED” (https://doi.org/10.5281/zenodo.7973159).

Article

Abstract: please use the same structured abstract already developed in the paper. → We structured the abstract as you suggested.

Introduction: it should include at least the research questions, and it should also contain, at the very end, how the rest of the paper is actually structured (e.g. "The rest of the paper is structured as follows. In Section 2, ..."). → We reformulated the research questions in a clearer way and we added a summary of the paper’s structure.

Methodology: The citation to the software must be done appropriately, i.e. creating a bibliographic reference with authors, year, title, version, and then the ID. You did not cite (no bibliographic references have been specified) the actual datasets for Meta, COCI, and ERIH that you have used in your analysis. → In the “References” section, we modified the software citations and added citations of the starting datasets.

Results: This section is pretty small and does not contain any information in tabular forms, something that is expected. Please extend its description a bit, adding more information. In addition, usually, the graphs are specified here and just briefly described. While you should discuss them (providing a possible explanation about the results) in the…→ We added a new table that summarizes the results of the first and second research questions, allowing a better comparison between the citing and cited disciplines. We also repositioned the existing graphs to improve readability.

Discussion: ... which should contain a discussion of the results, not new ones. In addition, it should also contain possible limitations of your work, e.g. the fact that ERIH is not comprehensive of all the possible SSH journals, the fact that books have not been considered (and in the humanities, they may have an impact), etc. → We started the discussion by explaining why we have two methodologies. We reviewed some previous literature and then examined the results presented in the previous section. As suggested, we highlighted the fact that the results could be biased by our source dataset. We tried to justify and discuss our results.

Conclusions: they are missing and should also contain some sketches of future works. → We improved the conclusions by briefly summarizing the work done and by suggesting future research building on our investigation.

References: if you use the references (I agree), please cite them properly in the text without using footnotes, but using APA style. → We modified the citations by using the APA style as suggested.

essepuntato commented 11 months ago

Fine with me. Thanks for all your revisions!

open-sci / 2022-2023

Revision of material - team Pika.py #28