Closed LingshuHu closed 2 years ago
Thanks for your submission @LingshuHu! We editors are discussing. We are specifically considering the privacy implications of this package. If you have any input on this topic, please let us know.
Dear Maëlle,
Thank you for your feedback! In terms of privacy, 1) this package only collects publicly available data from the website. On this website, users need to fill out their names, nicknames, and other information. Users' real names are not visible, and users have control of showing information such as "about me," "my groups," "my replies," and "my discussions." This package only scrapes users' nicknames and other publicly visible information. By using users' nicknames, people cannot infer the URLs of users' profile pages and get their personal information. 2) We have added a responsibility disclaimer in README and vignette. 3) We are open to changes if reviewers suggest additional things to be done.
Please let us know if you have any other questions or comments. Thank you for your time and consideration!
Warm regards, Lingshu
:wave: @LingshuHu! Sorry for the delay, we're not forgetting this submission: we're in the process of formulating a general policy on this so we can be consistent and provide guidance to reviewers on the subject. Thanks for your patience!
Dear Maëlle,
I hope all is well. I would just like to check the status of our paper. Is there any further information that we could provide or any improvement that we can make? Thank you!
Regards, Lingshu
Based on this approach, here are my thoughts on the healthforum package: It clearly accesses and generates personally identifiable and sensitive data. Looking at all the relevant ToC/privacy/etc. on the site, the normative expectations of privacy are quite ambiguous. I think any research using the package would require a form of informed consent or an evaluation of such expectations, be it discussion with forum managers or a survey of users. Since this applies to pretty much any use of the package, I think it makes sense to ask you, the package authors, to do this so as to provide appropriate information to users. This would mean contacting the site administrators to get their guidance on the use of the package and including that prominently in the documentation. It would also make sense to feature workflows in the vignette that remove personally identifying information. For instance, you could show an example where you pull text and then generate an analysis of frequently used terms, prominently noting that no user-level data is retained and that this would be appropriate for publishable analysis.
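To make the suggested workflow concrete, here is a minimal sketch of a term-frequency analysis that discards user-level fields before any counting happens. It is written in Python purely for illustration (the package itself is in R), and the record layout (`user_name`, `text`), the sample posts, and the stopword list are all assumptions, not the package's actual output format.

```python
import re
from collections import Counter

# Hypothetical scraped records; the field names are assumptions for illustration.
posts = [
    {"user_name": "user_a", "text": "The pain in my knee is worse at night."},
    {"user_name": "user_b", "text": "Knee pain improved after physiotherapy."},
    {"user_name": "user_a", "text": "Night pain kept me awake again."},
]

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "in", "my", "is", "at", "me", "after", "a", "an", "and", "of", "to"}


def term_frequencies(records):
    """Count terms across all posts, keeping only the free text."""
    # Drop every user-level field up front so nothing downstream can retain it.
    texts = [record["text"] for record in records]
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return counts


freqs = term_frequencies(posts)
print(freqs.most_common(3))
```

Only aggregate counts leave the function, so screen names are never stored alongside the results. Free text can still contain identifying details, so a step like this reduces rather than removes the need for the consent procedures discussed above.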
@noamross:
I believe the healthforum package only has access to screen names. So to clarify: is the policy of ropensci that all 'user names' are considered to be personally identifiable data?
The GDPR defines personal data as
‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
I think the above paragraph could be interpreted that way, but my understanding is that an online identifier counts as personal data if it can be used to identify a natural person. This makes sense if the user names are email addresses. Or if user names are used in combination with IP addresses or linked to actual names. But I worry what kind of effect the broader interpretation would have on research–or what it means to software like {rtweet} that was accepted in the past.
I responded over at https://github.com/ropensci/dev_guide/pull/251/ so as to consolidate conversation.
⚠️⚠️⚠️⚠️⚠️
In the interest of reducing load on reviewers and editors as we manage the COVID-19 crisis, rOpenSci is temporarily pausing new submissions for software peer review for 30 days (and possibly longer). Please check back here again after 17 April for updates.
In this period new submissions will not be handled, nor new reviewers assigned. Reviews and responses to reviews will be handled on a 'best effort' basis, but no follow-up reminders will be sent.
Other rOpenSci community activities continue. We express our continued great appreciation for the work of our authors and reviewers. Stay healthy and take care of one another.
The rOpenSci Editorial Board
⚠️⚠️⚠️⚠️⚠️
⚠️⚠️⚠️⚠️⚠️ In the interest of reducing load on reviewers and editors as we manage the COVID-19 crisis, rOpenSci new submissions for software peer review are paused.
In this period new submissions will not be handled, nor new reviewers assigned. Reviews and responses to reviews will be handled on a 'best effort' basis, but no follow-up reminders will be sent. Other rOpenSci community activities continue.
Please check back here again after 25 May when we will be announcing plans to slowly start back up.
We express our continued great appreciation for the work of our authors and reviewers. Stay healthy and take care of one another.
The rOpenSci Editorial Board ⚠️⚠️⚠️⚠️⚠️
Hello @LingshuHu and @mkearney, my apologies that this had fallen somewhat through the cracks without resolution. We started review activities up a few weeks ago but I failed to pick up this conversation as it had spread across multiple repositories.
We adopted the policy that we discussed above; it is at https://devguide.ropensci.org/policies.html#ethics-data-privacy-and-human-subjects-research . Our take on healthforum is that, since the vast majority of research uses would require users to obtain a form of informed consent, the package authors should facilitate this by obtaining either blanket approval from patient.info or, perhaps more realistically, appropriate contact info and a procedure for engaging with them and the user community, and by documenting this (e.g., "For informed consent procedures for using patient.info forum data, contact XXXX, community manager").
Please let us know what your status is. Again, sorry this took so long to pick up again.
Dear @noamross ,
Thank you for the update! We really appreciate your work during this hard time! We think providing users with contact information would be a great idea. We have included it in README.Rmd and also remind users to contact their local IRBs to get more detailed information about privacy policies. We also created a package startup message containing this information, so users will see it whenever they load our package with `library()`.
Could you please let us know what we should do next? Would you suggest that we release a new version of the CRAN package first?
Dear @LingshuHu, my deep apologies that I did not respond to this previously. Packages that had been underway during our "pause" fell through my alerts.
As I wrote in my comment above, though, our opinion is that a disclaimer in the README isn't sufficient here. Because the predominant uses of this package would require an informed consent procedure, it is appropriate for the authors to do the legwork of contacting patient.info and obtaining either blanket approval or at least the relevant contact info and procedure, as I described above.
We realize this is a high standard and would be open to another option, such as including an IRB approval showing how certain uses can be approved without direct engagement with patient.info.
Dear @LingshuHu I wanted to ping to check if you are planning on continuing this review given our response above. I've placed the "holding" tag on it for now.
Dear @noamross Thank you for reaching out to me! Yes, we want to continue to work on it. We plan to apply for an IRB review. But currently, I'm occupied by my dissertation and graduation. I will let you know when we get a chance to go through the IRB review.
Hi @LingshuHu ! We are doing a sweep of stale review issues. Since this review has been open and inactive for so long, much may have changed including author, editor, and reviewer bandwidth and ever-evolving rOpenSci best practices. As such, I'm closing this issue. If you still have interest and capacity, we would welcome you to open a new submission issue!
Submitting Author Name: Lingshu Hu
Submitting Author Github Handle: @LingshuHu
Other Package Authors Github handles: @mkearney
Repository: https://github.com/LingshuHu/healthforum
Version submitted: 0.1.0
Editor: TBD Reviewers: TBD
Archive: TBD
Version accepted: TBD
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Data retrieval: this package is used to download data from patient.info forum.
Data extraction: the data obtained from Patient forum is unstructured, including text, date, behavior traces, etc.
Data munging: this package parses the unstructured data to dataframes.
Who is the target audience and what are scientific applications of this package?
Health communication scholars or researchers who are interested in user-generated health information.
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
To the best of our knowledge, there is no other R package designed to obtain data from the patient.info forum.
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Technical checks
Confirm each of the following by checking the box. This package:
Publication options
JOSS Options
- [x] The package has an **obvious research application** according to [JOSS's definition](https://joss.readthedocs.io/en/latest/submitting.html#submission-requirements).
- [x] The package contains a `paper.md` matching [JOSS's requirements](https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain) with a high-level description in the package root or in `inst/`.
- [x] The package is deposited in a long-term repository with the DOI:
- (*Do not submit your package separately to JOSS*)
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)
Code of conduct