science-collective / scoping-review

A scoping review of open collaboration within scientific research

Noise in pubmed-search.R #35

Closed MarioGuCBMR closed 1 year ago

MarioGuCBMR commented 1 year ago

Not the most important thing, but after going through the output of pubmed-search.R, I noticed that the current query returns a lot of noise from medical and technological papers. Here are some examples:

"Oxygen vacancies in open-hollow microcapsule enable accelerated kinetics for stable Li-S battery."
"Predicting major complications in patients undergoing laparoscopic and open hysterectomy for benign indications."

Clearly, the word "open" is used for many more topics than we were expecting. Those two might look like exceptions, but the amount of noise is actually quite large. Though it is unavoidable to some extent, there might be ways of mitigating it. If you think it is necessary, we can use this issue to collect terms that show up more than once and that can be used in the query.
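One way to find recurring noise terms without reading every title by hand is to count the phrases that follow "open" in the search output. This is only a rough sketch in Python, not part of pubmed-search.R: the function name `count_open_phrases` is hypothetical, and it assumes the titles are already available as a list of strings. Hyphenated and spaced forms ("open-label", "open label") are counted as the same phrase.

```python
import re
from collections import Counter


def count_open_phrases(titles):
    """Count phrases of the form 'open <word>' or 'open-<word>' across titles.

    Hypothetical helper for illustration; hyphens are normalised to spaces.
    """
    pattern = re.compile(r"\bopen[\s-]+([a-z]+)", re.IGNORECASE)
    counts = Counter()
    for title in titles:
        for match in pattern.finditer(title):
            counts["open " + match.group(1).lower()] += 1
    return counts


# The two example titles quoted in this issue.
titles = [
    "Oxygen vacancies in open-hollow microcapsule enable accelerated "
    "kinetics for stable Li-S battery.",
    "Predicting major complications in patients undergoing laparoscopic "
    "and open hysterectomy for benign indications.",
]

counts = count_open_phrases(titles)
# Keep only phrases seen more than once; with just two titles nothing recurs yet.
recurring = [phrase for phrase, n in counts.items() if n > 1]
print(counts.most_common())
print(recurring)
```

Run over the full output, the phrases with the highest counts would be the first candidates to discuss here.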

For instance, I have seen the following terms more than twice:

"open fracture"
"open-label"

lwjohnst86 commented 1 year ago

We could exclude keywords related to surgery, and other similar keywords we don't want?
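As a minimal sketch of what such an exclusion could look like, the snippet below appends a `NOT (...)` clause to a PubMed query string. It is written in Python only for illustration, and the resulting string could be pasted into the R script. The helper `exclude_terms` and the base query are hypothetical (the actual query in pubmed-search.R is not shown in this thread); the `[Title/Abstract]` field tag and the `NOT` operator are standard PubMed search syntax.

```python
def exclude_terms(base_query, noise_terms):
    """Return base_query with a NOT clause excluding each noise term.

    Hypothetical helper; each term is matched as a phrase in title or abstract.
    """
    if not noise_terms:
        return base_query
    excluded = " OR ".join(f'"{term}"[Title/Abstract]' for term in noise_terms)
    return f"({base_query}) NOT ({excluded})"


# Placeholder base query, NOT the project's real one.
base_query = '"open collaboration"[Title/Abstract]'
# Noise terms reported in this issue.
noise_terms = ["open fracture", "open-label"]

print(exclude_terms(base_query, noise_terms))
```

The downside of this approach is that the exclusion list has to be maintained by hand, and a paper that mentions an excluded term in passing would be dropped as well.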

danielibsen commented 1 year ago

This is what comes up when you don't have an MD on the team ;). I think there are at least two things to this. One is that there are more "open" things out in the medical literature, as you mention. Another is that there are very few studies on the topic as well.

My suggestion is to restrict the search to titles that include the term "open". We would get a lot less noise, but we also risk losing some studies. However, if the focus is on open collaboration, I would think the term is very likely to be in the title, as it is a good feature to highlight.
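A rough sketch of the title restriction, again in Python just to show the shape of the query string: the helper `require_in_title` and the base query are hypothetical, since the real query in pubmed-search.R is not shown here. The `[Title]` field tag is standard PubMed syntax and limits a term to the article title.

```python
def require_in_title(base_query, term="open"):
    """Return a query that keeps only records with `term` in the title.

    Hypothetical helper; the rest of the query is left unchanged.
    """
    return f"{term}[Title] AND ({base_query})"


# Placeholder base query, NOT the project's real one.
base_query = "collaboration[Title/Abstract] AND research[Title/Abstract]"

print(require_in_title(base_query))
```

Comparing the number of hits with and without the title restriction would give a first idea of how many studies we risk losing.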