Problem
Every so often a publication from 6-18 months ago will appear among the new suggestions for a person whose publications we track religiously (reviewing once a week or more).
An example of this is user = smkamins and PMID = 33365208. That publication just popped up in the pending publication list yesterday, but it hit PubMed in 2019. skaminsk has only 112 candidate publications. The exact same thing happened for rgcryst.
Why didn't we capture it way earlier?
Possible causes
The following are possible causes:
"We don't look these individuals up frequently enough to surface this publication." → No. We review candidate pubs for skaminsk and rgcryst 1x/week or more.
"Clustering is to blame." → Unlikely. Both rgcryst and smkamins have few low-scoring publications among their candidate articles.
"These are low-scoring articles, and they just missed the cut-off." → No. These are often high-scoring results.
"Something is wacky with the dates." → This seems most likely, as explained below.
PMID = 33365208 is one of the roughly 5% of articles in PubMed that have a dateAddedToEntrez later than the dateAddedToPubmed.
Most of the time, the discrepancies between PubStatus="pubmed" and PubStatus="entrez" are minor, off by a day or so. But within that 5% there is an even smaller subset of articles where the gap is months or more, and this is one of them.
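The size of that gap can be read straight off a record's history block. Below is a minimal sketch, assuming the usual PubMed XML layout in which each PubMedPubDate element under History carries a PubStatus attribute. The sample record and its dates are made up for illustration (they are not the real dates for 33365208), and the helper names history_date and pubstatus_gap_days are hypothetical.

```python
from datetime import date
import xml.etree.ElementTree as ET

# Made-up record for illustration only; these dates do NOT belong to any real PMID.
SAMPLE_XML = """
<PubmedData>
  <History>
    <PubMedPubDate PubStatus="pubmed">
      <Year>2019</Year><Month>11</Month><Day>5</Day>
    </PubMedPubDate>
    <PubMedPubDate PubStatus="entrez">
      <Year>2020</Year><Month>12</Month><Day>29</Day>
    </PubMedPubDate>
  </History>
</PubmedData>
"""


def history_date(root, status):
    """Return the History date with the given PubStatus, or None if absent."""
    for node in root.iter("PubMedPubDate"):
        if node.get("PubStatus") == status:
            return date(
                int(node.findtext("Year")),
                int(node.findtext("Month")),
                int(node.findtext("Day")),
            )
    return None


def pubstatus_gap_days(xml_text):
    """Days by which the "entrez" date trails the "pubmed" date.

    Returns None when either date is missing from the record.
    """
    root = ET.fromstring(xml_text)
    pubmed = history_date(root, "pubmed")
    entrez = history_date(root, "entrez")
    if pubmed is None or entrez is None:
        return None
    return (entrez - pubmed).days


gap = pubstatus_gap_days(SAMPLE_XML)
```

Running something like this over the candidate articles for a tracked person would show how many of them fall into the months-or-more group.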
When we do a date search using incremental lookup, our practice is to use the [DP] tag. For example:
("2020/12/28"[DP] : "2020/12/31"[DP]) AND kaminsky s[au]
The DP tag keys off of the date associated with PubStatus="entrez". Obviously, if it's null, we will miss out on these publications. In contrast, the [edat] tag keys off the PubStatus="pubmed". (It's confusing that it starts with an "e"!)
This returns zero results:
("1950/01/01"[DP] : "2019/01/01"[DP]) AND kaminsky s[au] AND 10.1080/21678707.2019.1684258[doi]
This returns one result:
("1950/01/01"[edat] : "2019/01/01"[edat]) AND kaminsky s[au] AND 10.1080/21678707.2019.1684258[doi]
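To repeat this comparison for other articles, the two searches can be run against the NCBI E-utilities esearch endpoint. This is only a sketch: the helper names (date_query, esearch_url, hit_count) are hypothetical, it assumes the JSON response exposes the hit count under esearchresult.count, and it omits rate limiting and API keys.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def date_query(tag, start, end, rest):
    """Build a date-range query on the given search tag (e.g. DP or edat)."""
    return f'("{start}"[{tag}] : "{end}"[{tag}]) AND {rest}'


def esearch_url(term):
    """Build the esearch URL for a PubMed query, asking for JSON output."""
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def hit_count(term):
    """Run the query (network call) and return the number of matching records."""
    with urlopen(esearch_url(term)) as resp:
        return int(json.load(resp)["esearchresult"]["count"])


# The two searches from the text, identical except for the date tag.
REST = "kaminsky s[au] AND 10.1080/21678707.2019.1684258[doi]"
DP_QUERY = date_query("DP", "1950/01/01", "2019/01/01", REST)
EDAT_QUERY = date_query("edat", "1950/01/01", "2019/01/01", REST)

# Usage (performs network requests, so it is left commented out):
#   print(hit_count(DP_QUERY), hit_count(EDAT_QUERY))
```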
Possible fixes
We could do one or both of these...
Confirm that when retrievalRefreshFlag = ALL_PUBLICATIONS is set, the lookup does not depend on the "DP" field. I don't think it does, but this is the only reason I can think of for why our monthly recon captures older pubs.
Assuming it is not significantly slower, incremental lookups should do this...
(("2019/01/01"[edat] : "2019/01/02"[edat]) OR ("2019/01/01"[dp] : "2019/01/02"[dp])) AND kaminsky s[au]
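One way the combined query could be generated for an arbitrary lookup window is sketched below. The function names and the tags parameter are hypothetical and are not taken from the existing retrieval code; the sketch only shows the string construction.

```python
from datetime import date


def date_range_clause(tag, start, end):
    """Format one date-range clause for the given PubMed search tag."""
    fmt = "%Y/%m/%d"
    return f'("{start.strftime(fmt)}"[{tag}] : "{end.strftime(fmt)}"[{tag}])'


def incremental_query(author, start, end, tags=("edat", "dp")):
    """OR together one date-range clause per tag, then restrict by author."""
    clauses = " OR ".join(date_range_clause(tag, start, end) for tag in tags)
    return f"({clauses}) AND {author}[au]"


# Reproduces the example query above for a one-day window.
query = incremental_query("kaminsky s", date(2019, 1, 1), date(2019, 1, 2))
```

Keeping the tags as a parameter makes it easy to time the single-tag and two-tag variants against each other before committing to the change.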
Another possible fix, which I don't recommend, is to switch over to using the [EDAT] tag entirely. The reason is that there is a subset of articles where the opposite problem is true.
For example, for 20228386, there is the following...
That said, the difference in these cases tends to be a day or less.