Bisaloo closed this 6 years ago
Then maybe the easiest thing to do would be to re-run an exact search on every title we found in the first place. However, it seems that sometimes a title returned by a first search cannot be found again in RoMEO with an exact query:
library("rromeo")
rr_journal_name("Evolutionary", qtype = "contains", multiple = FALSE) -> h
#> Warning in parse_answer(api_answer, multiple = multiple): 43 journals match your query terms.
#> Warning in parse_answer(api_answer, multiple = multiple): Select one
#> journal from the provided list or enable multiple = TRUE
lapply(h$title, function(x) rr_journal_name(x, qtype = "exact"))
#> Error in parse_answer(api_answer, multiple = multiple): No journal matches your query terms. Please try another query.
Created on 2018-11-03 by the reprex package (v0.2.1)
We can find which journals are problematic using the following snippet:
which gives the following result:
structure(list(title = c("Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference), author",
"Genetic and Evolutionary Computation Conference : [proceedings] / sponsored by ACM SIGEVO. Genetic and Evolutionary Computation Conference",
"Journal of Evolutionary Biochemistry and Physiology / Zhurnal Evolyutsionnoi Biokhimii i Fiziologii",
"Journal of social, evolutionary & cultural psychology : JSEC",
"Proceedings of the Genetic and Evolutionary Computation Conference / GECCO. Genetic and Evolutionary Computation Conference"
), issn = c(NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), preprint = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), postprint = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), pdf = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
romeocolour = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_)), class = "data.frame", row.names = c(NA,
-5L))
Journals with a "/" in their title include a translation of their name.
For example, there is Journal of Evolutionary Biochemistry and Physiology / Zhurnal Evolyutsionnoi Biokhimii i Fiziologii.
If we query the full title, we get no results:
rr_journal_name("Journal of Evolutionary Biochemistry and Physiology / Zhurnal Evolyutsionnoi Biokhimii i Fiziologii", qtype = "exact")
#> Error in parse_answer(api_answer, multiple = multiple) :
#> No journal matches your query terms. Please try another query
whereas querying only the English name does return a result:
rr_journal_name("Journal of Evolutionary Biochemistry and Physiology", qtype = "exact")
#> title
#>1 Journal of Evolutionary Biochemistry and Physiology / Zhurnal Evolyutsionnoi Biokhimii i Fiziologii
#> issn preprint postprint pdf romeocolour
#>1 0022-0930 unclear can unknown blue
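Since the English part alone does match, a naive workaround would be to strip the translation before running the exact query. Below is a minimal sketch, assuming (and this is only an assumption) that everything after the first " / " is a translation; english_title() is a hypothetical helper, not part of rromeo:

```r
# Hypothetical helper (not part of rromeo): keep only the part of a RoMEO
# title before the first " / ", ASSUMING the rest is a translation.
english_title <- function(title) {
  # fixed = TRUE splits on the literal " / " separator, not a regex
  parts <- strsplit(title, " / ", fixed = TRUE)
  # keep the first piece of each title and trim stray whitespace
  vapply(parts, function(p) trimws(p[1]), character(1))
}

english_title(c(
  "Journal of Evolutionary Biochemistry and Physiology / Zhurnal Evolyutsionnoi Biokhimii i Fiziologii",
  "Journal of social, evolutionary & cultural psychology : JSEC"
))
```

This is fragile: in titles such as "Genetic and Evolutionary Computation Conference : [proceedings] / sponsored by ACM SIGEVO. ..." the slash does not introduce a translation, so cutting there would probably not give a usable exact title either.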
So there doesn't seem to be a quick and easy solution for now... From the API docs, it seems possible to query the API by ESSN. Do we get the ESSN back when a query matches multiple journals?
Hum, no, we don't :confused:
For the moment, we could drop the journals that don't have an ISSN, with a warning. That would avoid the problems when using multiple = TRUE.
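A minimal sketch of that idea, assuming the search result is a data frame with title and issn columns like the one shown earlier; drop_missing_issn() is a hypothetical helper, not rromeo code, and the two rows are illustrative:

```r
# Hypothetical helper (not part of rromeo): drop journals whose ISSN is
# missing or empty, and warn the user about what was removed.
drop_missing_issn <- function(journals) {
  # NA | TRUE is TRUE, so NA ISSNs are caught even though NA == "" is NA
  missing <- is.na(journals$issn) | journals$issn == ""
  if (any(missing)) {
    warning(sum(missing), " journal(s) without an ISSN were dropped: ",
            paste(journals$title[missing], collapse = "; "),
            call. = FALSE)
  }
  journals[!missing, , drop = FALSE]
}

# Illustrative input: one journal with an ISSN, one without
journals <- data.frame(
  title = c("Journal of Evolutionary Biochemistry and Physiology",
            "Journal of social, evolutionary & cultural psychology : JSEC"),
  issn = c("0022-0930", NA),
  stringsAsFactors = FALSE
)
kept <- drop_missing_issn(journals)  # emits a warning for the second row
```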
Even with a warning, there are still problems, because some journals have two entries in the database, like the following: http://www.sherpa.ac.uk/romeo/search.php?jtitle=evolution+psychiatrique&issn=0014-3855&zetocpub=Elsevier+Masson&romeopub=Elsevier&fIDnum=|&mode=simple&la=en&version=&source=journal&sourceid=10528
So, when querying, we get different warnings depending on multiple:
library("rromeo")
rr_journal_name("Évolution Psychiatrique", multiple = FALSE, qtype = "exact")
#> Warning in parse_answer(api_answer, multiple = multiple): 2 journals match
#> your query terms.
#> Warning in parse_answer(api_answer, multiple = multiple): Select one
#> journal from the provided list or enable multiple = TRUE
#> title issn
#> 1 Évolution Psychiatrique 0014-3855
rr_journal_name("Évolution Psychiatrique", multiple = TRUE, qtype = "exact")
#> Warning in parse_answer(api_answer, multiple = multiple): 2 journals match
#> your query terms.
#> Recursively fetching data from each journal. This may take some time...
#> Warning in parse_answer(api_answer, multiple = FALSE): 2 journals match
#> your query terms.
#> Warning in parse_answer(api_answer, multiple = FALSE): Select one journal
#> from the provided list or enable multiple = TRUE
#> title issn
#> 1 Évolution Psychiatrique 0014-3855
Created on 2018-11-05 by the reprex package (v0.2.1)
Nice catch!
We can actually find those edge cases by parsing the outcome field. In the case of issn=0014-3855, it returns uniqueZetoc. For a "normal" single journal, it returns singleJournal, and for multiple journals, it returns manyJournals.
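As a sketch of how the outcome field could drive parsing: the three outcome values are the ones quoted above, while get_outcome(), outcome_kind() and the trimmed-down XML answer are made up for illustration. The regex extraction is only a shortcut to keep the example dependency-free; the package itself already relies on xml2 (xml_find_first()).

```r
# Hypothetical helper (not part of rromeo): extract the text of the first
# <outcome> element from a raw XML answer with a regex (simplification).
get_outcome <- function(xml_string) {
  hit <- regmatches(xml_string,
                    regexpr("<outcome>[^<]*</outcome>", xml_string))
  if (length(hit) == 0) return(NA_character_)  # no <outcome> element
  gsub("</?outcome>", "", hit)                 # strip the opening/closing tags
}

# Hypothetical dispatcher: map each outcome value to the kind of parsing needed
outcome_kind <- function(outcome) {
  if (is.na(outcome)) return("unknown")
  switch(outcome,
    singleJournal = "single",
    manyJournals  = "multiple",
    uniqueZetoc   = "single, check publishers",
    "unknown")                                 # any other outcome value
}

# Made-up, trimmed-down answer (not a real RoMEO response)
answer <- "<romeoapi><header><outcome>uniqueZetoc</outcome></header></romeoapi>"
outcome_kind(get_outcome(answer))
```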
Now, we need to ensure that xml_find_first will return the correct policy in those cases.
http://www.sherpa.ac.uk/romeo/publishertypes.php?fIDnum=|&mode=simple&la=en&version=
I'm not really sure what that means in the case of 0014-3855, for example...
FYI: I'm going to prepare and push some commits for this.
Should we add a warning in this case :thinking:?
Well, if the user has reached the limit, that would still be important to know, right?
Hum, I'm not sure what you mean. I was thinking of a warning here:
https://github.com/Rekyt/rromeo/blob/494d453a4615ff2c196a023d0b7c7a3d15ed41d3/R/utils.R#L33-L37
saying something like: "this journal has multiple publishers with different policies. We tried to return the most relevant one but you should also check the detailed policy."
Oops, I misunderstood! I hadn't seen the second commit. Yep, at least a message saying that we chose to return a single policy.
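A minimal sketch of the "return one policy, but tell the user" behaviour, using message() rather than warning() as suggested just above. pick_policy() and the placeholder publishers are hypothetical, and keeping the first row is an arbitrary stand-in for "the most relevant one":

```r
# Hypothetical helper (not part of rromeo): given all policies found for one
# journal (one row per publisher), keep the first and inform the user.
pick_policy <- function(policies) {
  if (nrow(policies) > 1) {
    message("This journal has ", nrow(policies), " publishers with ",
            "possibly different policies. Only the first one is returned; ",
            "please also check the detailed policy on RoMEO.")
  }
  policies[1, , drop = FALSE]
}

# Placeholder data: two publishers for the same journal
policies <- data.frame(
  publisher   = c("Publisher A", "Publisher B"),
  romeocolour = c("green", "yellow"),
  stringsAsFactors = FALSE
)
chosen <- pick_policy(policies)  # prints the message, returns one row
```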
Fixed by PR #16
For example:
will return a data frame of journals, some of which do not have an ISSN. So multiple = TRUE fails because validate_issn() complains that "" is not a valid ISSN. Maybe we should perform the search using the title in this case? We would need to check what happens with XML- or HTTP-encoded characters.
Also, some journals with a missing ISSN still have an ESSN; maybe we can do something with this.
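A sketch of the fallback order suggested here: use the ISSN when it is valid, else the ESSN, else the title. is_valid_issn() and choose_query() are hypothetical helpers (rromeo's own validate_issn() may behave differently); the check digit follows the standard ISSN rule (weights 8 down to 2, modulo 11, with 10 written as X). A title query would presumably still need URL-encoding, for example with utils::URLencode(), which is where the encoding question above comes in.

```r
# Hypothetical helper (not part of rromeo): scalar ISSN check, including
# the check digit. Returns FALSE for NA, "" and malformed strings.
is_valid_issn <- function(x) {
  if (is.na(x) || !grepl("^[0-9]{4}-[0-9]{3}[0-9Xx]$", x)) return(FALSE)
  chars  <- strsplit(gsub("-", "", x), "")[[1]]
  digits <- as.integer(chars[1:7])
  # weighted sum with weights 8..2, then complement modulo 11
  check    <- (11 - sum(digits * 8:2) %% 11) %% 11
  expected <- if (check == 10) "X" else as.character(check)
  toupper(chars[8]) == expected
}

# Hypothetical helper: pick what to query with, in order ISSN > ESSN > title
choose_query <- function(title, issn = NA_character_, essn = NA_character_) {
  if (is_valid_issn(issn)) {
    list(type = "issn", value = issn)
  } else if (is_valid_issn(essn)) {
    list(type = "essn", value = essn)
  } else {
    list(type = "title", value = title)
  }
}

choose_query("Journal of social, evolutionary & cultural psychology : JSEC",
             issn = "")
```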