michalovadek / eurlex

An R package for retrieving official data on European Union laws and policies
https://michalovadek.github.io/eurlex/

SPARQL query by directory code (CC) #15

Closed wrmadsen closed 3 years ago

wrmadsen commented 3 years ago

Would it be possible to add a directory code (CC) argument to the elx_make_query() function, as in the EUR-Lex expert search?

This would be incredibly useful for finding all legal acts in a larger policy area. For example, in tracking a country's EU defence policy, you would need to find all acts relating to Common Foreign and Security Policy (CC = 18).

The expert search on the EUR-Lex website already lets you find EU legal acts by directory code. I have attached a screenshot of this below.

EUR-Lex Expert Search: https://eur-lex.europa.eu/expert-search-form.html

michalovadek commented 3 years ago

I agree that this would be useful, but I don't see how to readily implement it, as the expert search code does not translate into SPARQL in an obvious way. The fastest way forward is to email the EUR-Lex helpdesk and ask them to write the query (either for SPARQL or REST) for you. They are usually very helpful. I can probably implement the answer in the package if you post it here.

wrmadsen commented 3 years ago

Awesome. Last week, I asked the EUR-Lex team about exactly that. Today, they came back with the following example of a SPARQL query that is restricted to a given directory code.

From what I can see, it is the VALUES (in the beginning) and the UNION (at the end) parts of the SPARQL code which restrict our query to a specific directory code. In this example, the directory code is 18, i.e. Common Foreign and Security Policy.

prefix cdm: <http://publications.europa.eu/ontology/cdm#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>

select distinct ?s where {
# Directory code 18, listed under both the fd_555 and the dir-eu-legal-act authority tables
VALUES (?value)
{ (<http://publications.europa.eu/resource/authority/fd_555/18>)
  (<http://publications.europa.eu/resource/authority/dir-eu-legal-act/18>)
}
?s cdm:work_date_document ?dd.
FILTER(str(?dd) >= '1993-11-01')
FILTER(str(?dd) <= '2017-04-27')
?s cdm:resource_legal_id_sector ?sector.
FILTER(str(?sector)='2' || str(?sector)='3' || str(?sector)='4')
?s cdm:resource_legal_repertoire ?rep.
FILTER(str(?rep)='REP')
?s cdm:resource_legal_in-force ?inforce.

# Match acts tagged with the directory code itself ...
{?s cdm:resource_legal_is_about_concept_directory-code ?value.
}
UNION
# ... or with any narrower sub-code beneath it
{?s cdm:resource_legal_is_about_concept_directory-code ?cc.
?value skos:narrower+ ?cc.
}
}

They also mentioned that this query will keep working after a system change the team expects to make later this year. They wrote: _"We have some plans to migrate our data from the atto table http://publications.europa.eu/resource/authority/fd_555 to http://publications.europa.eu/resource/authority/dir-eu-legal-act which indeed would affect your query."_
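
For anyone who wants to try other directory codes, here is a minimal sketch in base R of how the directory-code part of the query above could be parameterised. The helper name build_directory_query() is hypothetical and not part of eurlex, the date, sector and repertoire filters are left out for brevity, and the digits-only check on the code is an assumption about how codes are written in the authority tables.

```r
# Hypothetical helper (not part of eurlex): build a SPARQL query restricted
# to one directory code, mirroring the VALUES + UNION structure shown above.
build_directory_query <- function(code) {
  # Assumption: directory codes are plain digit strings such as "18"
  stopifnot(is.character(code), length(code) == 1, grepl("^[0-9]+$", code))

  base <- "http://publications.europa.eu/resource/authority/"
  # List the code under both authority tables, as in the EUR-Lex team's query
  uris <- paste0("(<", base, c("fd_555/", "dir-eu-legal-act/"), code, ">)")

  paste(
    "prefix cdm: <http://publications.europa.eu/ontology/cdm#>",
    "prefix skos: <http://www.w3.org/2004/02/skos/core#>",
    "select distinct ?s where {",
    paste0("VALUES (?value) { ", paste(uris, collapse = " "), " }"),
    # Acts tagged with the code itself ...
    "{ ?s cdm:resource_legal_is_about_concept_directory-code ?value. }",
    "UNION",
    # ... or with any narrower sub-code
    "{ ?s cdm:resource_legal_is_about_concept_directory-code ?cc.",
    "  ?value skos:narrower+ ?cc. }",
    "}",
    sep = "\n"
  )
}

query_18 <- build_directory_query("18")
cat(query_18)
```

Listing both URIs in VALUES is what should let the same query survive the migration mentioned in the quote, since whichever table holds the code will match.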

michalovadek commented 3 years ago

I added this functionality to the package. Can I please ask you to test out the new features (as listed here: https://michalovadek.github.io/eurlex/news/index.html) after installing the GitHub development version of the package via remotes::install_github("michalovadek/eurlex")? I will push it to CRAN once I am convinced that everything more or less works as it should.

michalovadek commented 3 years ago

sorry, the updated changelog is only here for now: https://github.com/michalovadek/eurlex/blob/master/NEWS.md

wrmadsen commented 3 years ago

It works perfectly! Great job.

The only thing that may be slightly confusing for new users is that, for example, elx_make_query(directory = "18") %>% elx_run_query() only returns two documents (whereas the example below returns 11,688 documents as of 9 March). I assume that is because the default is resource_type = "directive". I noticed that another new feature of your update is being able to select resource_type = "any". It may not make sense to set "any" as the default due to the potential computational load, but then I think it would make sense to print a warning that informs a user that the default is "directive".

Again, great work!

query_18 <- elx_make_query(resource_type = "any", directory = "18")

results_18 <- elx_run_query(query = query_18)

results_18

Also, users will probably not know which numeric code (e.g. 18) corresponds to a given policy area, so it would help to document the mapping somewhere (perhaps in the vignette?). I may have missed it if you have already added it.

michalovadek commented 3 years ago

I am glad it works. I might force users to select a resource type explicitly to avoid any confusion.
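
One way the "explicit resource type" idea could look, sketched in base R. This is only an illustration and not the package's implementation: make_query_strict() is a hypothetical wrapper, the allowed values are just the two mentioned in this thread, and it returns a plain list instead of calling eurlex so that it runs on its own.

```r
# Hypothetical wrapper, not part of eurlex: refuse to guess a resource type.
# Only the two values mentioned in this thread are listed here.
allowed_types <- c("directive", "any")

make_query_strict <- function(resource_type, directory = NULL) {
  # No default on purpose: a missing argument is an error, not "directive"
  if (missing(resource_type)) {
    stop("Please choose a resource_type explicitly, e.g. \"directive\" or \"any\".",
         call. = FALSE)
  }
  resource_type <- match.arg(resource_type, allowed_types)

  # In real use this is where one would call
  # eurlex::elx_make_query(resource_type = resource_type, directory = directory)
  list(resource_type = resource_type, directory = directory)
}

make_query_strict("any", directory = "18")
```

With no default, a call such as make_query_strict(directory = "18") stops with an error instead of quietly returning only directives.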

For the directory codes we should make a lookup table with labels.
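
A rough sketch of what such a lookup table could look like in base R. The object and function names are hypothetical. Only the label for code 18 comes from this thread; the row for code 19 is recalled from the EUR-Lex directory of legal acts and should be verified before use, and the remaining codes would still need to be filled in.

```r
# Illustrative lookup table (hypothetical, incomplete).
# Code 18 is taken from this thread; code 19 is from memory and should be verified.
directory_codes <- data.frame(
  code  = c("18", "19"),
  label = c("Common Foreign and Security Policy",
            "Area of freedom, security and justice"),
  stringsAsFactors = FALSE
)

# Find codes whose label matches a (case-insensitive) search pattern
directory_code_for <- function(pattern, table = directory_codes) {
  hits <- grepl(pattern, table$label, ignore.case = TRUE)
  table[hits, , drop = FALSE]
}

directory_code_for("foreign")
```

The code column of the result could then be passed on as the directory argument, e.g. elx_make_query(resource_type = "any", directory = "18").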