hbz / oerworldmap

OER World Map
https://oerworldmap.org/

Analyse & import Hewlett data #256

Closed trugwaldsaenger closed 7 years ago

trugwaldsaenger commented 9 years ago

http://www.hewlett.org/active-oer-grantee-list

This task is related to #236.

literarymachine commented 9 years ago

This looks like something that can easily be put on the map. We would have to qualify the list a bit further, mostly by adding the locations of the Organizations. Given the rather small size of the data set, it should not pose a problem to do this manually.

literarymachine commented 9 years ago

Are there also similar lists of terminated grants? If so, would it make sense to also include those?

trugwaldsaenger commented 9 years ago

I completely agree that it would make sense to include the terminated grants as well. Doing so would give us the chance to demonstrate the timeline module properly!

literarymachine commented 9 years ago

Perhaps it would make more sense to simply scrape data from the grant database?

literarymachine commented 8 years ago

Also see https://github.com/hbz/oerworldmap/issues/236#issuecomment-119911125

literarymachine commented 8 years ago

@trugwaldsaenger mentioned that @drrobertfarrow also has a spreadsheet containing this data. Is that correct?

philboeselager commented 8 years ago

@trugwaldsaenger The site tells me "You are not authorized to access this page." Can you provide access, please?

philboeselager commented 8 years ago

ProgramMembership seems to represent a Hewlett grant best. There are the following mappings (<field in schema.org> : <field in Hewlett grant>):

That leaves the following fields of Hewlett grants uncategorized:

whereas the following properties of ProgramMembership are not needed for now:

program and region could be subsumed in disambiguatingDescription:

However, the following important information requires an extension of ProgramMembership:

No such extension for grants exists yet, at least none that I have found.

trugwaldsaenger commented 8 years ago

Am I right that we are currently working with this list? http://www.hewlett.org/grants/search?order=field_date_of_award&sort=desc&keywords=OER&year=&term_node_tid_depth_1=All&program_id=148&page=4&page=0

If @drrobertfarrow does not have any additional information, I will contact Hewlett again and ask them whether they have a list with even more information.

trugwaldsaenger commented 8 years ago

Many thanks @philboeselager for making this proposal. "ProgramMembership" seems to fit in many regards, but I do not yet understand what this would mean for our data model. @acka47: Could you comment on this issue?

acka47 commented 8 years ago

I don't think that we should use schema:ProgramMembership as it is intended for quite different things (like airline mileage programs or other loyalty programs like the German "payback").

Regardless of the vocabulary/properties being used, I think we should add the funding information to resources of type project/Action by extending the information we already have under the funder property. Here is an example for the second phase of the OER World Map project:

{
  "@id": "urn:uuid:43526b28-66f8-49bb-bf1c-c3cb8d4bbbcd",
  "@type": "Action",
  "name": [
    {
      "@value": "OER World Map",
      "@language": "en"
    }
  ],
  "funding": [
    {
      "amount": "$108,000",
      "awardDate": "2014-11-17",
      "duration": "17 months",
      "purpose": [
        {
          "@value": "For the development of an OER World Map",
          "@language": "en"
        }
      ],
      "description": [
        {
          "@value": "The North Rhine-Westphalian Library Service Center (hbz) is a central service and development organization for university libraries in North Rhine-Westphalia and has been a leader in the Open Movement. Through a Request for Proposals process, hbz submitted the strongest proposal to develop an OER World Map. This grant request builds upon a phase I effort (which funded three projects), in which hbz successfully built a prototype in a relatively short time frame. Key deliverables of this grant include: developing a functional website that builds upon existing OER mapping efforts, encouraging interoperability of mapping efforts, fostering international collaboration, and highlighting OER efforts around the world in a way that is easy for a wide variety of audiences to understand. This grant is designed to fill a long-recognized gap in the OER field.",
          "@language": "en"
        }
      ],
      "funder": [
        {
          "@id": "urn:uuid:0801e4d4-3c7e-11e5-9f0e-54ee7558c81f",
          "@type": "Organization",
          "name": [
            {
              "@value": "William and Flora Hewlett Foundation",
              "@language": "en"
            }
          ]
        }
      ]
    }
  ]
}

I don't think that we will be able to automatically fit the Hewlett data into this schema, so we will have to add the entries manually. This seems doable given that there are probably fewer than 200 relevant entries in the database (currently only 163 hits for a search for 'OER OR "open educational resources"').

philboeselager commented 8 years ago

I don't think that we should use schema:ProgramMembership as it is intended for quite different things (like airline mileage programs or other loyalty programs like the German "payback").

That sounds reasonable. This was just the most fitting data type I had found on schema.org.

In general, I appreciate the approach of mapping a grant onto a more generic data type, but I am not sure it should work the way described above. The example given describes an action called OER World Map (which is the name of the funded project).

Besides that, there is a reason why you would try to minimize encapsulating special (e.g. grant/funding) data in your own property (called "funding" here): by using more specific data types (e.g. ProgramMembership or better ones) you maximize compatibility with existing ontologies, right?

I've found some existing ontologies that describe grants, but I have not yet figured out how to work with FRAPO, for example. @literarymachine Do you know how to use it?

trugwaldsaenger commented 8 years ago

Just some thoughts on this that @karindr and I just discussed:

  1. If we estimate 10 minutes per entry to manually import the very basic information (name, description, URL) given in the online grantee database, this would be around 25 hours of work. So if we could manage to import this information automatically in, say, 5 hours, this would save close to 3 days. If we manage to get even more information/fields from Hewlett, the savings would be even bigger.
  2. The main problem seems to be that we do not know what data type the entries of the list are. Most of them are institutions, but there seem to be some services and projects included as well. I see two solutions to this problem: a) add the data type manually in advance, or b) include the "change type" functionality and import all data as institutions. We could then check all entries later on and change their type if necessary.
  3. Another issue might be that we currently allow funding information to be attached only to projects and not to institutions. We have to analyse this question in more detail, unless @acka47 has already done so...
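The estimate in point 1 can be reproduced with a few lines. This is only a back-of-the-envelope sketch: the entry count of 150 is inferred from "10 minutes each, around 25 hours", and the 8-hour working day is an assumption.

```python
# Back-of-the-envelope effort estimate for importing the Hewlett grants.
# Assumptions: ~150 entries (implied by "10 minutes each, ~25 hours")
# and an 8-hour working day.
MINUTES_PER_ENTRY = 10
ENTRIES = 150
SCRIPT_HOURS = 5
HOURS_PER_DAY = 8

manual_hours = ENTRIES * MINUTES_PER_ENTRY / 60   # time for a manual import
saved_hours = manual_hours - SCRIPT_HOURS         # time saved by a script
saved_days = saved_hours / HOURS_PER_DAY

print(f"manual: {manual_hours:.1f} h, saved: {saved_hours:.1f} h = {saved_days:.1f} days")
# prints "manual: 25.0 h, saved: 20.0 h = 2.5 days"
```

With the 163 search hits mentioned earlier, the manual figure rises to roughly 27 hours.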
literarymachine commented 8 years ago

The main problem seems to be that we do not know what data type the entries of the list are.

From what I can see, all entries are structured in such a way that they refer to an Organization (the grantee, e.g. hbz) and an Action (i.e. Project, e.g. "For the development of an OER World Map") that is funded.

I don't think that we will be able to automatically fit the Hewlett data into this schema

Could you elaborate on why we cannot generate such a structure from the Hewlett data? Sure, it will most likely result in a couple of duplicates (e.g. for the hbz), but we could fix that afterwards.

acka47 commented 8 years ago

Could you elaborate on why we cannot generate such a structure from the Hewlett data?

We can definitely generate the funding information automatically. But if we follow the approach I outlined above, we first and foremost have to identify or add the project the funding information is connected to. The problem is that the data we currently have doesn't include any structured information about the project itself, not even a name. (There is only a reference somewhere in the grant purpose.) Thus, at best we will be able to generate this semi-automatically.

Anyway, I am not sure whether the approach from https://github.com/hbz/oerworldmap/issues/256#issuecomment-239824427 is the best way to go, at least for the Hewlett data. I will take a deeper look at the grant purposes in a separate comment.

acka47 commented 8 years ago

Here is a randomly generated list of grant purposes from the Hewlett data:

As one can see, the purpose only rarely hints at a concrete project. Unless we get more detailed information, I tend, for the Hewlett data, to model the funding as a relation between two organizations that may point to the concrete project being funded (via the property fundingTarget in the example). For the example above:

{
   "@id":"urn:uuid:e08975fe-acc0-4733-8496-2cfc034a120a",
   "@type":"Organization",
   "name":[
      {
         "@value":"North Rhine-Westphalian Library Center",
         "@language":"en"
      }
   ],
   "funding":[
      {
         "amount":"$108,000",
         "awardDate":"2014-11-17",
         "duration":"17 months",
         "purpose":[
            {
               "@value":"For the development of an OER World Map",
               "@language":"en"
            }
         ],
         "fundingTarget":[
            {
               "@id":"urn:uuid:43526b28-66f8-49bb-bf1c-c3cb8d4bbbcd",
               "@type":"Action",
               "name":[
                  {
                     "@value":"OER World Map",
                     "@language":"en"
                  }
               ]
            }
         ],
         "description":[
            {
               "@value":"The North Rhine-Westphalian Library Service Center (hbz) is a central service and development organization for university libraries in North Rhine-Westphalia and has been a leader in the Open Movement. Through a Request for Proposals process, hbz submitted the strongest proposal to develop an OER World Map. This grant request builds upon a phase I effort (which funded three projects), in which hbz successfully built a prototype in a relatively short time frame. Key deliverables of this grant include: developing a functional website that builds upon existing OER mapping efforts, encouraging interoperability of mapping efforts, fostering international collaboration, and highlighting OER efforts around the world in a way that is easy for a wide variety of audiences to understand. This grant is designed to fill a long-recognized gap in the OER field.",
               "@language":"en"
            }
         ],
         "funder":[
            {
               "@id":"urn:uuid:0801e4d4-3c7e-11e5-9f0e-54ee7558c81f",
               "@type":"Organization",
               "name":[
                  {
                     "@value":"William and Flora Hewlett Foundation",
                     "@language":"en"
                  }
               ]
            }
         ]
      }
   ]
}
acka47 commented 8 years ago

Here is another example of the approach outlined in the last comment, built entirely from data in the online grants database:

{
   "@id":"urn:uuid:to-be-generated",
   "@type":"Organization",
   "name":[
      {
         "@value":"Keio Research Institute",
         "@language":"en"
      }
   ],
   "location":{
      "geo":{
         "lon": 139.4273,
         "lat": 35.3882
      },
      "address":{
         "addressCountry":"JP",
         "streetAddress":"5322 Endo Fujisawa",
         "postalCode":"252-0882",
         "addressLocality":"Kanagawa"
      }
   },
   "url":"http://www.kri.sfc.keio.ac.jp/en/",
   "funding":[
      {
         "amount":"$200,000",
         "awardDate":"2010-09-22",
         "duration":"18 months",
         "purpose":[
            {
               "@value":"To deepen the value of and promote the vision of OER within Japan",
               "@language":"en"
            }
         ],
         "funder":[
            {
               "@id":"urn:uuid:0801e4d4-3c7e-11e5-9f0e-54ee7558c81f",
               "@type":"Organization",
               "name":[
                  {
                     "@value":"William and Flora Hewlett Foundation",
                     "@language":"en"
                  }
               ]
            }
         ]
      }
   ]
}

This would be easy to generate automatically from the source data. The question is whether it really makes sense from a data modeling perspective, because we shouldn't model this just to suit the first data set we import...
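A minimal sketch of that generation step, producing the Organization-rooted shape of the Keio example. The flat input keys (grantee, url, amount, award_date, duration, purpose) are hypothetical names for scraped fields, a fresh UUID is minted per record, and geocoding of the location is left out entirely.

```python
import json
import uuid

# @id of the Hewlett Foundation entry, as used in the examples above.
HEWLETT_ID = "urn:uuid:0801e4d4-3c7e-11e5-9f0e-54ee7558c81f"


def lang(value, language="en"):
    """Wrap a string as a list of JSON-LD language-tagged values."""
    return [{"@value": value, "@language": language}]


def grant_to_organization(grant):
    """Build an Organization-rooted record with one embedded funding.

    `grant` is a flat dict; its keys are hypothetical names for fields
    scraped from a Hewlett grant page. Location/geo is not handled here.
    """
    return {
        "@id": f"urn:uuid:{uuid.uuid4()}",  # fresh ID per generated record
        "@type": "Organization",
        "name": lang(grant["grantee"]),
        "url": grant["url"],
        "funding": [{
            "amount": grant["amount"],
            "awardDate": grant["award_date"],
            "duration": grant["duration"],
            "purpose": lang(grant["purpose"]),
            "funder": [{
                "@id": HEWLETT_ID,
                "@type": "Organization",
                "name": lang("William and Flora Hewlett Foundation"),
            }],
        }],
    }


record = grant_to_organization({
    "grantee": "Keio Research Institute",
    "url": "http://www.kri.sfc.keio.ac.jp/en/",
    "amount": "$200,000",
    "award_date": "2010-09-22",
    "duration": "18 months",
    "purpose": "To deepen the value of and promote the vision of OER within Japan",
})
print(json.dumps(record, indent=2, ensure_ascii=False))
```

Whether the funding should hang off the Organization at all is exactly the modeling question raised here; the sketch only shows that the mechanical part is straightforward.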

literarymachine commented 8 years ago

I just stumbled across the Academic Research Project Funding Ontology (ARPFO) which we could also consider.

philboeselager commented 8 years ago

I just stumbled across the Academic Research Project Funding Ontology (ARPFO) which we could also consider.

A big +1.

If we estimate 10 minutes per entry to manually import the very basic information (name, description, URL) given in the online grantee database, this would be around 25 hours of work. So if we could manage to import this information automatically in, say, 5 hours, this would save close to 3 days. If we manage to get even more information/fields from Hewlett, the savings would be even bigger.

We should implement an automatic routine anyway, since we want to be able to import more and maybe even bigger datasets in the future.

I'm thinking about a GUI-supported routine that presents each data set to be imported on a separate page where you can simply choose "Edit" or "OK". (This is just a sketch so far; we can discuss the editing details later.) What do you think?

acka47 commented 8 years ago

As the duration seems to always be given in months in the Hewlett data, we could easily use http://schema.org/duration with ISO-8601 in the RDF, e.g. "duration": "P17M".
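A minimal sketch of that conversion, assuming the scraped value always looks like "17 months"; any other format is rejected so it can be fixed by hand. The function name is hypothetical.

```python
import re


def months_to_iso8601(duration):
    """Convert a Hewlett-style duration such as "17 months" to "P17M".

    Assumes the source string is "<n> month(s)"; anything else raises
    ValueError so the entry can be reviewed manually.
    """
    match = re.fullmatch(r"\s*(\d+)\s+months?\s*", duration, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    return f"P{int(match.group(1))}M"


print(months_to_iso8601("17 months"))  # P17M
```

Failing loudly instead of guessing keeps malformed durations out of the RDF until someone has looked at them.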

drrobertfarrow commented 8 years ago

Hi guys. Sorry I missed the original question. I don't have a list of data about Hewlett projects. We had planned at one point to get all their data onto the Impact Map. There was even talk at one point of Hewlett making it mandatory for grantees.

I think the ARPFO option looks worth pursuing and would also allow us to specify start and end dates of projects. Perhaps we could also publish historical data about research projects.

acka47 commented 8 years ago

I tried to sort the options already brought forward on this wiki page: https://github.com/hbz/oerworldmap/wiki/Modeling-grants

Generally, we have to decide first which general approach to follow. These are the three options I came up with (there are definitely more):

drrobertfarrow commented 8 years ago

I think this is pretty interesting; a database of funding awards that could be compared with outputs or the density of activity on the map is enticing. I can't access the original link, though.

Of the options, I actually like @literarymachine's suggestion of having some connection between different phases of related work to show where there has been continuation funding, for instance.

Perhaps at a later date it will also be possible to connect artefacts from research projects in this way.

philboeselager commented 8 years ago

If we want to develop a new grant vocabulary, we should perhaps base it on FIBO, a vocabulary for the financial industry that has made it into schema.org 3.0 and could therefore become a standard in the future.

acka47 commented 8 years ago

Based on our discussion today, I created an example expected output for OER World Map phase II at https://gist.github.com/acka47/29545e24a84dcac0fefa89bb6d47ebdc

literarymachine commented 8 years ago

While @philboeselager was completing the import script, hewlett.org was relaunched. While this does throw us back a bit, the new grant database exposes some additional properties that will make our imports better. See e.g. http://hewlett.org/grants/north-rhine-westphalian-library-service-center-hbz-for-the-development-of-an-oer-world-map/

philboeselager commented 8 years ago

Based on our discussion today, I created an example expected output for OER World Map phase II at https://gist.github.com/acka47/29545e24a84dcac0fefa89bb6d47ebdc

@acka47 Can you please redo that for http://www.hewlett.org/grants/north-rhine-westphalian-library-service-center-hbz-for-the-development-of-an-oer-world-map/

acka47 commented 8 years ago

I'd say not much changes for the concrete example. If I see this correctly, there are two or three additional pieces of information:

  1. "Strategies": with value "OER" in the example
  2. "Type of Support" with value "Project" in the example
  3. An overview page of all grants for a grantee (e.g. hbz) based on a Hewlett grantee ID ("51445" in the hbz case)

1.) enables us to filter the database for grants to be imported into the OER World Map. 2.) enables us to type the thing that is funded, which in this case is a project that is already covered in the example, see line 25f. 3.) enables us to identify grants for the same grantee. This is actually a change from the previous state. We can and should add the grantee ID to our data and can thus prevent creating duplicates from Hewlett data. I updated the example accordingly but couldn't find properties to reuse, so I used the example.org namespace for now, see https://gist.github.com/acka47/29545e24a84dcac0fefa89bb6d47ebdc#file-frapo-ex1-json-L38-L39.

acka47 commented 8 years ago

There seem to be three values for "Type of support":

We will have to think about how to handle this. For "General Support/Organization" the implementation is probably straightforward. I propose just adding the organization directly as the object of the frapo:funds property. Will add an example based on http://www.hewlett.org/grants/open-education-consortium-for-general-operating-support/ soon.

BTW, there is another strategy in the Hewlett database besides "OER" that might be relevant for the World Map. It is named "OE" (probably for "Open Education"), see this example.
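One way an importer could choose the object of frapo:funds, sketched under assumptions: only the two "Type of Support" values named in this thread ("Project" and "General Support/Organization") are handled, the strings are assumed to match the grant pages exactly, and the third value is routed to manual review. The function name and the @id/@type reference shape are illustrative.

```python
def funds_target(type_of_support, organization, project=None):
    """Return a reference to the node that should be the object of frapo:funds.

    Only the two "Type of Support" values named in this thread are handled;
    anything else (e.g. the third value) raises ValueError for manual review.
    """
    if type_of_support == "Project":
        if project is None:
            raise ValueError("project grant without a project record")
        target = project
    elif type_of_support == "General Support/Organization":
        target = organization  # the organization itself is funded
    else:
        raise ValueError(f"unhandled type of support: {type_of_support!r}")
    # Reference the target by @id/@type only; its full record lives elsewhere.
    return {"@id": target["@id"], "@type": target["@type"]}


# Hypothetical usage for a general operating support grant.
consortium = {"@id": "urn:uuid:to-be-generated", "@type": "Organization"}
grant = {"frapo:funds": funds_target("General Support/Organization", consortium)}
print(grant)
```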

philboeselager commented 8 years ago

@acka47: How should the strategy be defined in terms of ontology? There is no field "strategy" or anything similar, as far as I've seen. (The same still holds for "type of support".)

acka47 commented 8 years ago

@acka47 : how should the strategy be defined in terms of ontology? There is no such field "strategy" or anything, as far as I've seen. (Same still for "type of support".)

I would not add this to our data anyway, as all grants we record involve OER. I thought we could use this info to pick out all relevant grants from the Hewlett database. Unfortunately, this field isn't queried by the standard search, and the database has neither a filter for "strategy" nor an advanced search function. Examples:

In other words, I don't really get how the grant search works but we are probably good to go when using the searches for "oer" and "open educational resources" as basis to extract the relevant grants.

acka47 commented 8 years ago

I now added an example based on http://www.hewlett.org/grants/open-education-consortium-for-general-operating-support/. See https://gist.github.com/acka47/a02159ef11442b79097adeffa4722702.

acka47 commented 8 years ago

I made some changes to the general structure.

  1. As both project and organization grants have a term, I added the duration to the grant, see https://gist.github.com/acka47/29545e24a84dcac0fefa89bb6d47ebdc#file-frapo-ex1-json-L20. I also left it on the project, as it is useful there too.
  2. As the grant number can no longer be found in the HTML or URL, I removed it. (diff)
acka47 commented 8 years ago

I just added names to the grants in the examples, see https://gist.github.com/acka47/29545e24a84dcac0fefa89bb6d47ebdc/revisions#diff-9f4b727373bed78569fbc9311bcdcdd3 and https://gist.github.com/acka47/a02159ef11442b79097adeffa4722702/revisions#diff-ea09e8f97b7f323b39b7eca1f394b91f.

philboeselager commented 7 years ago

In other words, I don't really get how the grant search works but we are probably good to go when using the searches for "oer" and "open educational resources" as basis to extract the relevant grants.

We can use the script with any search term we like, e.g.:

python import/hewlett/import.py 'http://www.hewlett.org/grants/?search=oer' 'import/hewlett/search_oer.json'
python import/hewlett/import.py 'http://www.hewlett.org/grants/?search=open+educational+resources' 'import/hewlett/search_open_educational_resources.json'

We can also import from the whole "education" programme, since grants will be filtered for "strategy == OER".

http://www.hewlett.org/grants/?search=&search_year=&search_program=31392

We might as well do all three (or more) imports, since an internal UUID cache ensures that duplicates across imports get the same ID and will therefore be unified when fed into the OER World Map.
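A minimal sketch of such a cache, assuming the key is some stable identifier from the source (for instance the Hewlett grant URL). The class name, the key choice and the JSON file persistence are assumptions, not the actual behaviour of import.py.

```python
import json
import uuid
from pathlib import Path


class UuidCache:
    """Hand out one urn:uuid per source key, across several import runs."""

    def __init__(self, path=None):
        self.path = Path(path) if path is not None else None
        self.ids = {}
        # Reload IDs assigned by earlier runs, if a cache file exists.
        if self.path is not None and self.path.exists():
            self.ids = json.loads(self.path.read_text(encoding="utf-8"))

    def get(self, key):
        """Return the ID for `key`, minting a random UUID on first sight."""
        if key not in self.ids:
            self.ids[key] = f"urn:uuid:{uuid.uuid4()}"
        return self.ids[key]

    def save(self):
        """Write the mapping back so the next import run reuses it."""
        if self.path is not None:
            self.path.write_text(json.dumps(self.ids, indent=2), encoding="utf-8")


# Hypothetical usage: two searches that both return the same grant page.
cache = UuidCache()
url = "http://www.hewlett.org/grants/north-rhine-westphalian-library-service-center-hbz-for-the-development-of-an-oer-world-map/"
first = cache.get(url)   # e.g. from the "oer" search
second = cache.get(url)  # e.g. from the "open educational resources" search
print(first == second)   # True
```

A stateless alternative would be `uuid.uuid5(uuid.NAMESPACE_URL, key)`, which derives the same UUID from the same key on every run; the cache variant instead keeps whatever IDs were already handed out, independent of how the key is formed.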

acka47 commented 7 years ago

We can also import from the whole "education" programme, since grants will be filtered for "strategy == OER".

Yes, that is the way to go. I didn't think about filtering the data after retrieval...
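Filtering after retrieval could look roughly like this. The `strategies` field is a hypothetical name standing for the "Strategies" value on each grant page, the records are made up, and since it is still open whether "OE" should count alongside "OER", it is only included when passed in explicitly.

```python
def is_relevant(grant, strategies=frozenset({"OER"})):
    """True if any of the grant's strategies is one we want to import.

    `grant["strategies"]` is a hypothetical field holding the values of the
    "Strategies" line on a Hewlett grant page, e.g. ["OER"].
    """
    return any(s.strip() in strategies for s in grant.get("strategies", []))


def filter_grants(grants, strategies=frozenset({"OER"})):
    """Keep only the relevant grants from a whole-program crawl."""
    return [g for g in grants if is_relevant(g, strategies)]


# Made-up records standing in for a crawl of the education program.
crawled = [
    {"title": "For the development of an OER World Map", "strategies": ["OER"]},
    {"title": "An open education grant", "strategies": ["OE"]},
    {"title": "A grant under some other strategy", "strategies": ["Other"]},
    {"title": "A grant page without strategy info"},
]
print([g["title"] for g in filter_grants(crawled)])
# ['For the development of an OER World Map']
print(len(filter_grants(crawled, {"OER", "OE"})))  # 2
```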

philboeselager commented 7 years ago

@literarymachine came up with the idea of making the grant the outer JSON entity and the Hewlett Foundation an inner entity of every grant. This is just a syntactic change; the semantics stay the same. (This was done for performance reasons, as we only need to mention the Hewlett UUID.) This is one reason why the output now looks different from the expected output described on the gist pages above.

The other difference is that we have so far decided to import only funded actions, not funded organizations, and therefore do not convert "general operating support" grants. As an example, see this output for The Rebus Foundation: "For Building A Platform To Help Faculty Publish Open Textbooks"

For now, there is one major improvement remaining:

Done just now:

acka47 commented 7 years ago

separate address region and postal code from general address lines. This might turn out to be more complicated than it looks at first sight, since there are many variations of address line combinations and many variants of address parts (postal codes of different countries, address regions of different countries and so on).

I suggest that we do this manually after import.

acka47 commented 7 years ago

After talking to @trugwaldsaenger and @literarymachine, we decided not to add grants as separate resources to the website but to embed them in the projects. This means using the project as the root object in the JSON. I will adjust the example accordingly.
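The restructuring can be sketched as regrouping grant-rooted records so that each project becomes the root object and carries its grants in a `funding` list. `funding` mirrors the earlier examples; `funds` (the grant-to-project link) and the `"Grant"` type are illustrative names, not taken from the gist.

```python
def embed_grants(grants):
    """Regroup grant-rooted records into project-rooted ones.

    Each input grant is assumed to carry its funded project under a
    hypothetical "funds" key; grants for the same project @id end up in
    that project's "funding" list. Inputs are not modified.
    """
    projects = {}
    for grant in grants:
        grant = dict(grant)          # shallow copy of the grant
        target = grant.pop("funds")  # the funded project (an Action)
        project = projects.setdefault(
            target["@id"],
            {**target, "funding": list(target.get("funding", []))},
        )
        project["funding"].append(grant)
    return list(projects.values())


# Hypothetical grant-rooted record for OER World Map phase II.
grants = [
    {"@type": "Grant", "amount": "$108,000", "awardDate": "2014-11-17",
     "funds": {"@id": "urn:uuid:43526b28-66f8-49bb-bf1c-c3cb8d4bbbcd",
               "@type": "Action"}},
]
projects = embed_grants(grants)
print(projects[0]["@type"], len(projects[0]["funding"]))  # Action 1
```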

acka47 commented 7 years ago

Here is the updated example:

https://gist.github.com/acka47/29545e24a84dcac0fefa89bb6d47ebdc

I also removed the Hewlett grantee ID, see this diff. I think we don't actually need it. Instead, I added a link to the grant entry in the Hewlett database.

As soon as #826 is merged the complete info should be valid JSON-LD.

philboeselager commented 7 years ago

I've pushed an updated version to https://github.com/hbz/oerworldmap/tree/256-importHewlettGrant. Data extraction works pretty well now; I think this is how we want it. @literarymachine, can you please have a look? Is anything missing? (Not in the contents of the Python file, but rather in the environment.) I'm going to open a pull request.

literarymachine commented 7 years ago

@acka47: on beta, see https://beta.oerworldmap.org/resource/ (unclick "Restrict results to map"). Please note that currently navigating to the entry for The Hewlett Foundation will crash the server due to some memory issues. Individual projects that have been imported work fine, though.

acka47 commented 7 years ago

Has something changed on beta? I currently see only 67 projects, see https://beta.oerworldmap.org/resource/?q=&filter.about.%40type=Action and couldn't find one imported with Hewlett data...

literarymachine commented 7 years ago

Yes, sorry. I deployed other stuff for review yesterday and had to reset the DB. Will ping you again.

literarymachine commented 7 years ago

@acka47 I re-crawled and re-ingested the data to beta, but as mentioned in https://github.com/hbz/oerworldmap/pull/931#issuecomment-268081987 some entries are currently missing. Nevertheless, you may want to take a first look to get an impression through the eyes of #955, which can (and should) be deployed before we ingest the Hewlett data anyway.

literarymachine commented 7 years ago

On https://beta.oerworldmap.org/ now. Ingested into an empty database, so only the Hewlett data is available there for a first review. Then we will have to discuss how to deploy this to production. Be warned that clicking on the Hewlett Foundation itself may, for some reason, still crash the application.

karindr commented 7 years ago

I had a look at a random sample of the data on beta; here are the results.

Things to consider that may make a new export from Hewlett necessary:
• There are (mostly?) links to the organisations' websites, and they should be imported too! E.g. for https://beta.oerworldmap.org/resource/urn:uuid:f8b72431-3def-41b9-baf5-7ddc1e5bafd4 on http://www.hewlett.org/grants/african-virtual-university-for-a-project-to-analyze-and-help-improve-the-capacity-of-seventeen-african-universities-to-develop-or-re-author-academic-content-for-use-in-the-avus-open-distance-and-e/
• For the hbz, the new project is missing. Is this the result of a fixed export date, or is this an error?

Things to consider at the moment with the already imported data:
• The amount of the funding is in the JSON but isn’t shown yet.
• On production there is a field “funded by”. Shouldn’t this be displayed on beta for the Hewlett projects as well?

Things to be done after an import to production:
• What should we do with entries that are already on the map? Mostly there is more information on the map, so we should add the new information there. Import and then remove duplicates manually after transferring the new aspects?
• A lot of information is missing on organisations and projects:
  o Pin on the map (manually adjust address?)
  o Description (the World Map project has one …)
  o Website
  o Logo (add manually)
  o Twitter, Facebook etc. (add manually)
• “Support of” in project names: delete manually?
• Is there a problem with projects that have the same name? E.g. “Development Of A Global Oer Map Prototype” with three different operators.
• Check which project/organization name is correct where names are similar, or how they are connected, e.g. OER Hub/OER Research Hub (manually).

Nice to have in the future:
• There seem to be a lot of projects that ended in recent years, so there should be a filter for start and end date or for “ongoing”, “past” and “future” (enhancement).

literarymachine commented 7 years ago

Sometimes a single beta system is not enough. Unfortunately, I had #994 checked out which is why you only saw half of the data.

There are (mostly?) links to the organisations' websites, and they should be imported too! E.g. for https://beta.oerworldmap.org/resource/urn:uuid:f8b72431-3def-41b9-baf5-7ddc1e5bafd4 on http://www.hewlett.org/grants/african-virtual-university-for-a-project-to-analyze-and-help-improve-the-capacity-of-seventeen-african-universities-to-develop-or-re-author-academic-content-for-use-in-the-avus-open-distance-and-e/

We have those links, but from the Project rather than the Organization. See https://beta.oerworldmap.org/resource/urn:uuid:b2d77243-2d44-4272-a1fc-69b6bf27c022

For the hbz there is the new project missing. Is this a result of a fixed export date or is this an error?

Unfortunately, the database does not make crawling very easy for us, see https://github.com/hbz/oerworldmap/pull/931#issuecomment-268513492. We will try to fix this by crawling regularly and thus catching all entries eventually.

Things to consider at the moment with the already imported data: • The amount of the funding is in the json but isn’t shown yet. • On production there is a field “funded by”. Shouldn’t this be displayed on beta for the Hewlett projects as well?

On beta now, see e.g. https://beta.oerworldmap.org/resource/urn:uuid:b2d77243-2d44-4272-a1fc-69b6bf27c022.

karindr commented 7 years ago

OK, that solves some of the points mentioned by me above.

But the link to the organisation's website is still missing. I am talking about the link to, for example, http://www.avu.org/ from here: http://www.hewlett.org/grants/african-virtual-university-for-a-project-to-analyze-and-help-improve-the-capacity-of-seventeen-african-universities-to-develop-or-re-author-academic-content-for-use-in-the-avus-open-distance-and-e/. This link is missing and it is important for us to have it directly on the map.

literarymachine commented 7 years ago

I am talking about the link to for example http://www.avu.org/ from here: http://www.hewlett.org/grants/african-virtual-university-for-a-project-to-analyze-and-help-improve-the-capacity-of-seventeen-african-universities-to-develop-or-re-author-academic-content-for-use-in-the-avus-open-distance-and-e/. This link is missing and it is important for us to have it directly on the map.

Ah, now I understand. @philboeselager, could you add this as the url of the Organization?