ipfs-inactive / archives

[ARCHIVED] Repo to coordinate archival efforts with IPFS
https://awesome.ipfs.io/datasets

Global Research Identification Catalogue #57

Open NDuma opened 8 years ago

NDuma commented 8 years ago

I feel like a past hobby has just led to more treasure. GRID.ac OrgRef.org on Twitter: "... adding Grid.ac identifiers ... in Feb ..." GRID.ac_DataDownloadPage

Where would I look up how to help distribute this data in a servable format on IPFS, so that it could be called upon as a live database that updates with each new release? Format

Including links to any other morphing datasets Included References

A timeline of "diffs" could be played: GifDifs :+1:

Each Key would be assigned an IPFS address; would the implementation be done through a .js maintainer to update new IPFS addresses and keep a web log?

https://github.com/ipfs/faq/issues/3

Wait, they might not need to change; just additions to them ... financials / funding and imports will change.

edit : add : http://creativecommons.org/licenses/by/4.0/

IanCal commented 8 years ago

(I work on GRID, not much experience with ipfs though, only played with it a bit)

This sounds really interesting.

The identifiers themselves should be permanent, although there are two main changes that can happen:

  1. The content is changed. It still refers to the same institute, but maybe the name has changed, more metadata added, something incorrect has been fixed, etc.
  2. The content has been removed. This generally happens when the identifier should never have been created in the first place, either because it's not really something that should be in the database (status changes from "active" to "obsolete") or because it's a duplicate of another record (status changes from "active" to "redirect" and has a pointer to the id that should be used instead).

The original identifier will always resolve though, and we try and keep changes in 2) to a minimum. So I guess it might make sense to have each id as a name that points to the current version of the record?
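As a rough sketch of how a client could handle those two cases when looking up an id: follow "redirect" pointers until an "active" record turns up, and treat "obsolete" as not found. The ids, the `RECORDS` mapping, and the field names `status` and `redirect` are assumptions made up for illustration, modelled on the description above rather than on the actual GRID schema.

```python
from typing import Optional

# Hypothetical records keyed by id. The field names ("status", "redirect")
# are assumptions for illustration, not the confirmed GRID schema.
RECORDS = {
    "grid.1": {"status": "active", "name": "Example University"},
    "grid.2": {"status": "redirect", "redirect": "grid.1"},
    "grid.3": {"status": "obsolete"},
}


def resolve(records: dict, grid_id: str, max_hops: int = 10) -> Optional[dict]:
    """Follow redirect pointers until an active record is found.

    Returns None for obsolete or unknown ids, and for redirect chains
    that loop or exceed max_hops.
    """
    seen = set()
    current = grid_id
    for _ in range(max_hops):
        if current in seen:
            return None  # redirect loop
        seen.add(current)
        record = records.get(current)
        if record is None:
            return None  # unknown id or missing redirect target
        status = record.get("status")
        if status == "active":
            return {"id": current, **record}
        if status == "redirect":
            current = record.get("redirect")
            continue
        return None  # obsolete or unrecognised status
    return None
```

With this, `resolve(RECORDS, "grid.2")` returns the `grid.1` record, so an old id keeps resolving even though its own content has gone.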

A simpler approach would be to take the JSON file in the database download (this is a full version of the whole database) and store that as a single thing.

Releases happen in blocks, named by the date they were released, if that helps (so you don't see one or two records changing a little throughout the day).
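Since releases come as dated blocks, the changes between two of them can be computed after the fact. A minimal sketch, assuming each release has already been loaded into a dict keyed by id (the function name `diff_releases` and the sample records are hypothetical):

```python
def diff_releases(old: dict, new: dict) -> dict:
    """Compare two releases, each a dict mapping id -> record.

    Returns sorted lists of ids that were added, removed, or whose
    record content differs between the two releases.
    """
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "changed": sorted(i for i in old_ids & new_ids if old[i] != new[i]),
    }


# Two made-up releases: one record is renamed and one is added.
release_a = {
    "grid.1": {"status": "active", "name": "Example University"},
    "grid.2": {"status": "active", "name": "Example Institute"},
}
release_b = {
    "grid.1": {"status": "active", "name": "Example University"},
    "grid.2": {"status": "active", "name": "Example Institute of Technology"},
    "grid.3": {"status": "active", "name": "Example Lab"},
}

changes = diff_releases(release_a, release_b)
```

If ids really are permanent, "removed" should stay empty from one release to the next, with retired records showing up under "changed" instead.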

Can't think of anything else relevant to mention :) happy to try and answer questions.

Out of curiosity, how did you come across grid? Glad people are finding it!

davidar commented 8 years ago

:+1:

> A simpler approach would be to take the JSON file in the database download (this is a full version of the whole database) and store that as a single thing.

Cool, we should be able to import this quite easily once IPLD lands.

CC: @jbenet @mildred

mildred commented 8 years ago

An issue I can think of is if the imported JSON is too large and not split into multiple documents with links between them. But it's always possible to split it up with links.

IanCal commented 8 years ago

I'm not sure what 'too large' would be in this case, and at what point it becomes easier to split or easier to keep as one thing. The full JSON db is ~45 MB when decompressed and has a bit under 60k entries. This can also pretty much only grow (the number of entries certainly will).

The full DB download, with both JSON and CSV, is only ~10 MB compressed though. I'm not sure whether it's better to store this or an uncompressed JSON dump. Thoughts?

Uncompressed and split up sounds more useful for linking and accessing individual items. A zipped block of stuff sounds more useful if you're just trying to get the data.
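To make the "uncompressed and split up" option concrete, here is a minimal sketch that writes one JSON file per record plus an index file. It assumes the dump is a JSON object with a top-level `institutes` list whose entries each carry an `id`; that layout, the function name `split_dump`, and the file naming are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def split_dump(dump: dict, out_dir: Path) -> dict:
    """Write one JSON file per record, plus index.json mapping id -> file name.

    Assumes dump["institutes"] is a list of records with an "id" field
    (an assumed layout, not the confirmed GRID schema).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    index = {}
    for record in dump["institutes"]:
        file_name = f"{record['id']}.json"
        # sort_keys keeps the bytes stable, so an unchanged record
        # produces an identical file from one release to the next.
        (out_dir / file_name).write_text(
            json.dumps(record, sort_keys=True), encoding="utf-8"
        )
        index[record["id"]] = file_name
    (out_dir / "index.json").write_text(
        json.dumps(index, sort_keys=True), encoding="utf-8"
    )
    return index


# Made-up two-record dump written to a temporary directory.
sample = {
    "institutes": [
        {"id": "grid.1", "status": "active", "name": "Example University"},
        {"id": "grid.2", "status": "redirect", "redirect": "grid.1"},
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    sample_index = split_dump(sample, Path(tmp) / "release")
```

The resulting directory could then be added recursively with `ipfs add -r`. Because identical files hash to the same address, records that don't change between releases would not need to be stored again.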

NDuma commented 8 years ago

@IanCal I'm not sure specifically where I came across GRID now; however, I've been noting research institutions and looking for aggregate data sets in a few genres including this one, while [albeit slowly] building my own. This speeds it up a bit.

IanCal commented 8 years ago

@NDuma Glad it's helpful :) feel free to drop me a line if you have questions/problems/comments or just want to share something built with it.