ipfs / notes

IPFS Collaborative Notebook for Research
MIT License

The Memex #149

Open davidar opened 7 years ago

davidar commented 7 years ago

The memex (a portmanteau of "memory" and "index") is the name of the hypothetical proto-hypertext system that Vannevar Bush described in his 1945 The Atlantic Monthly article "As We May Think". Bush envisioned the memex as a device in which individuals would compress and store all of their books, records, and communications, "mechanized so that it may be consulted with exceeding speed and flexibility." The memex would provide an "enlarged intimate supplement to one's memory". The concept of the memex influenced the development of early hypertext systems (eventually leading to the creation of the World Wide Web) and personal knowledge base software.

In "As We May Think", Bush describes a memex as an electromechanical device enabling individuals to develop and read a large self-contained research library, create and follow associative trails of links and personal annotations, and recall these trails at any time to share them with other researchers. This device would closely mimic the associative processes of the human mind, but it would be gifted with permanent recollection. As Bush writes, "Thus science may implement the ways in which man produces, stores, and consults the record of the race".

--- https://en.wikipedia.org/wiki/Memex

I've had a few discussions with people about using IPFS as a "personal library", but can't find much in the way of public discussion about it, hence this issue.

In particular, users would be provided with a ":pushpin: Pin It" button (a la pinterest) that would pin a document to a user's local ipfs node, along with providing some system for synchronising pins across all of a user's devices. There could also be shared collections that are synchronised between multiple users, etc.
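A minimal sketch of the synchronisation idea, with all names hypothetical: if each device keeps a grow-only set of pinned content hashes, merging is just set union, which converges regardless of the order syncs happen in (a G-Set CRDT).

```python
# Hypothetical sketch of pin-set synchronisation across devices.
# Pins are modelled as grow-only sets of CIDs, so merging is a
# simple union: merges commute and converge in any order.

def merge_pins(*devices):
    """Union the pin sets of all devices into one converged set."""
    merged = set()
    for pins in devices:
        merged |= pins
    return merged

laptop = {"QmFoo", "QmBar"}  # placeholder CIDs
phone = {"QmBar", "QmBaz"}

synced = merge_pins(laptop, phone)
# Every device converges to the same merged set regardless of
# which device syncs first.
```

Unpinning would need something richer (e.g. a two-set CRDT) since plain unions can never remove an element, but the grow-only case covers the ":pushpin: Pin It" flow described above.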

The :pushpin: button could be embedded directly into webpages, be added to the browser toolbar, added as an entry to the contextual menus of file browsers, etc. Ideally it would also show the (estimated) number of ipfs nodes seeding the content, much like GitHub stars (but better, because in addition to showing support for content, you're also helping to host and distribute it ;).

We could also integrate it with other systems, eg. @hypothesis for annotations (ipfs/archives#34), @mediachain for media resolution, etc.

CC: @jbenet @edsilv @aeschylus @JesseWeinstein @mekarpeles

mekarpeles commented 7 years ago

i.e. torrent-ifying arbitrary pieces of content? Where "arbitrary" might be informed by the Open Annotations specification? (Ignore the technical incorrectness of the claim and consider only the analogy in its accessibility to the layperson)

davidar commented 7 years ago

@mekarpeles Anything that can be added to IPFS, yes :)

edsilv commented 7 years ago

along with providing some system for synchronising pins across all of a user's devices

Would something like https://onename.com work for identity?

edsilv commented 7 years ago

It might be good to be able to create "associative trails" as per Bush's original vision too. Linking items together to form narratives. Bush's trails seem to be linear-only, but I've been thinking about ways to create non-linear narratives using IPLD:

https://github.com/idocframework/specs

mekarpeles commented 7 years ago

Bush's "associative trails" are one of my ultimate goals (in life) to support. Books are essentially associative trails... but they are too rigid in their creation and presentation to benefit from the rapid-paced, technology-augmented evolutionary / recapitulation processes that "Wikipedia"-style crowd-sourced platforms enable.

One area where this could be applicable is in my embarrassingly idea-stage concept for http://dissertate.org -- a wiki for quickly, collaboratively publishing directed sequences of themed academic papers towards a goal. Think Coursera's "tracks", but imagine that these tracks can branch arbitrarily. Thus (Randy Pausch style "head-fake") we're really talking about a step towards creating a dependency knowledge graph. The goal is saving researchers time by letting them search sequences of papers (the result of tens of hours of an expert reading papers, filtering the relevant and high-quality works, and pruning/publishing a "shortest path" to knowledge).

e.g. if I am reading one of Andrew Ng's papers on CNNs for voice recognition, that same paper may occur in a sequence on "how to tune a CNN".

arbital.com, metacademy, khan academy, and other services have similarly acknowledged this philosophy in their design and interface choices, but it's challenging to build momentum on such fronts, as (even if we can get past the technical requirements) it requires (the right) curators / scholars to participate + seed value and expertise. And presently there isn't a whole lot of incentive for this -- academics have the option to get paid and receive sole recognition for monopolizing the paper format and silo'ing + publishing their ideas and insights. Such a format also allows them to remain more focused and avoid the distracting element of others changing/interrupting their work. This suggests that a successful model might need to be more like git in its interface, to enable collaborators to maintain personal / separate branches / trains of thought... without preventing pull-only access / cross-pollination of ideas. Needless to say, in this theoretical landscape of adoption, we now have a tangled mess of tangential features complicating adoption.

My thought on such things is, it's less about the technology and mechanism (pinning, etc.) and more about the content: the ability to focus on a valuable domain and get buy-in from scholarly experts who can seed the right value. e.g. the phenomenon where ~50% of all edits on Wikipedia (circa 2006) were done by 524 people (~0.7% of users): http://www.aaronsw.com/weblog/whowriteswikipedia. Also reflected by the "90-9-1 Rule for Participation Inequality in Social Media and Online Communities" (inverted pyramid): https://www.nngroup.com/articles/participation-inequality.

@aeschylus and I are attempting a project (books.archivelab.org, for Greco-Roman classics) to start with (a) content + data, (b) a community of experts, and (c) minimal interfaces (with minimal technological investment) to enable contributions. The emphasis is on exploring the best ways to organize ~100 Greek classics (e.g. which version/translation of a book to use). Once we have enough overlapping sequences of books, we can enable some sort of "Choose Your Own Adventure" (for those who remember the Goosebumps series where you could switch between story lines/trees: https://images-na.ssl-images-amazon.com/images/I/61T0AHDQ33L._AC_UL320_SR218,320_.jpg).

The long term goal is to enable people to create directed sequences not only out of books, but out of parts of books (cc: @edsilv) using the Internet Archive's collection of millions of IIIF backed books.

I'd look forward to experimenting with this "pin" feature on such a collection, once the community, schema, and data are ready for it.

edsilv commented 7 years ago

@mekarpeles It sounds like we have similar ideas but are coming at the problem from different directions. I think if there were a simple, standardised way to create decentralised n-linked lists that refer to one or more IPFS objects with additional metadata, we could achieve these linear and non-linear "trails". I'd really appreciate it if you could have a read of my 'IDF' (bad name, I know...) proposal and let me know your thoughts. I wrote it after going to the i-docs conference in Bristol this year. The idea of interoperable narratives occurred to me while watching a panel discussion between two i-doc authoring tool vendors with different data models: "Where's the IIIF for interactive documentaries?" The documentary makers I spoke to seemed to like the idea, but if we can make it as generic as possible I'm sure it would have a wide variety of applications.

aeschylus commented 7 years ago

Some more context: https://books.google.com/books/about/From_Memex_to_Hypertext.html?id=oZNQAAAAMAAJ

http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/ (1945)

edsilv commented 7 years ago

This thread may also be of interest to @trentmc. We chatted briefly at the DWeb Summit about this 'IDF' idea and how the author properties could make use of the ongoing work of the COALA IP group: http://coala.global/working-groups/

davidar commented 7 years ago

I knew I'd forgotten something: https://read-write-web.org/

jbenet commented 7 years ago

As I read every comment here, my reactions are:

jbenet commented 7 years ago

I think we have all been greatly inspired and want to build the various pieces of the memex. I (naturally) think IPFS is perfectly suited for this, particularly for establishing the public record. But it's going to take a lot more too -- lots of UX tooling / applications on top. We can do it!

Btw, I'm reminded of https://WebRecorder.io

jarmitage commented 7 years ago

Randomly dropping in out of nowhere to say I love that this is getting discussed

ghost commented 7 years ago

This thread may also be of interest to @trentmc ... author properties could make use [of] COALA IP work: http://coala.global/working-groups/

Thanks for the ping, Ed -- definitely interested :)

Correct, you might be able to leverage the COALA IP protocol here to save much effort. About the protocol: it starts with LCC, a protocol designed to generalize rights expression & attribution across different media verticals such as DDEX (widely used in music) or PLUS (widely used in photos). The COALA IP protocol is an opinionated implementation of LCC: it uses JSON-LD & IPLD for the base data structure, schema.org for identifier naming, and the Interledger protocol to connect across networks (IPFS, IPDB, more). It's been an effort by the COALA IP working group, with much input from the broader community.

I've attached a couple slide decks that provide an intro. Short = 5 min intro, long = all the details. Please let me know if you have more Q's.

COALA IP - short.pdf COALA IP - long pdf.pdf

edsilv commented 7 years ago

@ascribe0

invent as little as possible, reuse well-considered building blocks

Would be a sensible mantra to adopt for a Memex project too, I think.

I've been looking at the Web Annotation Data Model.

It looks like we could potentially model bookmarks and trails with this. The creator properties could use COALA IP: Party I think?

"creator": "fs:/ipfs/QmParty"

(We'd need to adhere to using IRIs as per the spec).
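As a hedged illustration of that idea (the hashes and the exact property layout here are placeholders, not part of any published spec), a WADM-style bookmark annotation whose creator is a COALA IP Party addressed by an fs:/ipfs/ IRI might look like:

```python
import json

# Hypothetical sketch of a WADM-style bookmark annotation. The
# Qm... hashes are placeholders; "bookmarking" is a standard WADM
# motivation, and the @context is WADM's published JSON-LD context.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "bookmarking",
    "target": "fs:/ipfs/QmTargetDocument",  # the pinned document
    "creator": "fs:/ipfs/QmParty",          # a COALA IP Party object
}

print(json.dumps(annotation, indent=2))
```

Note that this deliberately omits the `id` property discussed below, since under content addressing the identifier is the hash of the object itself.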

Combining JSON-LD with IPFS has some snags:

https://docs.google.com/document/d/1LsQCB2aznlFduE549AfIYR6Tbm95t2VmrjEiFnOhqVM/edit?usp=sharing

I noticed in your example JSON objects you aren't using an id property.

Maybe we could also use a COALA IP: Right hash/IRI for these too:

https://www.w3.org/TR/2016/CR-annotation-model-20160705/#rights-information

aeschylus commented 7 years ago

I highly recommend reading Ed's docs on the OA specs. Pretty much every memex feature can be modelled in a reasonable (meaning non-semantic-web-fascist) way by OA (except for dereferencable IRI/URIs).

aeschylus commented 7 years ago

Now that JSIPFS and soon IPLD are minted, what is a reasonable first step in the right direction? Ultimately I could see this replacing my web browser. I would much prefer to live life without a web browser.

trentmc commented 7 years ago

This thread may also be of interest to @trentmc

[@ascribe0 wrote ..]

BTW I meant to respond as trentmc, not as ascribe0. Whups:)

Ed, re your Q's on Coala: @TimDaub can give you precise answers. Tim?

@sohkai @vrde @gmcmullen

TimDaub commented 7 years ago

It looks like we could potentially model bookmarks and trails with this. The creator properties could use COALA IP: Party I think?

I like that you've used the COALA IP Party model here. The problem is, we ourselves are currently looking for better ideas on how to represent them (decentralized?) on the web.

Combining JSON-LD with IPFS has some snags:

Yes. We've not been using IRIs in the COALA IP spec, but multiaddrs (maybe @jbenet wants to give feedback on that). I'm not sure if it even makes sense to try to stay compliant with existing semantic web/linked data standards. By making data content-addressable and by following the IPLD specs, both in regard to "merkle-links" and "merkle-paths", only links "back in time" can be established in an ontology. Another implication is that data is immutable. Semantic web/linked data and the whole ecosystem around it assume, though, that data is mutable, so large parts of ontologies and software would have to be rewritten/adjusted anyway.
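The "links back in time" property can be seen in a tiny content-addressing sketch (hypothetical helper names, plain SHA-256 standing in for the actual IPLD machinery): an object can only link to hashes that already exist, so merkle-links always point at earlier objects.

```python
import hashlib
import json

def put(store, obj):
    """Content-address an object: its id is the hash of its serialisation."""
    data = json.dumps(obj, sort_keys=True).encode()
    cid = hashlib.sha256(data).hexdigest()
    store[cid] = obj
    return cid

store = {}
a = put(store, {"title": "first"})
# b can link to a, because a's hash already exists...
b = put(store, {"title": "second", "prev": {"/": a}})
# ...but a cannot be edited to point forward at b: changing a's
# content changes a's hash, producing a new object entirely.
```

This is exactly why the data ends up immutable: "editing" an object always yields a new hash, and any pre-existing links keep pointing at the old version.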

Off-topic: did anyone ever experiment with signed IPLD objects? This would be something we'd need here for the implementation of the COALA IP spec.

I noticed in your example JSON objects you aren't using an id property.

Don't know what you mean. The id is there; you simply can't see it, because your head is not able to hash the payload :laughing:. A machine easily can, though :)

https://docs.google.com/document/d/1LsQCB2aznlFduE549AfIYR6Tbm95t2VmrjEiFnOhqVM/edit?usp=sharing

This won't work. It's close to impossible to find a hash of an object that contains that exact hash.

edsilv commented 7 years ago

I'm not sure if it even makes sense to try to stay compliant to existing semantic web/linked data standards.

Yep, this has been a struggle for me to get my head around too. In that Google doc above:

Are we hashing the entire web? Or can we tolerate “polluting” our IPFS objects with http IRIs?

I'm reminded of this quote:

The problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.

After a recent IIIF conference where mediachain was demoed, a colleague commented to me that IPFS is "forking the web". I think this is a source of confusion for many right now - myself included :smile:

The id is there, you can simply not see it because your head is not able to hash the payload :laughing:. A machine can easily though :)

Your COALA IP JSON examples in the PDF don't have an id property. It sounds like you're saying they're unnecessary because they're implicit? The id property is, however, required by the Web Annotation Data Model (I'm going to use WADM from now on :smile:). Although my sense is that we'd probably have to forgo spec compliance if we're going the IPFS route, because of the "jungle problem".

Would it make sense to try remodelling the WADM in IPLD?

davidar commented 7 years ago

Now that JSIPFS and soon IPLD are minted, what is a reasonable first step in the right direction?

@diasdavid Any thoughts?

Edit: also cc @mildred @nicola re IPLD

TimDaub commented 7 years ago

Your COALA IP json examples in the PDF don't have an id property. It sounds like you're saying they're unnecessary because they're implicit?

That's exactly my point. The id = IPLD.marshal(content). It's implicit and it's there. Additionally, you cannot include it, since finding the hash of an object that contains that hash is impossible.
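A small sketch of why the id must stay implicit (plain SHA-256 over JSON here, standing in for IPLD marshalling): embedding the hash changes the content, and therefore the hash, so no object can contain its own identifier.

```python
import hashlib
import json

def cid(obj):
    """Stand-in for hashing an IPLD-marshalled object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

obj = {"name": "example"}
h = cid(obj)

# Embedding the id changes the content, and so changes the hash:
with_id = dict(obj, id=h)
assert cid(with_id) != h
# Finding content whose hash appears inside itself would amount to
# inverting the hash function, so the id has to stay implicit.
```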

The id property is however required by the Web Annotation Data Model (I'm going to use WADM from now on). Although my sense is that we'd probably have to forgo spec compliance if we're going the IPFS route because of the "jungle problem".

I think you've hit the nail on the head here. IPFS is forking the web. If you're compliant with IPFS/IPLD/multiaddr/..., your chance of being non-compliant with existing web standards is pretty high, because the assumptions of the web are vastly different from those IPFS has.

Would it make sense to try remodelling the WADM in IPLD?

I never looked into WADM, but if it has the assumption that data can, for example:

  • be mutated; or
  • be linked bidirectionally,

then it's very likely that you'll have to remodel.

Maybe some of the leading IPFS visionaries can comment on my thoughts here. I think what I'm describing is definitely a problem that needs to be addressed structurally by IPFS, since I think it holds back implementations from third parties.

jbenet commented 7 years ago

a colleague commented to me that IPFS is "forking the web". I think this is a source of confusion for many right now - myself included

@edsilv It would be useful to discuss the problems highlighted. This statement is useless without content. We are explicitly NOT forking the web. The entire point is NOT to fork the web, but to rebase the existing web and enable everything (or as much as we can) to work as is. You can see current webapps working on IPFS without modification. The point is to keep ONE web.

If you're compliant with IPFS/IPLD/multiaddr/..., your chance of being non-compliant with existing web standards is pretty high, because the assumptions of the web are vastly different from those IPFS has.

@TimDaub What do you mean by this? Non-compliant with what standards? You can typically use whatever you have originally and just use IPFS for caching, redundancy, and distribution. But, if you want to take advantage of the nicer features (like dropping down to IPLD), then yeah you may have to adjust to that layer.

This means that a user can continue to use their webapp today unmodified, save for serving static content on IPFS. That's fine and works just great with stock web browsers; nothing modified. To use more advanced features, more advanced implementations may be needed (potentially modifications, yes). This is just like what happens on the web today with regular standards: they need to be implemented and deployed. Some of them you can use as polyfills, and for some of them you have to change things to take advantage of them. And we're doing that.

In general, it's pretty frustrating to hear "they're forking the web" when the entire point is NOT to, and we're working very hard to make everything work even in stock browsers of today (js-ipfs). Once we have all that working very well, we'll submit patches to browsers to add support natively. One Web.

I never looked into WADM, but if it has the assumption that data can for example:

  • be mutated; or
  • be linked bidirectionally,

then it's very likely that you'll have to remodel.

No, they don't. They can be imported as they are, using a data structure that models a version history on a graph. This means you can wrap your bidirectional graph into something like an op-log that "builds up" your graph. Same for mutations. You need a notion of a time axis to make the web archivable, and that's what hash-linking gives you.
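The op-log idea can be sketched as follows (a hypothetical encoding, invented for illustration): an append-only list of operations is replayed to materialise a mutable, bidirectional graph, while the log itself stays immutable and preserves the time axis.

```python
# Hypothetical sketch: immutable, append-only operations "build up"
# a mutable, bidirectional graph when replayed. The log entries can
# be hash-linked and archived, while the derived graph supports
# mutation and two-way traversal.

def replay(ops):
    graph = {}  # node -> set of neighbours (bidirectional edges)
    for op in ops:
        if op[0] == "add_node":
            graph.setdefault(op[1], set())
        elif op[0] == "add_edge":
            a, b = op[1], op[2]
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
        elif op[0] == "del_edge":
            a, b = op[1], op[2]
            graph[a].discard(b)
            graph[b].discard(a)
    return graph

log = [("add_node", "A"), ("add_edge", "A", "B"), ("del_edge", "A", "B")]
g = replay(log)
# "A" and "B" exist but are no longer linked; the mutation's history
# survives in the log, which is the archivable time axis.
```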

Think about what Git did for file systems. You can still model a filesystem exactly the same, and even use it. There are "mounted" versions of Git that commit on every write and implement the fs syscalls, giving you a perfectly compliant filesystem.

As has been discussed in these repos many times, RDF and other bidirectional graph structures can be trivially modeled on top of IPFS without modification. It needs a "middleware" data structure, or maybe just transformations, between the raw IPLD and the raw RDF. cc @nicola, who is working on something like this.

Think about how you would represent RDF graphs in git. You wouldn't necessarily use git links to replace the RDF links, and neither should you when adding raw, existing RDF to ipfs.

jbenet commented 7 years ago

@TimDaub

We've not been using IRIs in the COALA IP spec, but multiaddr (maybe @jbenet wants to give feedback on that).

Without having looked at the specifics, that sounds good to me. Btw, you can always "IRIfy" a multiaddr by adding fs: in front. The goal of the fs: scheme is to properly IRIfy "fs IRIs". (The scheme identifier broke compatibility with filesystems, and we want that back; hence not using ipfs:/<hash>, but instead /ipfs/<hash>, and defining fs:/ipfs/<hash> for things that need IRIs.)
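A trivial sketch of that convention (the helper is hypothetical, but the fs:/ipfs/<hash> form is the one described above): the fs path stays filesystem-compatible, and prefixing the fs: scheme yields something usable wherever an IRI is required.

```python
# Hypothetical helper: turn a filesystem-compatible content path
# (/ipfs/<hash> or /ipns/<name>) into an IRI by prefixing fs:.

def to_fs_iri(path):
    if not (path.startswith("/ipfs/") or path.startswith("/ipns/")):
        raise ValueError("expected an /ipfs/ or /ipns/ path")
    return "fs:" + path

print(to_fs_iri("/ipfs/QmHash"))  # fs:/ipfs/QmHash
```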

jbenet commented 7 years ago

@aeschylus @davidar

Now that JSIPFS and soon IPLD are minted, what is a reasonable first step in the right direction?

First step to what -- the Memex? Or the Memex using OA?

jbenet commented 7 years ago

@nicola this thread, and the conversations we had at DWS and MIT make me think we need to accelerate work on "model your stock RDF on IPLD". I saw you're already on it (starting threads). Need any help?

jbenet commented 7 years ago

One general note for everyone: IPLD and IPFS both give you the ability to implement whatever you want on top of them, including exactly the same files and data streams you have today. One example is unixfs which IPFS uses to represent traditional POSIX files. It just depends how deep you want to go with the hash-linking, whether you want to use IPFS as just a transport of bytes (totally fine) or re-model your data with distributed authenticated data structures in mind (harder but way more powerful for you).

Getting more powers does not remove your old powers; just drop back to the familiar POSIX file world if it's annoying to change anything. Many applications don't need more than authenticated file distribution.

It's like HTTP, actually. HTTP allowed you to move the same files FTP did (TeX, ps, whatever). But it ALSO had native support for more interesting, linked files: HTML documents. That is strictly an upgrade. IPFS does the same: you can use it as a dumb file store, only using it to distribute and authenticate regular POSIX files, or you can remodel with IPLD and leverage more power. It's up to you.

jbenet commented 7 years ago

Oh, and on mutability: you can add mutable links in IPLD if you want. Your applications can parse them and use them; they just won't resolve natively in IPFS just yet.
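One hypothetical reading of "mutable links your application parses itself" (this naming convention is invented here for illustration; IPFS would not resolve it natively): the immutable object stores a stable name, and the application resolves that name through its own mutable lookup table.

```python
# Hypothetical application-level mutable link inside an otherwise
# immutable object: the object carries a stable name, and the app
# maintains its own mutable name -> hash table to resolve it.

name_table = {"latest-draft": "QmHashV2"}  # app-maintained, mutable

obj = {"title": "My trail", "next": {"mutable": "latest-draft"}}

def resolve(link):
    """Resolve an app-level mutable link via the mutable table."""
    return name_table[link["mutable"]]

print(resolve(obj["next"]))  # QmHashV2

# Updating the table repoints every object that uses the name,
# without touching (or re-hashing) the objects themselves.
name_table["latest-draft"] = "QmHashV3"
```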

davidar commented 7 years ago

It seems there are two main issues here: the data storage aspect ("pin it", synchronisation, sharing, etc.) and the annotation/metadata/interlinking aspect. Is that right?

If so, it might be helpful to split them into two (or more) separate issues, preferably with concrete goals.

edsilv commented 7 years ago

@davidar - that's my interpretation. I 100% understand the data storage benefits.

we need to accelerate work on "model your stock RDF on IPLD"

This sounds promising -- something like a beginner's guide would certainly help me :-) It might also give me the means to mount a reasonable defense of the interlinking aspects of IPFS as an upgrade, as opposed to a fork, of the web (as some may perceive it).

nicola commented 7 years ago

Dropping a few links here:

There is a lot happening in the IPLD space! (Here is a complete summary: https://github.com/ipld/specs/issues/13.) The parts relevant to this conversation are the following:

There is an effort to bring simple RDF graphs to IPLD: https://github.com/ipfs/notes/issues/152 (which really means authenticating RDF graphs).

If you really want to shape IPLD, I really want you in the conversation: I've set up a call for next week (more info here: https://github.com/ipfs/pm/issues/124) and a rewrite of the IPLD spec (more info here: https://github.com/ipld/specs/pull/12).

edsilv commented 7 years ago

@nicola Hi, just looking for a link for the IPLD call today. Reading your PM docs:

If you are interested in watching, but do not plan on participating, please use the stream link provided by the discussion lead and watch on YouTube. Some participants may be on low bandwidth connections, and quality is generally better with less participants.

edsilv commented 7 years ago

Here's a proposal for a method to create linear and nonlinear narrative trails using the Web Annotation Data Model:

https://github.com/edsilv/trails