influencemapping / whos-got-dirt

Working notes for the Influence Mapper data search and enrichment API
MIT License

Explore options for decentralised architecture #4

Open jmatsushita opened 9 years ago

jmatsushita commented 9 years ago

@smari @blaine @pudo @jpmckinney

There is an opportunity to wrap this API in a DHT-style architecture in order to offer additional properties. I think (though I don't think others agree) that this could be done without losing any features.

Some of the problems that this would solve are:

Some of the things this might offer for free are:

It should be possible to:

Please let me know if this would in fact remove functionality from Who's Got Dirt.

Would you support (and switch to) such an implementation if it were developed, or are there reasons not to adopt it?

jpmckinney commented 9 years ago

I prefer a solution that wraps the API (such a solution would be reusable for other APIs as well), but if the API needs to change in order for this to work, I'm open to that as well.

pudo commented 9 years ago

I think this issue raises the need to clarify two sets of user needs regarding WGD:

1) The need for service discovery. WGD was imagined as a set of relatively curated services, more along the lines of an ISP peering mechanism than a fully distributed search engine. In this scenario, people who offer search interfaces (i.e. WGD clients) would select the sources which they want to include in their searches. Given the trade-off in complexity between running the whole system over HTTP vs. a non-standard DHT-based protocol, I believe a JSON file in this repo would probably do just fine as an index.

2) The need for anonymization and data protection. I can't claim that I properly understand all the possible cases in this scenario, but it seems fundamentally hard to me to design an information-sharing system which will not share information. It is certainly not a concern for any of the open data services involved in the current discussion around WGD. In any case, some compromise will need to be found between the needs to protect a) the source documents/databases, b) the queries, and c) other transactional information. In this scenario, sharing the queries seems much more reasonable than sharing the index of source material, if only for the difference in size.
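To make point 1 concrete, here is a minimal sketch of what a JSON index plus client-side source selection might look like. Everything in it is an assumption for illustration: the field names (`id`, `endpoint`), the source ids, and the example.org URLs are hypothetical and not part of WGD.

```python
import json

# Hypothetical contents of an index file kept in the repo.
# Field names, ids and URLs are illustrative assumptions only.
INDEX_JSON = """
[
  {"id": "source-a", "endpoint": "https://example.org/a/search"},
  {"id": "source-b", "endpoint": "https://example.org/b/search"},
  {"id": "source-c", "endpoint": "https://example.org/c/search"}
]
"""


def load_index(text):
    """Parse the index and return a dict keyed by source id."""
    return {entry["id"]: entry for entry in json.loads(text)}


def select_sources(index, wanted_ids):
    """Return the endpoints a client chose to include, skipping unknown ids."""
    return [index[i]["endpoint"] for i in wanted_ids if i in index]


index = load_index(INDEX_JSON)
# A client picks the sources it wants to search; unknown ids are ignored.
endpoints = select_sources(index, ["source-a", "source-c", "source-missing"])
print(endpoints)
```

In this sketch, discovery is just a static file read, so the whole system stays on plain HTTP and each client keeps control over which sources it queries.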

Finally, if there is indeed a use case for a fully decentralised search engine in this space, wouldn't it be better to build on top of http://yacy.net/ rather than Tox?

miguelpaz commented 9 years ago

Agree on use case research and definition. I know some spoke about the importance of anonymity; however, if I understand their point of view correctly, I believe it is overrated for this project. When you use TripAdvisor or Kayak, you don't care about anonymity any more than you do when using Google advanced search. And if you do, why not use Tor? What I did not understand well is the need for a DHT-based protocol. Also, I am not sure that a journalist will be OK with someone poking into their laptop to check on files in a peer-to-peer model.
