guardian / proposer-suggested-content

Hugo's thing

Realtime Suggestr-ing to promote in-situ linking #23

Open · alastairjardine opened this issue 8 years ago

alastairjardine commented 8 years ago

Imagine what would happen if we showed matches in realtime, against the words as they're typed, rather than in the post-write scenario we've worked with.

The disconnected experience is okay, but I do think it's to the detriment of the overall linking experience and the chances of linking happening. It also forces people to engage with the feature as a separate step.

hmgibson23 commented 8 years ago

The biggest problem I can see with doing it like this is performance. We've already seen very slow performance in the client with our current approach, and it will be even slower if we do it on every change. The new version sends all the text content to the service and the service does the magic ... so we could send the current text content every minute or so and display results that way. But I still think it will be better to mark it up all at once rather than in intervals.
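For illustration, the "every minute or so" idea could be a small gate that decides when the copy is worth resending. A minimal sketch, assuming nothing about the real client code (`SendState` and `shouldSend` are made-up names):

```typescript
// Hypothetical sketch of interval-based sending: only hit the service when
// the copy has actually changed and the interval has elapsed, rather than
// on every keystroke. Names here are illustrative, not the real client API.

interface SendState {
  lastText: string;   // the copy we last sent to the service
  lastSentAt: number; // ms timestamp of the last send
}

function shouldSend(
  state: SendState,
  text: string,
  now: number,
  intervalMs = 60_000
): boolean {
  // Skip sends when nothing changed, and rate-limit to one per interval.
  return text !== state.lastText && now - state.lastSentAt >= intervalMs;
}
```

The caller would update `lastText`/`lastSentAt` after a successful send; the point is just that the per-keystroke cost disappears and the service sees at most one request per interval.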

theefer commented 8 years ago

Sounds like something we could talk about with Tom from the BBC on Wednesday?

Does the magical service require the full copy to provide good results? How fast and cheap is it to run a query, i.e. how many could realistically be run in parallel, and how often?

I imagine services based on entity extraction, such as the BBC Juicer, would typically be cheaper and still work well with document fragments (e.g. send the current paragraph whenever a word has been typed or a special character encountered, as with @ on Facebook or Twitter), but they'd likely not match concepts (e.g. mobile phones, snooping charter) as well as your approach does.

Something worth pondering, depending on what the overall aim and scope are.

On Tue, 3 Nov 2015, 09:39 Hugo Gibson notifications@github.com wrote:

> The biggest problem I can see with doing it like this is performance. …

hmgibson23 commented 8 years ago

Definitely something to discuss with him. The service itself is pretty fast at doing all the parsing. The slowness is caused by the browser fetching the bigrams and doing tf-idf analysis (although this will be removed once this gets merged). It's very fast to run a query, and having parallel services trained on different datasets is something we've talked about ... this is similar to the NYT approach, i.e. one service for bigrams, one for unigrams, one for sentences etc.
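As a rough illustration of the client-side cost being described, tf-idf over bigrams looks something like this (an assumed sketch, not the actual implementation; the corpus-wide document-frequency scan is the part that gets expensive in the browser):

```typescript
// Hypothetical sketch of tf-idf scoring over bigrams, the kind of work
// described as slow when done client-side. Not the real implementation.

function bigrams(words: string[]): string[] {
  const out: string[] = [];
  for (let i = 0; i < words.length - 1; i++) {
    out.push(words[i] + " " + words[i + 1]);
  }
  return out;
}

function tfidf(term: string, doc: string[], corpus: string[][]): number {
  // Term frequency: how often the bigram occurs in this document.
  const tf = doc.filter(t => t === term).length / doc.length;
  // Inverse document frequency: rarer bigrams score higher.
  // This scans the whole corpus per term, which is the costly part.
  const withTerm = corpus.filter(d => d.includes(term)).length;
  const idf = Math.log(corpus.length / (1 + withTerm)); // smoothed idf
  return tf * idf;
}
```

Moving this scoring (and the bigram fetching that feeds it) to the service side is exactly what the pending merge referred to above would achieve.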

I think when we've collected some user feedback we'll be able to get an idea of which services might be the most useful as well.