Corpus creation (from a query)

peterwebster commented 10 years ago

Users need to 'save' a set of documents (generated by a query), attached to a user account, for later reuse as a 'corpus'.

In the short term, this can be achieved with the save-query function (#12), as the index won't change.

However, the user here is interested in the list of resources, not the query itself. So: eventually this implies adding a corpus facet to Solr, and labelling each new corpus as 'corpus=PetersCorpusversion1'
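A minimal sketch of how a corpus facet could be queried, assuming a Solr field named `corpus` holding the label described above. The function name `corpus_query_params` is hypothetical; `q`, `fq`, `facet` and `facet.field` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def corpus_query_params(query, corpus_label):
    """Build a Solr query string restricted to one saved corpus.

    Assumes a hypothetical `corpus` field in the index; nothing in this
    thread confirms that the field exists yet.
    """
    # Escape backslashes and quotes so the label stays one quoted phrase.
    escaped = corpus_label.replace("\\", "\\\\").replace('"', '\\"')
    return urlencode([
        ("q", query),
        ("fq", f'corpus:"{escaped}"'),  # filter to the saved corpus
        ("facet", "true"),
        ("facet.field", "corpus"),      # expose corpus labels as a facet
    ])


print(corpus_query_params('"goji berry"', "PetersCorpusversion1"))
```

Using a filter query keeps the corpus restriction separate from the user's own search terms, so a user can keep querying within a corpus.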

peterwebster commented 10 years ago

As @anjackson points out, we may need an upper limit on the number of resources per corpus.

kinmanli commented 10 years ago

I need more details about a "corpus" and how to save these documents. Thanks

peterwebster commented 10 years ago

A corpus is just any set of resources that the user regards as a meaningful unit of analysis, which they want to be able to return to over time to continue to query, and to add/subtract resources as their thinking develops. Does that help? [Another ticket coming shortly about adding and removing resources]

kinmanli commented 10 years ago

Created Corpus and Resource/Document models/classes and tables (a Corpus has many Resources), with each Resource storing the document's "id_long" value as the reference.

anjackson commented 10 years ago

Can you please use "id" instead of "id_long", as "id_long" may be dropped in the future?

kinmanli commented 10 years ago

@anjackson Shall I use this "id" for the "exclusion" functionality in the search too?

anjackson commented 10 years ago

Yes please. "id_long" is an artefact of an old design and will be removed from future releases of the indexer, so should not be used anywhere in Shine.
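As a rough sketch of the model discussed above (a corpus that has many resources, each referenced by the index "id" rather than "id_long"), something like the following could work. The class and field names and the cap value are assumptions for illustration, not Shine's actual classes; the thread only says an upper limit may be needed.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical cap; no specific number is given in this thread.
MAX_RESOURCES_PER_CORPUS = 10_000


@dataclass
class Corpus:
    name: str
    owner: str  # the user account the corpus is attached to
    resource_ids: List[str] = field(default_factory=list)

    def add(self, resource_id: str) -> bool:
        """Add a resource by its index "id"; return False if already present."""
        if resource_id in self.resource_ids:
            return False
        if len(self.resource_ids) >= MAX_RESOURCES_PER_CORPUS:
            raise ValueError("corpus has reached its resource limit")
        self.resource_ids.append(resource_id)
        return True

    def remove(self, resource_id: str) -> bool:
        """Remove a resource; return False if it was not in the corpus."""
        if resource_id not in self.resource_ids:
            return False
        self.resource_ids.remove(resource_id)
        return True
```

Storing only the "id" keeps the corpus independent of fields that may be dropped from the indexer later; other details such as title or URL could be added as extra fields.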

kinmanli commented 10 years ago

What details do you want to save besides the "resource" id? Title, URL, etc?

kinmanli commented 9 years ago

Current data for resources saved to a corpus.

[Screenshot: 2015-03-13 12-34-07_corpus creation from a query issue 13 ukwa_shine]

peterwebster commented 9 years ago

Hi @kinmanli: I think those fields for each resource are good for now. Users may over time want more, so leave that option open if possible.

peterwebster commented 9 years ago

@kinmanli could you remind me how the GUI currently allows users to create a corpus? That is, to get from:

http://www.webarchive.org.uk/shine/search?page=1&invert=&facet.fields=public_suffix&invert=&invert=&invert=&invert=&invert=&invert=&action=search&query=%22goji+berry%22&tab=results&sort=content_type_norm&order=asc&excludeHost=www.clickok.co.uk&excludeHost=ukcommerceonline.co.uk&excludeHost=www.quirki.co.uk

to something that shows up at: http://www.webarchive.org.uk/shine/search/mycorpora

Or, is this the workflow that needs defining still?

kinmanli commented 9 years ago

@peterwebster you need to select a few checkboxes and choose 'add to list'. This was just an idea I came up with, but I need a concrete workflow to work from.

peterwebster commented 9 years ago

@kinmanli so, this is how I see the main workflow.

1. User fires a query.
2. In several iterations, they change the search criteria, add and remove facet filters, and run it again.
3. Once they've done this, they exclude resources or hosts.
4. They reach a decision point: "this list of search results now represents what I was looking for - I've refined my query, and excluded the noise - this is now my corpus."

At this point, they need a means of saving all of the results remaining in their set as their corpus.
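The last step of that workflow, collecting whatever remains after exclusions, can be sketched as below. This assumes each search result is a dict with "id" and "host" keys; the "host" key and the function name `remaining_resource_ids` are assumptions for illustration.

```python
def remaining_resource_ids(results, excluded_hosts=(), excluded_ids=()):
    """Return the ids of results left after host and resource exclusions.

    `results` is assumed to be an iterable of dicts with "id" and "host"
    keys; result order is preserved.
    """
    hosts = set(excluded_hosts)
    ids = set(excluded_ids)
    kept = []
    for doc in results:
        # Skip anything the user excluded by host or as a single resource.
        if doc["host"] in hosts or doc["id"] in ids:
            continue
        kept.append(doc["id"])
    return kept


results = [
    {"id": "r1", "host": "www.clickok.co.uk"},
    {"id": "r2", "host": "example.org.uk"},
    {"id": "r3", "host": "example.org.uk"},
]
print(remaining_resource_ids(results, ["www.clickok.co.uk"], ["r3"]))
```

The resulting list of ids is what 'Save these results as a corpus' would need to store.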

@anjackson there's a design decision in the GUI here. I would favour making a visual association between 'Save this Search' and 'Save these results as a corpus', and dissociating both from Exclude Resource and Exclude Host, which are preparatory to them. Not sure yet where the first two should go, but not alongside individual results.

I think that the present 'Add to List' option is redundant as it is: users are, I think, unlikely to select individual resources from the list to make a corpus.

However, we might want to replace 'Add to List' with 'Add to Corpus': the option to add resources from the results list to existing corpora.

ukwa / shine

Corpus creation (from a query) #13