Open peterwebster opened 10 years ago
As @anjackson points out, we may need an upper limit on the number of resources per corpus.
I need more details about what a "corpus" is and how these documents should be saved. Thanks.
A corpus is any set of resources that the user regards as a meaningful unit of analysis, which they want to return to over time to continue querying, and to which they can add or remove resources as their thinking develops. Does that help? [Another ticket coming shortly about adding and removing resources]
Created the Corpus and Resource/Document model classes and tables (a Corpus has many Resources), with each resource storing the document's "id_long" value as its reference.
Can you please use "id" instead of "id_long"? The "id_long" field may be dropped in the future.
@anjackson shall I use this "id" for the "exclusion" functionality in the search too?
Yes please. "id_long" is an artefact of an old design and will be removed from future releases of the indexer, so it should not be used anywhere in Shine.
What details do you want to save besides the "resource" id? Title, URL, etc?
Current data for resources saved to a corpus.
Hi @kinmanli: I think those fields for each resource are good for now. Users may want more over time, so leave that option open if possible.
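To make the model discussed above concrete, here is a minimal sketch in Python (not the actual Shine classes). It assumes each resource is keyed by the indexer's "id" and carries a title and URL; the `extra` dictionary, the `owner` field, and the optional `max_resources` cap are hypothetical names, added only to show how more fields and an upper limit could be left open.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Resource:
    """One saved document, referenced by the indexer's "id" (not "id_long")."""
    id: str
    title: str = ""   # assumed field
    url: str = ""     # assumed field
    # Hypothetical open-ended slot so further per-resource fields can be added later.
    extra: Dict[str, str] = field(default_factory=dict)


@dataclass
class Corpus:
    """A named set of resources belonging to one user (a corpus has many resources)."""
    name: str
    owner: str
    max_resources: Optional[int] = None  # hypothetical cap; None means unlimited
    resources: Dict[str, Resource] = field(default_factory=dict)

    def add(self, resource: Resource) -> bool:
        """Add a resource; return False if its id is already in the corpus."""
        if resource.id in self.resources:
            return False
        if self.max_resources is not None and len(self.resources) >= self.max_resources:
            raise ValueError("corpus has reached its resource limit")
        self.resources[resource.id] = resource
        return True

    def remove(self, resource_id: str) -> bool:
        """Remove a resource by id; return False if it was not present."""
        return self.resources.pop(resource_id, None) is not None
```

Keying the resources by "id" makes duplicate adds harmless and keeps removal a single lookup.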
@kinmanli could you remind me how the GUI currently allows users to create a corpus? That is, to get from:
to something that shows up at: http://www.webarchive.org.uk/shine/search/mycorpora
Or, is this the workflow that needs defining still?
@peterwebster you need to select a few checkboxes and choose 'add to list'. This was just an idea I came up with, but I need a concrete workflow to work from.
@kinmanli so, this is how I see the main workflow.
At this point, they need a means of saving all of the results remaining in their set as their corpus.
@anjackson there's a design decision in the GUI here. I would favour making a visual association between 'Save this Search' and 'Save these results as a corpus', and dissociating both from Exclude Resource and Exclude Host, which are preparatory to them. Not sure yet where the first two should go, but not alongside individual results.
I think the present 'Add to List' option is redundant as it stands: users are unlikely to select individual resources from the list to make a corpus.
However, we might want to replace 'Add to List' with 'Add to Corpus': the option to add resources from the results list to existing corpora.
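As a rough illustration of "saving all of the results remaining in their set", the sketch below filters a result list by the excluded resources and hosts before anything is saved. It is written in Python purely for illustration; the function name and the shape of each result (a dict with "id" and "url" keys) are assumptions, not Shine's actual code.

```python
from typing import Dict, Iterable, List, Set
from urllib.parse import urlparse


def remaining_results(
    results: Iterable[Dict[str, str]],
    excluded_ids: Set[str],
    excluded_hosts: Set[str],
) -> List[Dict[str, str]]:
    """Return the results left after applying Exclude Resource and Exclude Host."""
    kept = []
    for result in results:
        # Exclude Resource: drop individual documents by their "id".
        if result["id"] in excluded_ids:
            continue
        # Exclude Host: drop every document served from an excluded host.
        host = urlparse(result["url"]).hostname or ""
        if host in excluded_hosts:
            continue
        kept.append(result)
    return kept
```

Whatever this returns is what a 'Save these results as a corpus' action would store, which is one reason to keep the two exclude actions visually separate from the save actions.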
Users need to 'save' a set of documents (generated by a query), attached to a user account, for later reuse as a 'corpus'.
In the short term, this can be achieved with the save-query function (#12), as the index won't change.
However, the user here is interested in the list of resources, not the query itself. So eventually this implies adding a corpus facet to Solr, and labelling each new corpus with a value such as 'corpus=PetersCorpusversion1'.
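A minimal sketch of what the corpus facet could look like on the Solr side, in Python for illustration only. It assumes a multi-valued field named "corpus" exists in the schema and that the index supports atomic updates; the helper names are hypothetical. The code only builds the request payloads and does not contact a Solr server.

```python
import json
from typing import Dict, Iterable


def corpus_label_update(doc_ids: Iterable[str], label: str, field: str = "corpus") -> str:
    """Build a Solr atomic-update JSON body that adds a corpus label to each document."""
    # The "add" modifier appends a value to a multi-valued field.
    return json.dumps([{"id": doc_id, field: {"add": label}} for doc_id in doc_ids])


def corpus_filter_params(label: str, query: str = "*:*", field: str = "corpus") -> Dict[str, str]:
    """Build query parameters that restrict a search to one labelled corpus."""
    # Escape backslashes and quotes so the label is safe inside a quoted phrase.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return {"q": query, "fq": f'{field}:"{escaped}"'}
```

For example, `corpus_filter_params("PetersCorpusversion1")` yields a filter query of `corpus:"PetersCorpusversion1"`, so the user can keep querying within the saved set rather than re-running the original search.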