thomasegense opened this issue 6 years ago
The index already stores the full date in the crawl_date field. Would it be easier to use that?
I know, that is what I use now :) But since the field is already in the schema, it would be nice to have it indexed/stored. Using it would be much simpler and would avoid conversions in many places in the SolrWayback code (both JavaScript and Java).
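To illustrate the kind of conversion an indexed wayback_date would make unnecessary, here is a minimal Java sketch. It assumes crawl_date comes back from Solr as an ISO-8601 UTC instant (e.g. 2011-08-23T16:33:24Z) and that wayback_date is the 14-digit yyyyMMddHHmmss timestamp seen in playback URLs. The class and method names are hypothetical, not SolrWayback's actual code.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical helper: turns a crawl_date value (ISO-8601, UTC) into a
 * 14-digit Wayback-style timestamp (yyyyMMddHHmmss).
 */
final class WaybackDateConverter {

    // Wayback timestamps are expressed in UTC with no separators.
    private static final DateTimeFormatter WAYBACK_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    static String toWaybackDate(String crawlDate) {
        // Instant.parse accepts ISO-8601 instants, with or without fractional seconds.
        Instant instant = Instant.parse(crawlDate);
        // Formatting an Instant uses the override zone (UTC); sub-second precision is dropped.
        return WAYBACK_FORMAT.format(instant);
    }

    public static void main(String[] args) {
        System.out.println(toWaybackDate("2011-08-23T16:33:24Z")); // prints 20110823163324
    }
}
```

Every place that needs the playback timestamp has to repeat a step like this (in both Java and JavaScript) as long as only crawl_date is available.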
Okay, presumably this won't bloat the index much? If so, we may as well. Individuals can always disable it locally if they wish.
It should have minimal impact, since there are only two values :) (well, so far...). I will implement it, but not within the first few weeks :)
It would make me happy if you could release the 3.0 branch soon, so we have a new milestone and reference point for people using the warc-indexer. Many people were interested in using the 3.0 version (Build Library Labs), and I think it is ready to drop the SNAPSHOT label :)
I'll work on rolling a 3.0.0 release. See https://github.com/ukwa/webarchive-discovery/milestone/6
Release 3.0.0 is live, although I'm planning a minor update to bring some older dependencies up to date.
wayback_date field definition in schema.xml
Setting indexed=true would make playback easier in SolrWayback, because wayback_date is used by the playback API that all playback engines rely on.
....../web/20110823163324/http://www.test.uk/index.hmlt
At the same time, remove the comment above the field...
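For reference, a minimal Java sketch of assembling the playback URL pattern above (/web/&lt;timestamp&gt;/&lt;original url&gt;) directly from stored field values. The base URL is elided in the example, so the one used here is a placeholder. The class and method names are hypothetical, and the sketch assumes wayback_date is read as a 14-digit string (if the field is stored as a number, String.valueOf would convert it first).

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: builds a Wayback-style playback URL of the form
 * <base>/web/<wayback_date>/<original url> from stored field values.
 */
final class PlaybackUrlBuilder {

    // A Wayback timestamp is exactly 14 digits: yyyyMMddHHmmss.
    private static final Pattern WAYBACK_DATE = Pattern.compile("\\d{14}");

    static String playbackUrl(String baseUrl, String waybackDate, String originalUrl) {
        if (!WAYBACK_DATE.matcher(waybackDate).matches()) {
            throw new IllegalArgumentException("Not a 14-digit wayback_date: " + waybackDate);
        }
        // Drop one trailing slash from the base so the result never contains "//web/".
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return base + "/web/" + waybackDate + "/" + originalUrl;
    }

    public static void main(String[] args) {
        // Placeholder base URL; prints http://localhost:8080/wayback/web/20110823163324/http://www.test.uk/
        System.out.println(playbackUrl("http://localhost:8080/wayback", "20110823163324", "http://www.test.uk/"));
    }
}
```

With wayback_date indexed and stored, the timestamp can be read from the document as-is and no date conversion is needed before building the URL.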