ukwa / webarchive-discovery

WARC and ARC indexing and discovery tools.
https://github.com/ukwa/webarchive-discovery/wiki

wayback_date field, set indexed="true" #186

Open thomasegense opened 6 years ago

thomasegense commented 6 years ago

wayback_date field definition in schema.xml

Setting indexed="true" would make playback easier in SolrWayback, because wayback_date is the timestamp used by the playback API that all playback engines share, e.g.:

....../web/20110823163324/http://www.test.uk/index.hmlt
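The playback URL above embeds the capture time as a 14-digit timestamp (yyyyMMddHHmmss). Below is a minimal sketch of how a client could build such a path directly from a stored wayback_date value. It assumes wayback_date holds exactly that 14-digit number. The class name `PlaybackPaths`, the method `playbackPath`, and the `/web/` prefix (taken from the example URL) are hypothetical illustrations, not SolrWayback's actual code.

```java
/**
 * Hypothetical helper: builds a Wayback-style playback path from a
 * wayback_date value, assumed to be a 14-digit yyyyMMddHHmmss number.
 */
public final class PlaybackPaths {

    private PlaybackPaths() {
    }

    /** Returns e.g. "/web/20110823163324/http://www.test.uk/". */
    public static String playbackPath(long waybackDate, String originalUrl) {
        String timestamp = Long.toString(waybackDate);
        // Reject anything that is not exactly 14 digits (negative values fail too).
        if (!timestamp.matches("\\d{14}")) {
            throw new IllegalArgumentException("Expected a 14-digit timestamp: " + timestamp);
        }
        // The stored value is used as-is: no date parsing or formatting needed.
        return "/web/" + timestamp + "/" + originalUrl;
    }

    public static void main(String[] args) {
        // Prints /web/20110823163324/http://www.test.uk/
        System.out.println(playbackPath(20110823163324L, "http://www.test.uk/"));
    }
}
```

With the value stored in the index, building the URL is plain string concatenation.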

At the same time, remove the comment above the field.

anjackson commented 6 years ago

The index already stores the full date in the crawl_date field. Would it be easier to use that?

thomasegense commented 6 years ago

I know, that is what I use now :) But since the field is already in the schema, it would be nice to have it indexed/stored. Using it would make the SolrWayback code (both JavaScript and Java) much simpler, because the conversion would no longer be needed in many places.
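The conversion mentioned above can be sketched as follows. This assumes crawl_date is returned as a standard Solr date string in UTC (for example `2011-08-23T16:33:24Z`) and that wayback_date is the 14-digit yyyyMMddHHmmss form interpreted as UTC. The class `WaybackDates` and its methods are hypothetical and do not come from the SolrWayback or warc-indexer code bases.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical converter between crawl_date and wayback_date representations. */
public final class WaybackDates {

    // 14-digit Wayback timestamp layout; values are treated as UTC.
    private static final DateTimeFormatter WAYBACK =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    private WaybackDates() {
    }

    /** crawl_date (ISO-8601 instant, e.g. "2011-08-23T16:33:24Z") to wayback_date. */
    public static long toWaybackDate(String crawlDate) {
        Instant instant = Instant.parse(crawlDate);
        return Long.parseLong(WAYBACK.format(instant.atOffset(ZoneOffset.UTC)));
    }

    /** wayback_date back to an ISO-8601 instant string in UTC. */
    public static String toCrawlDate(long waybackDate) {
        LocalDateTime local = LocalDateTime.parse(Long.toString(waybackDate), WAYBACK);
        return local.toInstant(ZoneOffset.UTC).toString();
    }

    public static void main(String[] args) {
        long wb = toWaybackDate("2011-08-23T16:33:24Z");
        System.out.println(wb);              // 20110823163324
        System.out.println(toCrawlDate(wb)); // 2011-08-23T16:33:24Z
    }
}
```

Every call site that needs the playback timestamp would otherwise have to repeat `toWaybackDate` (or its JavaScript equivalent); storing wayback_date moves that work to indexing time.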

anjackson commented 6 years ago

Okay, presumably this won't bloat the index much? In which case we may as well. Individuals can always disable it locally if they wish.

thomasegense commented 6 years ago

The impact should be minimal, since there are only two values :) (well, so far...) I will implement it, but not within the next few weeks :)

It would make me happy if you could release the 3.0 branch soon, so we have a new milestone and reference for people using the warc-indexer. Many people were interested in using the 3.0 version (Build Library Labs), and I think it is ready to drop the SNAPSHOT label :)

anjackson commented 6 years ago

I'll work on rolling a 3.0.0 release. See https://github.com/ukwa/webarchive-discovery/milestone/6

anjackson commented 6 years ago

Release 3.0.0 is live, although I'm planning a minor update to bring some older dependencies up to date.