osirrc / jig

Jig for the Open-Source IR Replicability Challenge (OSIRRC)

Clarify jig 'search' subcommand #47

Closed by lintool 5 years ago

lintool commented 5 years ago

Why not have a path here also?

ryan-clancy commented 5 years ago
  • Please clarify in the documentation (README) for --output and --qrels: a relative path appears to work... I remember that in a previous version absolute paths were required?

The relative/absolute note has been removed from the current documentation; either kind of path works now.

  • What's the rationale for --topic being inconsistent with the two options above?

    the name (not path) of the topic file

Why not have a path here also?

Poor design choice - will create an issue to address.

  • What's the rationale for having --save_id? Presumably, search is stateless (unlike index and init) right?

--save_id allows you to keep multiple images saved for different collections. If we index wapo with anserini:latest and later index nyt with anserini:latest without specifying a different save_id (which is used to compute a hash for the snapshot container's tag), nyt would overwrite wapo. That is the default behavior if you don't change save_id from its default value.

lintool commented 5 years ago

Sorry, clarification:

What's the rationale for having --save_id for search? Presumably, search is stateless (unlike index and init) right?

ryan-clancy commented 5 years ago

What's the rationale for having --save_id for search? Presumably, search is stateless (unlike index and init) right?

Ahh - for search we need to know which image to use. For example, I could index core17 once with the -storeRawDocs flag and once with the -storeTransformedDocs flag in Anserini, using a different save_id for each. That would keep both images saved and available to search over (which saves re-indexing each time). When actually calling search, I could specify which one to use.

The flow is like this:

  1. init + index on anserini:latest and then save anserini:hash(latest + save_id)
  2. search on anserini:hash(latest + save_id)
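
The two steps above can be sketched in Python as follows. This is only an illustration of the hash(latest + save_id) idea: the function name `snapshot_tag`, the choice of SHA-256, and the plain string concatenation are assumptions, not necessarily what jig does internally.

```python
import hashlib


def snapshot_tag(repo: str, tag: str, save_id: str) -> str:
    """Return a hypothetical snapshot image name of the form repo:hash(tag + save_id)."""
    # Assumption: SHA-256 over the plain concatenation; jig may compute this differently.
    digest = hashlib.sha256((tag + save_id).encode("utf-8")).hexdigest()
    return f"{repo}:{digest}"


# Step 1: init + index on anserini:latest, then save under a derived tag.
wapo_image = snapshot_tag("anserini", "latest", "wapo")
nyt_image = snapshot_tag("anserini", "latest", "nyt")

# Different save_id values give different image names, so both snapshots survive.
assert wapo_image != nyt_image

# Step 2: search recomputes the same name from the same save_id and loads that image.
assert snapshot_tag("anserini", "latest", "wapo") == wapo_image
```

If both collections were indexed with the same save_id, the derived name would be identical and the second save would replace the first, which is the overwrite described earlier in this thread.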

lintool commented 5 years ago

Ohhhhh... I get it. The --save_id in search is the image to load. Not what to save after you do the search. Because in prepare, --save_id is what we save to, right?

See my confusion?

ryan-clancy commented 5 years ago

Yeah, I guess the naming of the parameter (or the documentation) is a bit confusing. I'll make a slight change to the documentation; if it's not clear we can re-name the param.

ryan-clancy commented 5 years ago

Actually it looks like it's been updated in one of the PRs... https://github.com/osirrc2019/jig/pull/46

lintool commented 5 years ago

if it's not clear we can re-name the param.

Why not? Are jig params breaking changes?

What about being explicit? e.g., --save-to-tag and --load-from-tag
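
One way the explicit names could look, sketched with Python's argparse. This is not jig's actual parser: the program name, the `build_parser` helper, the default value "save", and keeping --save_id as a backward-compatible alias are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser showing the proposed explicit flag names."""
    parser = argparse.ArgumentParser(prog="jig-sketch")
    sub = parser.add_subparsers(dest="command", required=True)

    # prepare writes a snapshot, so the flag names the tag to save to.
    prepare = sub.add_parser("prepare")
    prepare.add_argument("--save-to-tag", "--save_id", dest="save_to_tag", default="save")

    # search only reads a snapshot, so the flag names the tag to load from.
    search = sub.add_parser("search")
    search.add_argument("--load-from-tag", "--save_id", dest="load_from_tag", default="save")
    return parser


search_args = build_parser().parse_args(["search", "--load-from-tag", "wapo"])
prepare_args = build_parser().parse_args(["prepare", "--save_id", "nyt"])
```

Keeping the old spelling as an alias would let existing invocations continue to work while the new names make the direction (save versus load) visible at the call site.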

ryan-clancy commented 5 years ago

if it's not clear we can re-name the param.

Why not? Are jig params breaking changes?

What about being explicit? e.g., --save-to-tag and --load-from-tag

Alright, I'll update once the current PRs are merged to avoid conflicts as they touch the same code/docs.