Closed tom-dea closed 1 year ago
Thanks, @tom-dea, this is a great start! I think there are two next steps:
1.) We should add test coverage to confirm that the flag does what is expected. It might be possible to copy the existing test case for handling resumption tokens (https://github.com/vufind-org/vufindharvest/blob/dev/tests/unit-tests/src/VuFindTest/OaiPmh/HarvesterTest.php#L305-L339) and adjust the expectations to confirm that the second page does NOT get requested when the flag is set.
2.) It may be worth some discussion/brainstorming about the actual name of the setting. I wonder if something more descriptive would be better, like "ignoreResumptionTokens." I think describing what the setting does, rather than what it is for, might be more clear -- "harvestTestData" could potentially mean a lot of different things, but "ignoreResumptionTokens" is very specific. I'm open to other ideas or further discussion, of course!
Thanks, @tom-dea, this is a great start! I think there are two next steps:
1.) We should add test coverage to confirm that the flag does what is expected. It might be possible to copy the existing test case for handling resumption tokens (https://github.com/vufind-org/vufindharvest/blob/dev/tests/unit-tests/src/VuFindTest/OaiPmh/HarvesterTest.php#L305-L339) and adjust the expectations to confirm that the second page does NOT get requested when the flag is set.
2.) It may be worth some discussion/brainstorming about the actual name of the setting. I wonder if something more descriptive would be better, like "ignoreResumptionTokens." I think describing what the setting does, rather than what it is for, might be more clear -- "harvestTestData" could potentially mean a lot of different things, but "ignoreResumptionTokens" is very specific. I'm open to other ideas or further discussion, of course!
1) i fully agree. will do that in one of the next steps. 2) currently the behaviour is not exactly ignoring resumptionTokens. i want to improve that, rename the flag to "ignoreResumptionTokens". good idea. a much clearer name ;-)
i will contact you or comment here as soon as i think that i am done with that
Sounds great -- thanks, @tom-dea!
i just added a test case by adaption a copy of testListRecordsWithResumption() in https://github.com/vufind-org/vufindharvest/blob/dev/tests/unit-tests/src/VuFindTest/OaiPmh/HarvesterTest.php#L305-L339 the meodld is called testListRecordsWithIgnoreResumptionTokenOption() and is just below. it passes the test run.
i am not totally sure if i understand the testing methods right.
@demiankatz just added some sentences to documentation:
; ignoreResumptionTokens may be set to true in order to harvest just the first list ; response from the OAI repository. the resumption tokens will be ignored. this option ; can be used for testing purposes. it allows to harvest smaller data sets.
Thanks again, @tom-dea! I've made some minor stylistic edits to your new comment, and I think this is looking good now. I've requested a review from @EreMaijala in case he has any concerns or alternate ideas from his work on RecordManager.
I did think of one possible scenario where something different might be needed for testing: sometimes an OAI source spits out many pages of deleted records before providing any "real" records. In that scenario, this setting would not be helpful. However, that's probably a fairly rare edge case and probably doesn't prevent this setting from being useful. I could just imagine there might be a situation where we would want a "getFirstNonDeletedRecord" setting or something like that. (I'm not asking you to change anything, just brainstorming/thinking ahead).
thank you @EreMaijala for your suggestions. this sounds reasonable to me. i will have a look at this. btw i already missed an overall counter of harvested records in the logging of vufind harvester since a long time. maybe this number could also show up in logs.
i just have a look at how to implement your idea and give feedback here...
just tried to implement the stopAfter option
Thanks, @tom-dea, this looks good to me! I just made a few small edits in commit 4f7c397 since the stopAfter property is no longer nullable, so the type can be plain int
(and I figured I'd make a note about the meaning of -1 while I was in the area).
Now the next step is to get the tests passing again -- it looks like the refactoring has broken something. Please let me know if you need any help figuring it out!
this is more a sketch. it works for me so far. i added the testing parameter to have the advantage of not having to harvest whole sets in test situations.