Closed alanking closed 3 weeks ago
if we're using meta for 'scanner'... do we want to use that name everywhere for consistency?
or do we need a better name that encapsulates its various (currently appears to be 'two' different) uses?
if we're using meta for 'scanner'... do we want to use that name everywhere for consistency?
or do we need a better name that encapsulates its various (currently appears to be 'two' different) uses?
The name of the variable does not need to match the name of the key in the meta
dict here, but I suppose I agree that it would make things more consistent. I was just matching what was already there in order to minimize the diff, but can update it. What are the "two" uses?
the two i see are 'scanner' and 'syncer'. is there a better 'composite' word? or is 'scanner' sufficient?
I see. The only thing I would be concerned about at this point is the key being used in the meta
map because that could be accessed by a custom event handler - although that seems extremely inadvisable :) - and changing that in the future would be another change to the interface. The name of the variable is less important because it is internal to each function (or at least it is now).
The term "scanner" was chosen to match the name of the class, and the name of the class was chosen, IIRC, to reflect the fact that the class is responsible for implementing a way of walking or "scanning" different storage systems. This is in reference to the filesystem scanner use case as well, but there are of course other use cases that we advertise (namely, landing zone).
I think "syncer" was used due to the filesystem "sync" referenced throughout this codebase and in our training. We have operations that are not considered "sync"s (e.g. PUT_APPEND
) so not everything being done is a sync. So, I would be more in favor of changing "syncer" to "scanner", but I'm not attached to that term, really. Open to suggestions.
So, I would be more in favor of changing "syncer" to "scanner"
I think that is correct and good. Let's make that change and see how it looks/feels.
In order to avoid ambiguity (i.e. syntax errors, and... confusion) can we differentiate the scanner
class and instantiations by name?
Here's what can happen if we name the instance the same thing as the class:
>>> class scanner:
... def __init__(self, x):
... self.x = x
... def foo(self):
... self.x = self.x + 1
...
>>> scanner = scanner(1)
>>> scanner.foo()
>>> scanner.x
2
>>> scanner.foo()
>>> scanner.x
3
>>> scanner2 = scanner(10)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'scanner' object is not callable
Once the scanner
is instantiated as scanner
, you cannot instantiate any more scanner
s because the class scanner
has been shadowed by the instance named scanner
.
With that in mind, here are some suggestions...
We could also consider renaming the class, but I think we can / should do that later...
storage_scanner seems better than path_scanner b/c S3 isn't 'paths' officially.
walker and crawler are nice... the scanner is the technology/approach, and the walker/crawler is the thing doing the work.
although upon 30 more seconds of thought... i like scanner_instance
.
that way we don't invent new language/nouns... but avoid the collision.
looking real good.
Agreed. scanner_instance
is a good choice.
Confirmed that a basic S3 bucket scan worked with these changes.
I like the current commits, actually, so just added #s. Mergin
Addresses #219 Addresses #222
Tests are passing. Need to test with S3 scanner as well since there are no S3 tests in the regular suite yet.