Open atz opened 7 years ago
The risk of FileSet transactionality for Hyrax IngestJob
may be limited by filesets being created per upload, but this is still an architectural flaw. During migrations, for example, a dozen add_file_to_file_set
calls could quite plausibly be entertained on the same fileset in parallel.
The Updater
classes do:
if type.instance_of? Symbol
in one place and in another:
case type ... when String
This confused sniffing leads to actual logical/behavioral differences.
Note, in certain configurations, the #read
required of the file
param is actually #read(INTEGER, STRING)
Serious copy-pasta, 90+% similarity:
Also both are classes, but each defines two other classes inside the outer one. That is something we conventionally do in module namespace, but not in classes. The service classes obscure that that they are actually mostly about
Updater
classes. That's where much of the commonality is, and that should be broken out or otherwise reused.Ideally, these should be refactored to use a common base class or establish an inheritance.
Other problems:
type
paramYARD says:
but that eventually gets passed to
type_to_uri
that can raiseArgumentError
with the message'Invalid file type. You must submit a URI or a symbol.'
String or symbol, which is it?RDF::URI
doesn't care since it just wants#to_s
, in fact always calls#to_s
on the first param received. Why do we care more that it does? (but not enough to match docs and error message?)Transactionality
In production these methods will routinely be called in parallel on the same fileset, with batches of hundreds or thousands of objects filling as many job workers as are available. So why would it work to do (in
#persist
):when the
file_set
being saved has no guarantee of not overwriting changes from the 11 other processors saving divergently modified versions of what was initially the same object? If there is already some magic in place to avoid this, I haven't found it.Keep in mind, for our generated
FileSet
class, the number of ancestors distinct from regular ruby Object ancestors,(FileSet.ancestors - Object.ancestors).count
, is 84. Would you be able to guess which of them received the.save
in question? The odds are good you would be wrong. Turns out it isActiveFedora::Associations::Builder::Orders::FixFirstLast
, which gets dynamically patched onto our model here: https://github.com/samvera/active_fedora/blob/master/lib/active_fedora/associations/builder/orders.rb#L38I can get that far, but of course it calls
super
(in what is sure to be a long chain), but which one of them maybe applies magic transactional pixiedust, I can't say, but it doesn't matter since it itself calls save again via another method it monkey-patched onto our class.