I've done more digging into this issue and have discovered that there are 3 real issues at play here. They are all triggered by having multiple pipelines in a project (or input blocks in the Assetfile).
The major issue is stated in the original post in this thread. This is a two-step fix. Initially it may seem trivial to simply reset @@tmp_id with every call to invoke_clean. This doesn't work because invoke_clean may be called multiple times in the compilation of one project. Declaring an input block in the Assetfile creates a new pipeline in the project, and Project#invoke_clean loops over all the pipelines and calls invoke_clean on each. This is a problem because temporary directories are not unique to each pipeline. It can be fixed by one refactoring: move temporary directory generation to the pipeline instance. Here is the problem method: https://github.com/livingsocial/rake-pipeline/blob/master/lib/rake-pipeline.rb#L389.

This creates a secondary problem: temporary directory generation must be unique to the pipeline and across processes. That was previously accomplished by keeping @@tmp_id in a class variable and incrementing it at every call. The solution here is to introduce a "fingerprint" for each pipeline in the project; a hash of all the input files can be used as the fingerprint. @@tmp_id should become @tmp_id and be incremented on every call as normal, and @tmp_id should be reset back to 0 when invoke_clean is called. This ensures that every pipeline will generate the exact same tmp directory progression on every invocation in every process. Here is a patch:
require "digest" # for Digest::SHA1

# A stable per-pipeline identifier derived from the eligible input file paths.
def fingerprint
  @fingerprint ||= Digest::SHA1.hexdigest(eligible_input_files.collect(&:fullpath).join("-"))[0..7]
end

# Generate a new temporary directory name.
#
# @return [String] a unique temporary directory name
def generate_tmpname
  "pipeline-tmp-#{fingerprint}-#{@tmp_id += 1}"
end
Hash generation may be slow, but it works as an initial patch. The method can be reworked to be faster later.
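For completeness, the reset mentioned above could look roughly like this (a minimal sketch, assuming the reset happens inside the pipeline's invoke_clean; the real method does more than shown here):

# Sketch only: zero the per-instance counter so repeated clean builds walk
# through the same "pipeline-tmp-<fingerprint>-<n>" sequence every time.
def invoke_clean
  @tmp_id = 0
  # ... then invoke this pipeline's tasks as before ...
end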
Solving problem #1 means the super call in DynamicFileTask#needed? will return true when it should (detailed in the initial post). We must now turn our focus to the #needed? method itself: https://github.com/livingsocial/rake-pipeline/blob/master/lib/rake-pipeline/dynamic_file_task.rb#L66. There is a logic error in this method as it currently stands. First of all, not all tasks will actually be dynamic. For example, a concat is registered as a dynamic task but it has no dynamic dependencies. This implementation detail cascades down through the method. I propose a new method, dynamic?, whose purpose is to return true only if the associated filter actually has dynamic dependencies. DynamicFileTask currently only checks for the presence of the dynamic block to decide whether the task is dynamic. This does not work because every single task registers a dynamic block, as seen here: https://github.com/livingsocial/rake-pipeline/blob/master/lib/rake-pipeline/filter.rb#L197. That triggers logic branches inside the DynamicFileTask class when it should not; concat or copy filters have no dynamic dependencies. Here's the new dynamic? method:
# A task is only truly dynamic when its dynamic block returns dependencies.
def dynamic?
  has_dynamic_block? && !invoke_dynamic_block.empty?
end
The #needed? method should be refactored as follows:
def needed?
  # Tasks without dynamic deps are just regular file tasks; act accordingly.
  return super unless dynamic?

  # If we have no manifest, this file task is needed.
  return true unless last_manifest_entry

  # If any of this task's dynamic dependencies have changed,
  # this file task is needed.
  last_manifest_entry.deps.each do |dep, time|
    if File.mtime(dep) > time
      return true
    end
  end

  # Otherwise, it's not needed.
  false
end
This solves the logic errors. At this point #needed? is working almost correctly. The 3rd issue is related to manifest generation.
There is an issue with the manifest and multiple pipelines. The problem is here: https://github.com/livingsocial/rake-pipeline/blob/master/lib/rake-pipeline/project.rb#L130. The manifest is actually written at the end of all pipelines. This means that calls to last_manifest_entry inside DynamicFileTask#needed? will return false in cases with multiple pipelines. The manifest needs to become pipeline specific and be written at the end of the invoke method, and the manifest file should be written with the same fingerprint. I will demonstrate this problem through the Iridium Assetfile found here: https://github.com/radiumsoftware/iridium/blob/master/lib/iridium/Assetfile. The last step in the entire pipeline is to copy everything, and there are earlier pipelines with dynamic dependencies (sass). Invoking the pipeline once generates one manifest.json. Its contents correspond (incorrectly, IMO) to all the inputs in the final pipeline, which in Iridium's case is all the optimized assets. Invoking the pipeline again creates a problem: when the sass filter hits, it correctly attempts to read the manifest, but the manifest is wrong because it was generated from a different pipeline. I have not looked into exactly how to do this refactoring yet.
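To make the direction concrete, a per-pipeline manifest might be written along these lines (purely an illustrative sketch; manifest, manifest_path, and tmpdir are assumed accessors here, not the current rake-pipeline API, and the real refactoring may look quite different):

# Hypothetical sketch: one manifest per pipeline, keyed by the same
# fingerprint used for the tmp directories above.
def manifest_path
  File.join(tmpdir, "manifest-#{fingerprint}.json")
end

def invoke
  # ... run this pipeline's rake tasks as it does today ...
  manifest.write(manifest_path)
end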
All in all, the issues can be resolved by moving generation of the temporary directory to the instance level, fixing #needed? with #dynamic? as described above, and making manifests a pipeline concern instead of a project concern. I think that implementing these three steps will make the pipeline work the way many people are expecting.
@wycats there is your epic update.
@twinturbo should this be closed since #99 was merged?
@wagenet ya you can close this
@twinturbo I can't close it. Don't have admin :( Can't you close tickets you made?
@wagenet I totally forgot that I made this ticket. I got used to just asking you to close it. Anyways #117 closes this.
I've been chasing down an issue related to "caching" in my application's pipeline. I think I've finally found the problem. I'm assuming this is true: I build the pipeline, I don't change anything, and building the pipeline again should "skip" everything since no input files have changed. I.e., only rebuild the pipeline if input files have changed.
This behavior is accomplished through Rake::FileTasks. Each filter in the pipeline is a rake task that sets up a task with prereqs (the input files). Rake::FileTask decides if a task should be invoked by looking at itself and its prereqs; if any one of those needs to be invoked, it will be done. Rake::FileTask simply checks File.mtime to see if a file has changed. It compares the prereq timestamp to the file itself; if the prereq is newer than the file itself, then the task needs to be invoked again. This is the call to super in Rake::Pipeline::DynamicFileTask#needed?. The class also does its own logic, but that's unimportant because it returns early if super is true.

Internally all files in the build process are copied into a tmp directory. This makes sense. However there is a problem: the tmp directory name is unique per process. The counter variable is incremented automatically every time the method is called. Here's the link to the method: https://github.com/livingsocial/rake-pipeline/blob/master/lib/rake-pipeline.rb#L389. This means that isolated builds of the same pipeline (inputs and assetfile) will generate the same temp directory names, so the intermediate build files should be the same and be skipped accordingly. The problem occurs when multiple builds occur from the same process. This is the exact use case for the preview server, though I'm not sure if that's the intended use case.
Building the pipeline multiple times in the same process will continually increment the class variable. The second time the pipeline is invoked, files will be placed into new temporary directories and the dependencies are wired up against those new paths. Since a new file is created every time, File.mtime will always be newer than before, causing rake tasks to be invoked when they shouldn't be. I think this is a major problem.
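A minimal model of what's going on (not rake-pipeline's actual code, just the shape of the problem):

# Illustration only: a class-level counter that never resets within a process.
class FakePipeline
  @@tmp_id = 0

  def generate_tmpname
    "pipeline-tmp-#{@@tmp_id += 1}"
  end
end

first_build = FakePipeline.new
first_build.generate_tmpname   # => "pipeline-tmp-1"
first_build.generate_tmpname   # => "pipeline-tmp-2"

# A second build in the same process (e.g. the preview server) keeps counting:
second_build = FakePipeline.new
second_build.generate_tmpname  # => "pipeline-tmp-3" -- brand-new directories,
                               #    brand-new mtimes, so every task looks "needed"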
Here is my debugging session from deep inside rake to find out what bit of code is actually making that decision. It's from inside this method: https://github.com/jimweirich/rake/blob/master/lib/rake/file_task.rb#L32
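Paraphrasing what that linked method boils down to (see the rake source for the exact implementation), a FileTask is needed when its output is missing or any prerequisite is newer than it:

# Rough paraphrase of rake's FileTask decision, not the literal source:
def needed?
  !File.exist?(name) ||
    prerequisites.any? { |prereq| File.mtime(prereq) > File.mtime(name) }
end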
Here is the assetfile used in my project: https://github.com/radiumsoftware/iridium/blob/master/lib/iridium/Assetfile
@dudleyf @wycats can you confirm this is a bug?