Closed 6 years ago
Do you have a minimal reproducer?
I created one and updated the description. It seems to be some interaction with LLDB because I can't get it to cause an error in a plain swift script.
The problem does not occur in compiler mode or script interpreter mode. This only occurs in REPL.
Same with
model.allParameters.update(withGradients: Model.Parameters(x: Tensor(1))) { $0 -= 0.1 * $1 }
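For reference, a minimal sketch of what such a two-cell reproducer looks like in shape; the Parameterized protocol and @TFParameter attribute are assumptions about the contemporaneous Swift for TensorFlow API, and only the update call itself is taken from this report:
// Cell 1: define a model with a single tensor parameter and instantiate it.
import TensorFlow
struct Model : Parameterized {
    @TFParameter var x = Tensor<Float>(0)
}
var model = Model()
// Cell 2: update the parameters with a trailing closure; this is the call
// that fails when the two cells are evaluated separately in the REPL.
model.updateParameters(withGradients: Model.Parameters(x: Tensor(1))) { $0 -= 0.1 * $1 }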
This bug clearly has the highest priority because parameter aggregates are a strict requirement. bgogul (JIRA User), would you like to take a look? I can provide further support on this.
The first step is to figure out why the trailing closure
{ $0 -= 0.1 * $1 }
... is not getting partitioned.
One quick simple hack is to make the compiler partition even inlinable functions as long as they are not generic. This change can be added at TFUtilities.cpp:662.
Comment by Gogul Balakrishnan (JIRA)
It looks like we have a workaround for the time being. I will take a look once I wrap up the other bugs that are blocking the tutorial.
There's no workaround yet.
If you do everything in one cell, or if you do everything in one swift script, then you won't get the error!
Putting everything in one cell would make the tutorial notebook no longer a notebook.
Does this work if you hard code "-tf-promote-global-variables" to enabled?
Comment by Gogul Balakrishnan (JIRA)
@lattner, enabling tf-promote-global-variables did not fix the issue. I will dig around a bit more.
Comment by Gogul Balakrishnan (JIRA)
Just an update based on my investigation. In REPL mode, the compiler fails to inline the call to updateParameters:
model.updateParameters(withGradients: Model.Parameters(x: Tensor(1))) { $0 -= 0.1 * $1 }
The reason is that the attempt to link updateParameters fails. The relevant condition in shouldInlinePredicate is the following:
if (callee.empty() && !site.getModule().linkFunction(&callee, SILModule::LinkingMode::LinkAll))
"site.getModule().linkFunction(...)" returns false for updateParameters in REPL mode.
How about the following?
model.allParameters.update(withGradients: Model.Parameters(x: Tensor(1))) { $0 -= 0.1 * $1 }
Is this causing the same error?
This case above should be tested first. `updateParameters` is just a Swift-level wrapper that might obscure what's going on.
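To make that wrapper/aggregate distinction concrete, here is a self-contained toy sketch; every name in it is a hypothetical stand-in rather than the real TensorFlow module API:
// Toy parameter aggregate: owns the parameters and applies an updater to each.
struct ToyParameters {
    var x: Float
    mutating func update(withGradients gradients: ToyParameters,
                         _ updater: (inout Float, Float) -> Void) {
        updater(&x, gradients.x)
    }
}
// Toy model: updateParameters is only a thin wrapper that forwards to the
// aggregate, so calling allParameters.update directly removes one layer of
// inlining from the picture.
struct ToyModel {
    var allParameters = ToyParameters(x: 0)
    mutating func updateParameters(withGradients gradients: ToyParameters,
                                   _ updater: (inout Float, Float) -> Void) {
        allParameters.update(withGradients: gradients, updater)
    }
}
var toy = ToyModel()
toy.updateParameters(withGradients: ToyParameters(x: 1)) { $0 -= 0.1 * $1 }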
Comment by Gogul Balakrishnan (JIRA)
Yes, changing it to allParameters.update(..) causes the same error.
I think I understand the issue now. Basically, the SIL IR is not retained after evaluating a line in REPL mode; only the lowered LLVM IR persists. That is why the deabstraction phase is not able to link in the SILFunction.
I don't think that's the case, here's an example:
2> func foo<T>(x: Tensor<T>) -> Tensor<T> {
3. return x
4. } // I hit enter here
5> foo(x: Tensor(1))
2018-08-23 16:33:19.642759: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
$R0: TensorFlow.Tensor<Double> = 1.0
This means the SIL is being preserved. The compiler cannot partition generic functions.
Comment by Gogul Balakrishnan (JIRA)
I did a bit more digging around. I don't completely understand how the REPL is implemented, but I can see that the SILModule is thrown away after generating the LLVM IR. For instance, SwiftExpressionParser.cpp:1887 calls swift::performIRGeneration(), which ultimately releases the SILModule at IRGen.cpp:854.
If this is the case, we need a way to preserve the relevant SIL code throughout the REPL session.
Comment by Gogul Balakrishnan (JIRA)
I sent out a hacky patch (19502) to deal with this issue for the time being.
@rxwei, could it be that the generic case works because foo has no #tfops, and hence no deabstraction/partitioning is needed? Here is the SIL that was generated for foo in LLDB:
// Foo<A>(x:)
sil @$S13__lldb_expr_33Foo1x10TensorFlow0D0VyxGAG_tAD013AccelerableBydE0RzlF : $@convention(thin) <T where T : AccelerableByTensorFlow> (@guaranteed Tensor<T>) -> @owned Tensor<T> {
// %0 // users: %2, %1
bb0(%0 : @guaranteed $Tensor<T>):
debug_value %0 : $Tensor<T>, let, name "x", argno 1 // id: %1
%2 = copy_value %0 : $Tensor<T> // user: %3
return %2 : $Tensor<T> // id: %3
} // end sil function '$S13__lldb_expr_33Foo1x10TensorFlow0D0VyxGAG_tAD013AccelerableBydE0RzlF'
The function not having any tensor ops is a good point. I intended to add some tensor code there but forgot...
The following case is already being diagnosed, which is great.
4> func bar<T : Numeric>(x: Tensor<T>) -> Tensor<T> { return x + x }
error: repl.swift:4:6: error: internal error: TensorFlow graph program extraction does not work on generic functions yet
func bar<T : Numeric>(x: Tensor<T>) -> Tensor<T> { return x + x }
^
And the following case fails as expected:
4> @inlinable func bar<T : Numeric>(x: Tensor<T>) -> Tensor<T> { return x + x }
5> bar(x: Tensor(1))
Your patch makes sense and is a nice hack to unblock progress!
Additional Detail from JIRA
| | |
|------------------|-----------------|
|Votes | 0 |
|Component/s | Swift for TensorFlow |
|Labels | Bug, s4tf-iris-tutorial |
|Assignee | bgogul (JIRA) |
|Priority | Medium |

md5: cffdda1ec2c8949bd9538c7f7ce2d691

Issue Description:
Execute the following two cells in Jupyter (or paste them into an LLDB REPL):
You should get this error:
If you do everything in one cell, or if you do everything in one swift script, then you won't get the error!