Or should I switch to Java 18, as used in the Dockerfile?
Hi Michael,
I'd suggest switching to Java 18. Although Java versions are backwards-compatible, I'm not sure the same applies to the Docker images.
Another option, if the one above doesn't help, is removing everything from the Dockerfile which is not directly related to building the image. If the problem still persists, I could try to build the image with your parameters for the JDK etc. on my Windows instance and see if it works. It could be macOS-specific. I built all of my images on my private Windows system and then uploaded them to the VM in the cloud, so I'm not sure this Dockerfile works fine on all other OSs.
Best regards, Taras
Hi Taras
Thanks for your quick feedback!
I have switched to Java 18, but receive the same error; see below.
I will try tomorrow to remove stuff from the Dockerfile as you suggest and check whether that works :-)
Thanks
Michael
[INFO] --- exec-maven-plugin:3.0.0:exec (docker-build) @ core ---
[2/8] COPY target/base_ai_core-1.0-SNAPSHOT-shaded.jar /base_ai_core/:
dockerfile_mlm:4
  2 |
  3 | # copy the packaged jar file into our docker image
  4 | >>> COPY target/base_ai_core-1.0-SNAPSHOT-shaded.jar /base_ai_core/
  5 |
  6 | RUN apt-get --assume-yes update; apt-get --assume-yes install nano; apt-get --assume-yes install systemd;
ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref b71333e2-50ef-42d4-88af-8a6142b4eace::neukl8sec3ecnr8sybahygvkh: failed to walk /var/lib/docker/tmp/buildkit-mount2584341619/target: lstat /var/lib/docker/tmp/buildkit-mount2584341619/target: no such file or directory
[ERROR] Command execution failed.
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal (DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.execute (DefaultExecutor.java:166)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:982)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:929)
    at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:457)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:370)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:351)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:171)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:163)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:294)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:960)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
    at jdk.internal.reflect.DirectMethodHandleAccessor.invoke (DirectMethodHandleAccessor.java:104)
    at java.lang.reflect.Method.invoke (Method.java:578)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Transformers in Java 0.0.1-SNAPSHOT:
[INFO]
[INFO] Transformers in Java ............................... SUCCESS [  0.122 s]
[INFO] Core functionality ................................. FAILURE [02:01 min]
[INFO] Samples of training the models ..................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  02:01 min
[INFO] Finished at: 2023-09-12T00:25:11+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:3.0.0:exec (docker-build) on project core: Command execution failed.: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn -rf :core
Btw, I just noticed that the pom requires Java 20, so I have switched it to Java 18:
git diff pom.xml
diff --git a/pom.xml b/pom.xml
index 3c70dd9..a91c41f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -32,7 +32,7 @@
Also, I have noticed that in the Dockerfile the following JAR is supposed to get copied:
COPY target/base_ai_core-1.0-SNAPSHOT-shaded.jar /base_ai_core/
but this JAR does not exist in the target directory:
ls core/target/
classes                         core-0.0.1-SNAPSHOT.jar   maven-archiver
core-0.0.1-SNAPSHOT-shaded.jar  generated-sources         maven-status
Hi Michael, the root cause of the issue, as you suggested, is the wrong name of the source JAR file - instead of base_ai_core-1.0-SNAPSHOT-shaded.jar it should be core-0.0.1-SNAPSHOT-shaded.jar. Sorry - it's my mistake, because I'd forgotten to update the Docker files after renaming the packages during a heavy refactoring round. The fix has already been committed to the "first_set_of_improvements" branch. You can check out this branch and retry. Unfortunately the error description provided by Docker is quite cryptic and doesn't make clear what exactly happened.
Let me know if you have any issues starting the Docker container. The model requires many environment variables as part of its configuration, so if you run into anything, just let me know.
Also, please let me know whether this issue is still a valid one.
Thanks again for your reply!
I just checked out first_set_of_improvements and it looks much better now - the Docker image got built :-)
transformer_mlm 0.0.1-SNAPSHOT 1624bea319b8 12 seconds ago 2.01GB
[INFO] Reactor Summary for Transformers in Java 0.0.1-SNAPSHOT:
[INFO]
[INFO] Transformers in Java ............................... SUCCESS [  0.112 s]
[INFO] Core functionality ................................. SUCCESS [03:49 min]
[INFO] Samples of training the models ..................... SUCCESS [  1.690 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  03:51 min
[INFO] Finished at: 2023-09-13T10:48:13+02:00
Re the environment variables, is there an example configuration using docker-compose?
How can I use the running docker container?
Hi Michael.
I've just committed some changes into the same branch which contain some fixes and refactoring meant to make the structure of the Docker-related files clearer. Now all those files reside in the docker folder. You need to pull the latest changes from the first_set_of_improvements branch.
Regarding docker compose - you won't need it, because there's only one Docker image. The idea is to start the Docker container by passing the environment variables into the docker run command. Inside the docker folder there's a run_mlm_model_training.sh file which I used for starting and running the model in the cloud on Debian. There you will find how the Docker container is started and how the environment variables are passed to it.
There's also a start_training.service file in that folder which automatically starts the model after system startup - this one is useful if you use a cloud VM similar to a "spot" VM in GCP (you can be sure that after any unexpected restart of the VM the model will continue running).
Regarding environment variables in general, and as an example of how to run the model: there's a sample file org.tarik.train.TrainMlm which basically starts the model training and can define its intermediate test logic, so that you can run inference during training and see whether the actual accuracy based on your test data is ok. There's also an IntelliJ run config added to this project in the .run folder which allows you to start this class from IntelliJ directly, without the need to use the Docker image. All environment variables in that class have default values, so you need to specify only those which you want to override (see the small sketch below for the default-value pattern). The comments in this class describe all the variables you'd need.
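As a minimal illustration of that default-value behaviour (the class and helper names below are hypothetical, not the repo's actual code; only the LOG_FREQ default of 50 comes from this thread), reading an environment variable with a fallback might look like this:

```java
// Hypothetical sketch of the "default values" pattern described above - not the repo's actual code.
public class EnvDefaults {
    static String envOrDefault(String name, String defaultValue) {
        String value = System.getenv(name);
        return (value == null || value.isBlank()) ? defaultValue : value;
    }

    public static void main(String[] args) {
        // Only variables you want to override need to be set in the run config.
        int logFreq = Integer.parseInt(envOrDefault("LOG_FREQ", "50"));
        System.out.println("LOG_FREQ = " + logFreq);
    }
}
```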
Two important things you need to configure additionally in order to run TrainMlm.java:
1. An IDataProvider instance, set via the transformer.setDataProvider() method. Currently a WikiArticlesContentProvider is being used, but it works only if you have a corresponding MongoDB running with a Wikipedia dump, which is not the way to go for you. You have to implement your own IDataProvider inline class which fetches the data you need. The method org.tarik.train.TrainMlm#getWikiArticlesContentProvider might give a hint of how you could fetch the data in chunks and decide whether there's more to fetch. If you let me know which data source you use for training, I could create a simple example for you so that you can use it out of the box. (A rough sketch follows below.)
2. The root_path environment variable, e.g. root_path=D:/temp/model. The IntelliJ runner for TrainMlm.java doesn't have it.
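For orientation, here is a rough sketch of what a minimal in-memory provider might look like. This is an assumption-laden illustration: the exact IDataProvider contract (the return type of getPassages, any further methods) lives in the repo and may differ, and the class name here is made up. The key point, which also matters later in this thread, is that the provider must eventually get exhausted:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch - check the real IDataProvider interface in the repo for the exact contract.
public class InMemoryPassagesProvider /* implements IDataProvider */ {
    private final Iterator<String> remaining;

    public InMemoryPassagesProvider(List<String> passages) {
        this.remaining = passages.iterator();
    }

    // Returns the next chunk of passages; an empty list signals that the data is exhausted,
    // so the model's batch-filling loop can terminate instead of calling this forever.
    public List<String> getPassages(Function<List<String>, Boolean> isLimitReachedFunction) {
        List<String> chunk = new ArrayList<>();
        while (remaining.hasNext() && !isLimitReachedFunction.apply(chunk)) {
            chunk.add(remaining.next());
        }
        return chunk;
    }
}
```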
I've just created the Docker image using the create_image_mlm_linux_avx2 IntelliJ run config and started that image's container locally using the command:
docker run -d transformer_mlm:0.0.1-SNAPSHOT java -XX:+UseZGC -XX:SoftMaxHeapSize=2G -XX:ZCollectionInterval=15 -XX:ZUncommitDelay=10 -Xmx3G --enable-preview -jar train-0.0.1-SNAPSHOT-shaded.jar
If you want to run it in a Linux environment, simply adapt the docker/run_mlm_model_training.sh file to your needs and it should be enough. But I'd recommend first running it in IntelliJ (or any other IDE), implementing the missing parts in TrainMlm.java, and only after you can start the training and see the first results, creating the Docker image with your changes.
If you need additional info or help - just let me know.
Thank you very much! Will pull, build and run tomorrow and let you know right afterwards :-)
Hi Taras, I managed to get it running within IntelliJ :-) I have set root_path inside .run/TrainMlm.run.xml to
<env name="root_path" value="/Users/michaelwechner/src/transformers-in-java/train/src/main/resources/test_data" />
and replaced the method WikiDatastore.fetchArticlesFromDb inside TrainMlm with a mock method returning mock Wiki articles.
It is starting up now without errors, but I am not sure whether it is running correctly; please see the output below:
/Library/Java/JavaVirtualMachines/jdk-20.jdk/Contents/Home/bin/java
Clang: "12.0.0 (clang-1200.0.32.29)"
STD version: 201103L
DEFAULT_ENGINE: samediff::ENGINE_CPU
HAVE_FLATBUFFERS
HAVE_OPENBLAS
15:55:35.640 [pool-1-thread-1] INFO org.tarik.train.CommonTrainer - Memory taken : 0.3 GB, free : 0.1 GB
15:55:35.861 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Building a transformer's encoder SameDiff model with the following params:
It seems to hang after outputting the above content. Is this the correct behaviour, or do you have any idea what might be wrong?
Thanks
Michael
Hi Michael, glad that you've got it running in IntelliJ - it's the first step, and having the model running with that code will let you create a Docker image based on it and run it everywhere you need.
Regarding the output - it's normal output. The warnings you see are normal for DL4J, as is the list of all loaded operations. But if you want to see the progress of the model, you need to set the logging-interval parameter - LOG_FREQ in TrainMlm.java is the one. The default value is 50 iterations (steps), but if you want to see the progress immediately, simply set this value to 1. There's also the MAX_MEMORY_LOG_FREQ_MINUTES constant - this one prints out the memory usage (it's in your logs: "Memory taken : 0.3 GB, free : 0.1 GB") and lets you see how much RAM the model is consuming.
Basically, TrainMlm.java is written in a way that if any error/exception is thrown, you'll see it in the logs.
If you want to see more detailed logs, you could change the LOG_LEVEL environment variable value to DEBUG (it's INFO by default). Also, you can turn on the javacpp debug output (if anything's wrong with the C++-related part) using the following code:
System.setProperty("org.bytedeco.javacpp.logger.debug", "true");
And regarding logging - there's also SameDiff logging which lets you see the details of each running operation. You can turn it on using this line: sd.enableDebugMode();
Also, there's a very valuable listener class in the model class itself - CustomListener. This one has different methods which allow you to see what happens during the execution. For example, the method public void opExecution(SameDiff sd, At at, MultiDataSet batch, SameDiffOp op, OpContext opContext, INDArray[] outputs) allows you to see the results and params of each operation after it's been executed. That's for hardcore debugging purposes, but I use that listener almost always, even if I just want to check that the intermediate state of my operations is ok.
If you see that the execution is hanging and no logs are coming out, it's probably an internal loop or something like that. I'd debug the code up to the line where it supposedly runs into that problem, but I suspect that if this really is the root cause of your issue, it could be hidden in fetching the data from the data provider - somewhere in org.tarik.core.network.models.transformer.mlm.MlmTransformerSdModel#getNewTokenBatch().
Hi Taras, thanks again for your feedback!
I have turned on DEBUG, etc. but have not found the issue yet.
Will keep you posted :-)
Btw, is it normal that getWikiArticlesContentProvider() gets called a lot (more than 100 times)?
> Btw, is it normal that getWikiArticlesContentProvider() gets called a lot (more than 100 times)?
This method should be called only once, because it's a trainer-class method and it's not used in any loops. What is really often called is the implementation of the getPassages(Function<List<String>, Boolean> isLimitReachedFunction) method. How often it is called depends on how much data you provide to the model. The latter expects you to provide at least BATCH_SIZE (default: 128) sequences (passages) of tokens for one iteration. There is also the MIN_SEQUENCE_UTILIZATION (default: 50%) variable which tells how many tokens, in % of the sequence length (hardcoded as 256, my bad), each sequence from the provider must contain at least, so that it can be accepted by the model and added to the batchedTokenSentences variable.
So if your provider gives back fewer than BATCH_SIZE sequences per call, the model will call it as many times as it takes to get a whole batch. Similarly, if the sequences which come from the provider contain less than 50% of the tokens (the rest will always be masked), the model skips (ignores) them and calls getPassages(...) again as many times as needed in order to fill the batch. This can become the source of an eternal loop if the provider always gives back some data and never gets exhausted (e.g. a simple mock). So the root cause of your problem could be either of those two factors, or both. A simplified sketch of this loop follows below.
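To make the loop concrete, here is a compact, self-contained sketch of the batch-filling behaviour just described. The control flow and helper names are illustrative, not the repo's actual code; only BATCH_SIZE, the hardcoded sequence length of 256 and the 50% utilization threshold come from this thread:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch of the batch-filling loop described above - not the repo's actual code.
public class BatchFillingSketch {
    static final int BATCH_SIZE = 128;                  // default, per the thread
    static final int SEQUENCE_LENGTH = 256;             // hardcoded, per the thread
    static final double MIN_SEQUENCE_UTILIZATION = 0.5; // default 50%, per the thread

    /** Fills a batch by calling the provider repeatedly; an empty result means it is exhausted. */
    static List<List<String>> fillBatch(Function<Function<List<String>, Boolean>, List<String>> getPassages) {
        int minTokens = (int) (SEQUENCE_LENGTH * MIN_SEQUENCE_UTILIZATION);
        List<List<String>> batchedTokenSentences = new ArrayList<>();
        while (batchedTokenSentences.size() < BATCH_SIZE) {
            List<String> passages = getPassages.apply(chunk -> chunk.size() >= BATCH_SIZE);
            if (passages.isEmpty()) {
                break; // provider exhausted - a never-empty mock would loop here forever
            }
            for (String passage : passages) {
                List<String> tokens = Arrays.asList(passage.split("\\s+")); // naive tokenizer, just for the sketch
                if (tokens.size() >= minTokens) { // sequences below 50% utilization are skipped
                    batchedTokenSentences.add(tokens);
                }
            }
        }
        return batchedTokenSentences;
    }
}
```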
> So if your provider gives back fewer than BATCH_SIZE sequences per call, the model will call it as many times as it takes to get a whole batch.
I only return 2 articles with very little content, so that is probably the problem. Will change it and try again :-) thanks!
I have now dumped some Wiki articles, downloaded from
https://mirror.accum.se/mirror/wikimedia.org/dumps/dewiki/20231001/
and the loop does not appear anymore :-)
00:00:36.573 [main] INFO e.s.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [0.3 sec].
00:00:58.001 [main] DEBUG org.tarik.train.TrainMlm - Fetch Wiki articles ...
00:00:58.015 [main] INFO org.tarik.train.TrainMlm - Parse 135 Wiki articles ...
00:00:58.016 [main] INFO org.tarik.train.TrainMlm - 135 Wiki articles fetched.
00:01:18.846 [main] DEBUG o.t.c.n.m.t.AbstractOpenDatasetTransformerModel - Processing 128 sequences with 29331 tokens with average sequence capacity utilization 89.5%
BUT the process still seems to "hang" at
Added differentiated op encoder/0/self_attention_default/reshape_1
Added differentiated op encoder/0/self_attention_default/reshape
Added differentiated op encoder/0/self_attention_default/matmul_1
Added differentiated op encoder/0/self_attention_default/matmul
Added differentiated op encoder/0/layer_norm_1
Added differentiated op encoder/0/add
Added differentiated op encoder/reshape_1
Added differentiated op gather
Is this after the network was built?
Hi Michael,
"Processing 128 sequences with 29331 tokens with average sequence capacity utilization 89.5%" tells actually that the model has got the training data and starts the training after converting this data into the MultiDataSet
. Seems ok for now.
The logs with the operations are indeed the ones showing up after the model was built and the training had been started.
What's the value of LOG_FREQ in your setup? I'd recommend setting it to 1 if you want to see immediately whether it works. Also, did you try to use CustomListener in order to see if there is any progress and if the operations are executed?
Hi Taras, thanks again for your feedback!
Yes, I have set LOG_FREQ to 1 and also verified this by logging its value.
I looked at CustomListener and how it is set, for example in core/src/main/java/org/tarik/core/network/models/transformer/mlm/MlmTransformerSdModel.java, but I do not really understand how I can actually use it when running train/src/main/java/org/tarik/train/TrainMlm.java.
I will try to understand it better asap, but any hints are very much appreciated :-)
Or do you mean that I should just add it to the various operations?
Thanks
Michael
Hi Michael,
Regarding CustomListener - if you add the following line as the first one in the method public void preOpExecution(SameDiff sd, At at, SameDiffOp op, OpContext opContext), you'll see each operation which happens in the model:
LOG.info("Executing {} with input {} and output {}", op.getName(), op.getInputsToOp(), op.getOutputsOfOp());
It will allow you to understand whether the model is training at all or whether it's stuck somewhere. Because you can't debug SameDiff directly (you actually can, but it's too complex), this listener is a good utility for seeing what happens during training. It also has methods which let you see which weights are updated, as well as what happens before each operation is executed.
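For the whole picture, a standalone listener doing the same thing might look roughly like the sketch below. It is based on ND4J's public Listener API (BaseListener); the class name is made up, and the repo's CustomListener may be wired up differently:

```java
import org.nd4j.autodiff.listeners.At;
import org.nd4j.autodiff.listeners.BaseListener;
import org.nd4j.autodiff.listeners.Operation;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.samediff.internal.SameDiffOp;
import org.nd4j.linalg.api.ops.OpContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Logs every SameDiff op before it executes - useful for spotting where training stalls.
public class OpLoggingListener extends BaseListener {
    private static final Logger LOG = LoggerFactory.getLogger(OpLoggingListener.class);

    @Override
    public boolean isActive(Operation operation) {
        return true; // listen during training, inference and evaluation alike
    }

    @Override
    public void preOpExecution(SameDiff sd, At at, SameDiffOp op, OpContext opContext) {
        LOG.info("Executing {} with input {} and output {}",
                op.getName(), op.getInputsToOp(), op.getOutputsOfOp());
    }
}
```

Such a listener can typically be attached with sd.setListeners(...) on the SameDiff instance, which is presumably what the model class already does with CustomListener.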
It also came to my mind that it's important to use the correct ND4J backend classifier so that there are no platform-related issues. The default platform in the project is Windows. It's configured in the profiles section of core/pom.xml. But you wrote that you use macOS, so you should use either macosx-x86_64-avx2 or macosx-x86_64-avx512, depending on the architecture of your CPU.
> It also came to my mind that it's important to use the correct ND4J backend classifier so that there are no platform-related issues. The default platform in the project is Windows. It's configured in the profiles section of core/pom.xml. But you wrote that you use macOS, so you should use either macosx-x86_64-avx2 or macosx-x86_64-avx512, depending on the architecture of your CPU.
Cool, thanks! My chip is an "Apple M1 Pro", so I think I need macosx-arm64 as the classifier ... will give it a try asap :-)
> LOG.info("Executing {} with input {} and output {}", op.getName(), op.getInputsToOp(), op.getOutputsOfOp());
Understood now :-) Thanks!
After adding it and rebuilding, I get the following output:
Added differentiated op gather
21:30:15.536 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/reshape with input [batchTokenEmbeddings-grad/sd_var] and output [batchTokenEmbeddings-grad/reshape]
21:30:15.538 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/self_attention_default/multiply_1 with input [encoder/self_attention_default/attentionHeadsAmount, encoder/self_attention_default/attentionHeadEmbeddingSize] and output [encoder/self_attention_default/multiply_1]
21:30:15.539 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing neg_1 with input [one-var] and output [reduce_mean-grad]
21:30:15.540 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing predictionTokenEmbeddings-grad/reshape with input [predictionTokenEmbeddings-grad/sd_var] and output [predictionTokenEmbeddings-grad/reshape]
21:30:15.552 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/zeroslike with input [tokenEmbeddingsMatrix] and output [batchTokenEmbeddings-grad/zeroslike]
21:30:15.556 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/rank with input [tokenEmbeddingsMatrix] and output [batchTokenEmbeddings-grad/rank]
21:30:15.681 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing onehot with input [flatPredictionTokenVocabIndices] and output [onehot]
21:30:15.818 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/shape_of_1 with input [inputMasks] and output [encoder/shape_of_1]
21:30:15.820 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/shape_of_2 with input [inputMasks] and output [encoder/shape_of_2]
21:30:15.821 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/ones_as with input [inputMasks] and output [encoder/broadcastOnes_encoder]
21:30:15.848 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing gather with input [tokenEmbeddingsMatrix, inputTokenVocabIndices] and output [batchTokenEmbeddings]
21:30:15.853 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/range_1 with input [batchTokenEmbeddings-grad/sd_var_2, batchTokenEmbeddings-grad/rank, batchTokenEmbeddings-grad/sd_var_3] and output [batchTokenEmbeddings-grad/range_1]
21:30:15.863 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/strided_slice_1 with input [encoder/shape_of_1] and output [encoder/strided_slice_1]
21:30:15.867 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/strided_slice_2 with input [encoder/shape_of_2] and output [encoder/strided_slice_2]
21:30:15.870 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/expand_dims with input [encoder/broadcastOnes_encoder] and output [encoder/expand_dims]
21:30:15.870 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/shape_of with input [batchTokenEmbeddings] and output [encoder/shape_of]
21:30:15.870 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_17 with input [batchTokenEmbeddings] and output [shape_of_17]
21:30:15.872 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/listdiff with input [batchTokenEmbeddings-grad/range_1, batchTokenEmbeddings-grad/reshape] and output [batchTokenEmbeddings-grad/listdiff, batchTokenEmbeddings-grad/listdiff:1]
21:30:15.873 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/cast_1 with input [encoder/strided_slice_1] and output [encoder/cast_1]
21:30:15.873 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/cast_2 with input [encoder/strided_slice_2] and output [encoder/cast_2]
21:30:15.873 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/strided_slice with input [encoder/shape_of] and output [encoder/strided_slice]
21:30:15.873 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/concat with input [batchTokenEmbeddings-grad/reshape, batchTokenEmbeddings-grad/listdiff] and output [batchTokenEmbeddings-grad/concat]
21:30:15.876 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/stack_1 with input [encoder/cast_1, encoder/sd_var, encoder/cast_2] and output [encoder/attentionIntermediateMaskShape]
21:30:15.877 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/cast with input [encoder/strided_slice] and output [encoder/encBatchSize]
21:30:15.888 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/permute_1 with input [batchTokenEmbeddings-grad/zeroslike, batchTokenEmbeddings-grad/concat] and output [batchTokenEmbeddings-grad/permute_1]
21:30:15.900 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/invert_permutation with input [batchTokenEmbeddings-grad/concat] and output [batchTokenEmbeddings-grad/invert_permutation]
21:30:15.902 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/reshape with input [inputMasks, encoder/attentionIntermediateMaskShape] and output [encoder/attentionIntermediateMask]
21:30:15.903 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/multiply with input [encoder/encBatchSize, encoder/encSequenceLength] and output [encoder/multiply]
21:30:15.903 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/self_attention_default/multiply with input [encoder/encBatchSize, encoder/encSequenceLength] and output [encoder/self_attention_default/multiply]
21:30:15.903 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/self_attention_default/stack_1 with input [encoder/encBatchSize, encoder/encSequenceLength, encoder/self_attention_default/attentionHeadsAmount, encoder/self_attention_default/attentionHeadEmbeddingSize] and output [encoder/self_attention_default/keysPerHeadAttentionShape]
21:30:15.903 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/self_attention_default/stack_2 with input [encoder/encBatchSize, encoder/encSequenceLength, encoder/self_attention_default/attentionHeadsAmount, encoder/self_attention_default/attentionHeadEmbeddingSize] and output [encoder/self_attention_default/queriesPerHeadAttentionShape]
21:30:15.904 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/stack_2 with input [encoder/encBatchSize, encoder/one] and output [encoder/stack_2]
21:30:15.909 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/multiply_1 with input [encoder/expand_dims, encoder/attentionIntermediateMask] and output [encoder/intermediateSelfAttentionMasks_encoder]
21:30:15.913 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/stack with input [encoder/multiply, encoder/encHiddenSize] and output [encoder/encHiddenLayerInputShape]
21:30:15.915 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/self_attention_default/stack with input [encoder/self_attention_default/multiply, encoder/self_attention_default/multiply_1] and output [encoder/self_attention_default/attentionDotProductShape]
21:30:15.946 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/tile with input [positionalEmbeddingsMatrix, encoder/stack_2] and output [encoder/positionalEmbeddingsForAttention]
21:30:16.300 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/expand_dims_1 with input [encoder/intermediateSelfAttentionMasks_encoder] and output [encoder/selfAttentionMasks]
21:30:16.321 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/reshape_1 with input [batchTokenEmbeddings, encoder/encHiddenLayerInputShape] and output [encoder/reshape_1]
21:30:16.339 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/sub_scalar with input [encoder/selfAttentionMasks] and output [encoder/0/self_attention_default/sub_scalar]
21:30:16.345 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/sub_scalar with input [encoder/selfAttentionMasks] and output [encoder/1/self_attention_default/sub_scalar]
21:30:16.352 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/2/self_attention_default/sub_scalar with input [encoder/selfAttentionMasks] and output [encoder/2/self_attention_default/sub_scalar]
21:30:16.358 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/3/self_attention_default/sub_scalar with input [encoder/selfAttentionMasks] and output [encoder/3/self_attention_default/sub_scalar]
21:30:16.386 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/layer_norm with input [encoder/reshape_1, encoder/0/normalizedAttentionInput_embedNormGain] and output [encoder/0/normalizedAttentionInput]
21:30:16.479 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/add with input [encoder/reshape_1, encoder/positionalEmbeddingsForAttention] and output [encoder/0/keyAndQueryInput_WithPositionalEmbed]
21:30:16.539 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/mul_scalar with input [encoder/0/self_attention_default/sub_scalar] and output [encoder/0/self_attention_default/attentionMaskDisqualifier_0]
21:30:16.544 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/mul_scalar with input [encoder/1/self_attention_default/sub_scalar] and output [encoder/1/self_attention_default/attentionMaskDisqualifier_1]
21:30:16.561 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/2/self_attention_default/mul_scalar with input [encoder/2/self_attention_default/sub_scalar] and output [encoder/2/self_attention_default/attentionMaskDisqualifier_2]
21:30:16.580 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/3/self_attention_default/mul_scalar with input [encoder/3/self_attention_default/sub_scalar] and output [encoder/3/self_attention_default/attentionMaskDisqualifier_3]
21:30:16.595 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/matmul_2 with input [encoder/0/normalizedAttentionInput, encoder/0/self_attention_default/AttentionValueWeights_0] and output [encoder/0/self_attention_default/valueProjections_0]
21:30:17.185 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/layer_norm_1 with input [encoder/0/keyAndQueryInput_WithPositionalEmbed, encoder/0/keyAndQueryInputNormalized_embedNormGain] and output [encoder/0/keyAndQueryInputNormalized]
21:30:17.324 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/reshape_2 with input [encoder/0/self_attention_default/valueProjections_0, encoder/self_attention_default/keysPerHeadAttentionShape] and output [encoder/0/self_attention_default/reshape_2]
21:30:17.334 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_14 with input [encoder/0/self_attention_default/valueProjections_0] and output [shape_of_14]
21:30:17.372 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/matmul with input [encoder/0/keyAndQueryInputNormalized, encoder/0/self_attention_default/AttentionKeyWeights_0] and output [encoder/0/self_attention_default/keyProjections_0]
21:30:18.061 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/matmul_1 with input [encoder/0/keyAndQueryInputNormalized, encoder/0/self_attention_default/AttentionQueryWeights_0] and output [encoder/0/self_attention_default/queryProjections_0]
21:30:18.716 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/permute_2 with input [encoder/0/self_attention_default/reshape_2] and output [encoder/0/self_attention_default/permute_2]
21:30:18.749 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/reshape with input [encoder/0/self_attention_default/keyProjections_0, encoder/self_attention_default/keysPerHeadAttentionShape] and output [encoder/0/self_attention_default/reshape]
21:30:18.764 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_16 with input [encoder/0/self_attention_default/keyProjections_0] and output [shape_of_16]
21:30:18.827 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/reshape_1 with input [encoder/0/self_attention_default/queryProjections_0, encoder/self_attention_default/queriesPerHeadAttentionShape] and output [encoder/0/self_attention_default/reshape_1]
21:30:18.837 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_15 with input [encoder/0/self_attention_default/queryProjections_0] and output [shape_of_15]
21:30:18.885 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/permute with input [encoder/0/self_attention_default/reshape] and output [encoder/0/self_attention_default/permute]
21:30:18.907 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/permute_1 with input [encoder/0/self_attention_default/reshape_1] and output [encoder/0/self_attention_default/permute_1]
21:30:18.989 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/matmul_3 with input [encoder/0/self_attention_default/permute_1, encoder/0/self_attention_default/permute] and output [encoder/0/self_attention_default/matmul_3]
where it seems to be busy for a while, and then it continues:
21:33:36.214 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/permute_3 with input [encoder/0/self_attention_default/valuesBasedOnAttentionScores_0] and output [encoder/0/self_attention_default/permute_3]
21:33:36.248 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/reshape_3 with input [encoder/0/self_attention_default/permute_3, encoder/self_attention_default/attentionDotProductShape] and output [encoder/0/self_attention_default/attentionDotProductOutput_0]
21:33:36.258 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_13 with input [encoder/0/self_attention_default/permute_3] and output [shape_of_13]
21:33:36.343 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/matmul_5 with input [encoder/0/self_attention_default/attentionDotProductOutput_0, encoder/0/self_attention_default/AttentionOutWeights_0] and output [encoder/0/self_attention_default/attentionOutput_0]
21:33:36.857 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/add_1 with input [encoder/reshape_1, encoder/0/self_attention_default/attentionOutput_0] and output [encoder/0/selfAttentionResidualProduct]
21:33:36.968 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/hidden_ff_layer_0/layer_norm with input [encoder/0/selfAttentionResidualProduct, encoder/hidden_ff_layer_0/hiddenLayerInputNormalized_0_embedNormGain] and output [encoder/hidden_ff_layer_0/hiddenLayerInputNormalized_0]
21:33:37.090 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/hidden_ff_layer_0/xw_plus_b with input [encoder/hidden_ff_layer_0/hiddenLayerInputNormalized_0, encoder/hidden_ff_layer_0/hiddenInnerLayerWeights_0, encoder/hidden_ff_layer_0/hiddenInnerLayerBias_0] and output [encoder/hidden_ff_layer_0/hiddenInnerLayerActivations_0]
21:33:37.793 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/hidden_ff_layer_0/gelu with input [encoder/hidden_ff_layer_0/hiddenInnerLayerActivations_0] and output [encoder/hidden_ff_layer_0/hiddenInnerLayerOutput_0]
21:33:37.850 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing _geluderivative_3 with input [encoder/hidden_ff_layer_0/hiddenInnerLayerActivations_0] and output [_geluderivative_3]
21:33:37.943 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/hidden_ff_layer_0/xw_plus_b_1 with input [encoder/hidden_ff_layer_0/hiddenInnerLayerOutput_0, encoder/hidden_ff_layer_0/hiddenOutLayerWeights_0, encoder/hidden_ff_layer_0/hiddenOutLayerBias_0] and output [encoder/hidden_ff_layer_0/hiddenLayerFinalOutputNormalized_0]
21:33:38.748 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/hidden_ff_layer_0/add with input [encoder/0/selfAttentionResidualProduct, encoder/hidden_ff_layer_0/hiddenLayerFinalOutputNormalized_0] and output [encoder/hidden_ff_layer_0/hiddenLayerResidualProductNormalized_0]
21:33:38.802 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/layer_norm with input [encoder/hidden_ff_layer_0/hiddenLayerResidualProductNormalized_0, encoder/1/normalizedAttentionInput_embedNormGain] and output [encoder/1/normalizedAttentionInput]
21:33:38.946 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/add with input [encoder/hidden_ff_layer_0/hiddenLayerResidualProductNormalized_0, encoder/positionalEmbeddingsForAttention] and output [encoder/1/keyAndQueryInput_WithPositionalEmbed]
21:33:39.051 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/matmul_2 with input [encoder/1/normalizedAttentionInput, encoder/1/self_attention_default/AttentionValueWeights_1] and output [encoder/1/self_attention_default/valueProjections_1]
21:33:39.718 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/layer_norm_1 with input [encoder/1/keyAndQueryInput_WithPositionalEmbed, encoder/1/keyAndQueryInputNormalized_embedNormGain] and output [encoder/1/keyAndQueryInputNormalized]
21:33:39.851 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/reshape_2 with input [encoder/1/self_attention_default/valueProjections_1, encoder/self_attention_default/keysPerHeadAttentionShape] and output [encoder/1/self_attention_default/reshape_2]
21:33:39.862 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_10 with input [encoder/1/self_attention_default/valueProjections_1] and output [shape_of_10]
21:33:39.898 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/matmul with input [encoder/1/keyAndQueryInputNormalized, encoder/1/self_attention_default/AttentionKeyWeights_1] and output [encoder/1/self_attention_default/keyProjections_1]
21:33:40.474 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/matmul_1 with input [encoder/1/keyAndQueryInputNormalized, encoder/1/self_attention_default/AttentionQueryWeights_1] and output [encoder/1/self_attention_default/queryProjections_1]
21:33:41.027 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/permute_2 with input [encoder/1/self_attention_default/reshape_2] and output [encoder/1/self_attention_default/permute_2]
21:33:41.065 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/reshape with input [encoder/1/self_attention_default/keyProjections_1, encoder/self_attention_default/keysPerHeadAttentionShape] and output [encoder/1/self_attention_default/reshape]
21:33:41.165 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_12 with input [encoder/1/self_attention_default/keyProjections_1] and output [shape_of_12]
21:33:41.184 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/reshape_1 with input [encoder/1/self_attention_default/queryProjections_1, encoder/self_attention_default/queriesPerHeadAttentionShape] and output [encoder/1/self_attention_default/reshape_1]
21:33:41.222 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_11 with input [encoder/1/self_attention_default/queryProjections_1] and output [shape_of_11]
21:33:41.255 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/permute with input [encoder/1/self_attention_default/reshape] and output [encoder/1/self_attention_default/permute]
21:33:41.288 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/permute_1 with input [encoder/1/self_attention_default/reshape_1] and output [encoder/1/self_attention_default/permute_1]
21:33:41.372 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/matmul_3 with input [encoder/1/self_attention_default/permute_1, encoder/1/self_attention_default/permute] and output [encoder/1/self_attention_default/matmul_3]
where it also seems to be busy for some time, and then it continues:
21:35:24.278 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/mul_scalar_1 with input [encoder/1/self_attention_default/matmul_3] and output [encoder/1/self_attention_default/attentionScoresBeforeMasking_1]
21:35:24.388 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/add with input [encoder/1/self_attention_default/attentionScoresBeforeMasking_1, encoder/1/self_attention_default/attentionMaskDisqualifier_1] and output [encoder/1/self_attention_default/attentionWeightsMasked_1]
21:35:24.467 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/softmax with input [encoder/1/self_attention_default/attentionWeightsMasked_1] and output [encoder/1/self_attention_default/attentionSoftmaxScores_1]
21:35:24.529 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/matmul_4 with input [encoder/1/self_attention_default/attentionSoftmaxScores_1, encoder/1/self_attention_default/permute_2] and output [encoder/1/self_attention_default/valuesBasedOnAttentionScores_1]
So it really seems to do the training, it is just slow :-)
I guess to make it faster I just need to change the "dimensions of the model", etc., right?
I understand that the model might not work well with smaller dimensions, but I just want to see whether the setup is correct and whether one can get a result within a reasonable time.
> So it really seems to do the training, it is just slow :-)
It's actually shocking - 3 minutes for a matrix multiply is super slow; I think there's some issue with the platform implementation. Normally one whole iteration should take no more than a minute. In your case a single matrix multiply alone takes 3 times longer, and the iteration itself has a lot more operations than that one. I think you should try either macosx-x86_64-avx2 or macosx-x86_64-avx512 as alternative options and see if it helps. If it doesn't, there's a forum where you could get an answer from the developers of DL4J for your specific case: https://community.konduit.ai/c/nd4j/13
> I guess to make it faster I just need to change the "dimensions of the model", etc., right?
Nope - we need to find out what's wrong with your setup. I mean, you could decrease the dimensions, but that doesn't resolve the real problem you have. And this problem is platform-bound and should be resolved anyway.
Hi Taras, thanks for your analysis! I will try the other options as you suggest and let you know :-)
@michaelwechner, are there any other comments from your side? Could this issue be closed?
Closing the issue due to the long period of inactivity.
Hi
I am trying to build and run "transformers-in-java" on Mac OS X Monterey (M1), where I have installed
java version "20.0.2" 2023-07-18
Java(TM) SE Runtime Environment (build 20.0.2+9-78)
Java HotSpot(TM) 64-Bit Server VM (build 20.0.2+9-78, mixed mode, sharing)
and compiling seems to be fine, but when I run
mvn clean install
then I receive the following error while building the Docker image:
#6 [1/8] FROM docker.io/library/openjdk:18-jdk-slim-buster@sha256:596bee0e3ed6c537b2c92cc53089772d880fb3f4413e438dcc147d61d52cc960
#6 resolve docker.io/library/openjdk:18-jdk-slim-buster@sha256:596bee0e3ed6c537b2c92cc53089772d880fb3f4413e438dcc147d61d52cc960 0.0s done
#6 sha256:596bee0e3ed6c537b2c92cc53089772d880fb3f4413e438dcc147d61d52cc960 547B / 547B done
#6 sha256:d1d26ccd60e2cc188a1d0903f9fee99ddb576cf6de0800872fc4205d3bb148de 953B / 953B done
#6 sha256:9eff783fb9735def23e99c4aeadf31d254a919891f897b55fc13b557f2dbc0b1 4.83kB / 4.83kB done
#6 CANCELED
I have Docker installed, but I guess Docker on Mac OS X (M1) works differently than Docker on Linux.
Or any other idea what I might be doing wrong?
Or can I run the project without Docker, I mean directly from within the IDE?
Thanks!