partarstu / transformers-in-java

Experimental project for AI and NLP based on Transformer Architecture
https://partarstu.github.io/transformers-in-java/
Apache License 2.0

Building on MacOS X Monterey M1 #1

Closed: michaelwechner closed this issue 9 months ago

michaelwechner commented 1 year ago

Hi

I am trying to build and run "transformers-in-java" on Mac OS X Monterey (M1), where I have installed

java version "20.0.2" 2023-07-18
Java(TM) SE Runtime Environment (build 20.0.2+9-78)
Java HotSpot(TM) 64-Bit Server VM (build 20.0.2+9-78, mixed mode, sharing)

and compiling seems to be fine, but when I run

mvn clean install

then I receive the following error while building the Docker image:

#6 [1/8] FROM docker.io/library/openjdk:18-jdk-slim-buster@sha256:596bee0e3ed6c537b2c92cc53089772d880fb3f4413e438dcc147d61d52cc960
#6 resolve docker.io/library/openjdk:18-jdk-slim-buster@sha256:596bee0e3ed6c537b2c92cc53089772d880fb3f4413e438dcc147d61d52cc960 0.0s done
#6 sha256:596bee0e3ed6c537b2c92cc53089772d880fb3f4413e438dcc147d61d52cc960 547B / 547B done
#6 sha256:d1d26ccd60e2cc188a1d0903f9fee99ddb576cf6de0800872fc4205d3bb148de 953B / 953B done
#6 sha256:9eff783fb9735def23e99c4aeadf31d254a919891f897b55fc13b557f2dbc0b1 4.83kB / 4.83kB done
#6 CANCELED


[2/8] COPY target/base_ai_core-1.0-SNAPSHOT-shaded.jar /base_ai_core/:

dockerfile_mlm:4

   2 |
   3 |     # copy the packaged jar file into our docker image
   4 | >>> COPY target/base_ai_core-1.0-SNAPSHOT-shaded.jar /base_ai_core/
   5 |
   6 |     RUN apt-get --assume-yes update; apt-get --assume-yes install nano; apt-get --assume-yes install systemd;

ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref b71333e2-50ef-42d4-88af-8a6142b4eace::3h2ztzcta9tda5rpkzry3j8mc: failed to walk /var/lib/docker/tmp/buildkit-mount564556698/target: lstat /var/lib/docker/tmp/buildkit-mount564556698/target: no such file or directory

[ERROR] Command execution failed.
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal (DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.execute (DefaultExecutor.java:166)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:982)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:929)
    at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:457)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:370)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:351)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:171)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:163)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:294)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:960)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
    at jdk.internal.reflect.DirectMethodHandleAccessor.invoke (DirectMethodHandleAccessor.java:104)
    at java.lang.reflect.Method.invoke (Method.java:578)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Transformers in Java 0.0.1-SNAPSHOT:
[INFO]
[INFO] Transformers in Java ............................... SUCCESS [ 0.110 s]
[INFO] Core functionality ................................. FAILURE [02:00 min]
[INFO] Samples of training the models ..................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:00 min
[INFO] Finished at: 2023-09-11T23:45:40+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:3.0.0:exec (docker-build) on project core: Command execution failed.: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :core

I have Docker installed, but I guess Docker on Mac OS X (M1) works differently than Docker on Linux.

Or any other idea what I might be doing wrong?

Or can I run the project without Docker, I mean directly from within the IDE?

Thanks!

michaelwechner commented 1 year ago

Or should I switch to Java 18 as used in the docker file?

partarstu commented 1 year ago

Hi Michael,

I'd suggest switching to Java 18. Although Java versions are backwards-compatible, I'm not sure the same applies to the Docker images.

Another option, if the one above doesn't help, is removing everything in the Docker file which is not directly related to building the image. If the problem still persists, I could try to build the image based on your parameters for the JDK etc. on my Windows instance and see if it works. It could be macOS-specific. I built all of my images on my private Windows system and then uploaded them to the VM in the cloud, so I'm not sure this Docker file works fine for all other OSs.

Best regards, Taras


michaelwechner commented 1 year ago

Hi Taras

Thanks for your quick feedback!

I have switched to Java 18, but receive the same error; see below.

I will try tomorrow to remove stuff from the Docker file as you suggested and check whether that works :-)

Thanks

Michael

[INFO] --- exec-maven-plugin:3.0.0:exec (docker-build) @ core ---

#1 [internal] load .dockerignore
#1 transferring context: 2B done
#1 DONE 0.0s

#2 [internal] load build definition from dockerfile_mlm
#2 transferring dockerfile: 740B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/openjdk:18-jdk-slim-buster
#3 ...

#4 [auth] library/openjdk:pull token for registry-1.docker.io
#4 DONE 0.0s

#3 [internal] load metadata for docker.io/library/openjdk:18-jdk-slim-buster
#3 DONE 2.2s

#5 [internal] load build context
#5 transferring context: 2B done
#5 DONE 0.0s

#6 [2/8] COPY target/base_ai_core-1.0-SNAPSHOT-shaded.jar /base_ai_core/
#6 ERROR: failed to calculate checksum of ref b71333e2-50ef-42d4-88af-8a6142b4eace::neukl8sec3ecnr8sybahygvkh: failed to walk /var/lib/docker/tmp/buildkit-mount2584341619/target: lstat /var/lib/docker/tmp/buildkit-mount2584341619/target: no such file or directory

#7 [1/8] FROM docker.io/library/openjdk:18-jdk-slim-buster@sha256:596bee0e3ed6c537b2c92cc53089772d880fb3f4413e438dcc147d61d52cc960
#7 resolve docker.io/library/openjdk:18-jdk-slim-buster@sha256:596bee0e3ed6c537b2c92cc53089772d880fb3f4413e438dcc147d61d52cc960 done
#7 sha256:596bee0e3ed6c537b2c92cc53089772d880fb3f4413e438dcc147d61d52cc960 547B / 547B done
#7 sha256:d1d26ccd60e2cc188a1d0903f9fee99ddb576cf6de0800872fc4205d3bb148de 953B / 953B done
#7 sha256:9eff783fb9735def23e99c4aeadf31d254a919891f897b55fc13b557f2dbc0b1 4.83kB / 4.83kB done
#7 CANCELED


[2/8] COPY target/base_ai_core-1.0-SNAPSHOT-shaded.jar /base_ai_core/:

dockerfile_mlm:4

   2 |
   3 |     # copy the packaged jar file into our docker image
   4 | >>> COPY target/base_ai_core-1.0-SNAPSHOT-shaded.jar /base_ai_core/
   5 |
   6 |     RUN apt-get --assume-yes update; apt-get --assume-yes install nano; apt-get --assume-yes install systemd;

ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref b71333e2-50ef-42d4-88af-8a6142b4eace::neukl8sec3ecnr8sybahygvkh: failed to walk /var/lib/docker/tmp/buildkit-mount2584341619/target: lstat /var/lib/docker/tmp/buildkit-mount2584341619/target: no such file or directory

[ERROR] Command execution failed.
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal (DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.execute (DefaultExecutor.java:166)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:982)
    at org.codehaus.mojo.exec.ExecMojo.executeCommandLine (ExecMojo.java:929)
    at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:457)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:370)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:351)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:171)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:163)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:294)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:960)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:293)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:196)
    at jdk.internal.reflect.DirectMethodHandleAccessor.invoke (DirectMethodHandleAccessor.java:104)
    at java.lang.reflect.Method.invoke (Method.java:578)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Transformers in Java 0.0.1-SNAPSHOT:
[INFO]
[INFO] Transformers in Java ............................... SUCCESS [ 0.122 s]
[INFO] Core functionality ................................. FAILURE [02:01 min]
[INFO] Samples of training the models ..................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:01 min
[INFO] Finished at: 2023-09-12T00:25:11+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:3.0.0:exec (docker-build) on project core: Command execution failed.: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :core

michaelwechner commented 1 year ago

Btw, I just noticed that the pom requires Java 20, so I have switched it to Java 18:

git diff pom.xml

diff --git a/pom.xml b/pom.xml
index 3c70dd9..a91c41f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -32,7 +32,7 @@
     UTF-8
     UTF-8
-    20
+    18
     nd4j-native

michaelwechner commented 1 year ago

Also I have noticed that in the Docker file the following JAR should get copied:

COPY target/base_ai_core-1.0-SNAPSHOT-shaded.jar /base_ai_core/

but in the target directory this file does not exist:

ls core/target/
classes                         core-0.0.1-SNAPSHOT.jar        maven-archiver
core-0.0.1-SNAPSHOT-shaded.jar  generated-sources              maven-status

partarstu commented 1 year ago

Hi Michael, the root cause of the issue, as you suggested, is the wrong name of the source JAR file - instead of base_ai_core-1.0-SNAPSHOT-shaded.jar it should be core-0.0.1-SNAPSHOT-shaded.jar. Sorry - it's my mistake: I forgot to update the Docker files after renaming the packages during a heavy refactoring round. The fix has already been committed to the "first_set_of_improvements" branch. You could check out this branch and retry. Unfortunately the error description provided by Docker is quite cryptic and doesn't make it clear what exactly happened.

Let me know if you have any issues starting the Docker container. The model requires many environment variables as part of its configuration, so if anything is unclear - just let me know.

Also please let me know if this issue is still a valid one.

michaelwechner commented 1 year ago

Thanks again for your reply!

I just did a checkout of first_set_of_improvements and it looks much better now, i.e. the Docker image got built :-)

transformer_mlm 0.0.1-SNAPSHOT 1624bea319b8 12 seconds ago 2.01GB

[INFO] Reactor Summary for Transformers in Java 0.0.1-SNAPSHOT:
[INFO]
[INFO] Transformers in Java ............................... SUCCESS [ 0.112 s]
[INFO] Core functionality ................................. SUCCESS [03:49 min]
[INFO] Samples of training the models ..................... SUCCESS [ 1.690 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:51 min
[INFO] Finished at: 2023-09-13T10:48:13+02:00

Re the environment variables, is there an example configuration using docker-compose?

How can I use the running docker container?

partarstu commented 1 year ago

Hi Michael.

I've just committed some changes into the same branch which contain some fixes and refactoring meant to make the structure of the Docker-related files clearer. Now all those files reside in the docker folder. You need to pull the latest changes from the first_set_of_improvements branch.

Regarding Docker Compose - you won't need it, because there's only one Docker image. The idea is to start the Docker container passing the environment variables into the docker run command. Inside the docker folder there's a run_mlm_model_training.sh file which I used for starting and running the model in the cloud on Debian. There you will see how the Docker container is started and how the environment variables are passed to it.

There's also a start_training.service file in that folder which allows the model to start automatically after a system start - this one is useful if you use a cloud VM similar to a "spot" VM in GCP (you can be sure that after any unexpected restart of the VM the model will continue running).

Regarding environment variables in general and an example of how to run the model: there's a sample file org.tarik.train.TrainMlm which basically starts the model training and can define its intermediate test logic, so that you can run inference during training and see if the actual accuracy based on your test data is OK. There's also an IntelliJ run config added to this project in the .run folder which allows you to start this class from IntelliJ directly without the need to use the Docker image. All environment variables in that class have default values, so you need to specify only those which you want to override. The comments in this class describe all the variables which you'd need.

Two important things you need to configure additionally in order to run TrainMlm.java:

  1. You need to define your own IDataProvider instance and set it using the transformer.setDataProvider() method. Currently a WikiArticlesContentProvider is being used, but it works only if you have a corresponding MongoDB running with a Wikipedia dump, which is not the way to go for you. You have to implement your own inline IDataProvider class which will fetch the data you need (see the sketch after this list). The method org.tarik.train.TrainMlm#getWikiArticlesContentProvider might give a hint of how you could fetch the data in chunks and decide if there's something more to fetch. If you let me know which data source you use for training, I could create a simple example for you so that you could use it out of the box.
  2. You need to define a root_path environment variable, e.g. root_path=D:/temp/model. The IntelliJ runner for TrainMlm.java doesn't have it.
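
For illustration, here is a minimal sketch of what such an inline provider could look like. The shape of IDataProvider is an assumption here, reconstructed only from the method names mentioned in this thread (getPassages(Function<List<String>, Boolean>), transformer.setDataProvider(...)) - check the actual interface in the repo before copying anything:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch - the real IDataProvider interface in this repo may differ.
// It hands out passages in chunks until the limit function reports that the batch is full.
public class StaticTextProvider /* implements IDataProvider */ {
    private final Deque<String> remaining;

    public StaticTextProvider(List<String> passages) {
        this.remaining = new ArrayDeque<>(passages);
    }

    // Called repeatedly by the model until it has collected enough usable sequences.
    public List<String> getPassages(Function<List<String>, Boolean> isLimitReachedFunction) {
        List<String> chunk = new ArrayList<>();
        while (!remaining.isEmpty() && !isLimitReachedFunction.apply(chunk)) {
            chunk.add(remaining.poll());
        }
        return chunk; // returns an empty list once the data is exhausted
    }
}

It would then be registered with something like transformer.setDataProvider(new StaticTextProvider(myPassages)) before the training starts.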

I've just created the Docker image using the create_image_mlm_linux_avx2 IntelliJ run config and started a container of that image locally using the command:

docker run -d transformer_mlm:0.0.1-SNAPSHOT java -XX:+UseZGC -XX:SoftMaxHeapSize=2G -XX:ZCollectionInterval=15 -XX:ZUncommitDelay=10 -Xmx3G --enable-preview -jar train-0.0.1-SNAPSHOT-shaded.jar

If you want to run it in a Linux environment - simply adapt the docker/run_mlm_model_training.sh file to your needs and it should be enough. But I'd recommend to first run it in IntelliJ (or any other IDE), implement the missing parts in TrainMlm.java and, after you can start the training and see primary results, create the Docker image with your changes.

If you need additional info or help - just let me know.

michaelwechner commented 1 year ago

Thank you very much! Will pull, build and run tomorrow and let you know right afterwards :-)

michaelwechner commented 1 year ago

Hi Taras, I managed to get it running within IntelliJ :-) where I have set root_path inside .run/TrainMlm.run.xml to

<env name="root_path" value="/Users/michaelwechner/src/transformers-in-java/train/src/main/resources/test_data" />

and replaced the method WikiDatastore.fetchArticlesFromDb inside TrainMlm by a mock method returning mock Wiki articles.

It is starting up now without errors, but I am not sure whether it is running correctly; please see the output below:


/Library/Java/JavaVirtualMachines/jdk-20.jdk/Contents/Home/bin/java
Clang: "12.0.0 (clang-1200.0.32.29)"
STD version: 201103L
DEFAULT_ENGINE: samediff::ENGINE_CPU
HAVE_FLATBUFFERS
HAVE_OPENBLAS
15:55:35.640 [pool-1-thread-1] INFO org.tarik.train.CommonTrainer - Memory taken : 0.3 GB, free : 0.1 GB
15:55:35.861 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Building a transformer's encoder SameDiff model with the following params:

15:55:36.173 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Created a transformer's encoder-based MLM model with the following params:

hiddenSize=768 learningRate=0.00010 attentionHeadsAmount=8 encoderLayersAmount=4 intermediateLayerSize=1024 sequenceLength=256 batchSize=128 labelSmoothing=0.1 percentageOfTokensToBePredicted=15 maxSizeOfWholePhrasePrediction=3 percentageOfMaskingPerPrediction=80 beta2=0.98 loggingFrequency=50 testingFrequency=100

15:55:36.179 [main] INFO org.tarik.train.TrainMlm - Total model size is 38.45 million params
15:55:36.294 [main] INFO e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
15:55:36.368 [main] INFO e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
15:55:36.371 [main] INFO e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
15:55:36.371 [main] INFO e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
15:55:36.745 [main] INFO e.s.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [0.4 sec].
15:55:42.075 [main] WARN o.n.i.converters.ImportClassMapping - Duplicate TF op mapping found for op Pow: org.nd4j.linalg.api.ops.impl.scalar.Pow vs org.nd4j.linalg.api.ops.impl.transforms.custom.Pow
15:55:42.079 [main] WARN o.n.i.converters.ImportClassMapping - Duplicate TF op mapping found for op FloorMod: org.nd4j.linalg.api.ops.impl.transforms.pairwise.arithmetic.FModOp vs org.nd4j.linalg.api.ops.impl.transforms.pairwise.arithmetic.FloorModOp
15:55:42.373 [main] WARN o.n.a.functions.DifferentialFunction - No fields found for property name dtype for class org.nd4j.linalg.api.ops.impl.shape.OnesLike
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
In arrLength usage aDArrayFactory::create(const char order, const std::vector& shape, sd::DataType dtype, sd::LaunchContext context)
Added differentiated op neg
Added differentiated op reduce_mean
Added differentiated op reduce_sum
Added differentiated op multiply
Added differentiated op log
Added differentiated op softmax
Added differentiated op add
Added differentiated op matmul
Added differentiated op gather_1
Added differentiated op encoder/layer_norm
Added differentiated op encoder/hidden_ff_layer_3/add
Added differentiated op encoder/hidden_ff_layer_3/xw_plus_b_1
Added differentiated op encoder/hidden_ff_layer_3/gelu
Added differentiated op encoder/hidden_ff_layer_3/xw_plus_b
Added differentiated op encoder/hidden_ff_layer_3/layer_norm
Added differentiated op encoder/3/add_1
Added differentiated op encoder/3/self_attention_default/matmul_5
Added differentiated op encoder/3/self_attention_default/reshape_3
Added differentiated op encoder/3/self_attention_default/permute_3
Added differentiated op encoder/3/self_attention_default/matmul_4
Added differentiated op encoder/3/self_attention_default/softmax
Added differentiated op encoder/3/self_attention_default/permute_2
Added differentiated op encoder/3/self_attention_default/add
Added differentiated op encoder/3/self_attention_default/reshape_2
Added differentiated op encoder/3/self_attention_default/mul_scalar_1
Added differentiated op encoder/3/self_attention_default/matmul_2
Added differentiated op encoder/3/self_attention_default/matmul_3
Added differentiated op encoder/3/layer_norm
Added differentiated op encoder/3/self_attention_default/permute_1
Added differentiated op encoder/3/self_attention_default/permute
Added differentiated op encoder/3/self_attention_default/reshape_1
Added differentiated op encoder/3/self_attention_default/reshape
Added differentiated op encoder/3/self_attention_default/matmul_1
Added differentiated op encoder/3/self_attention_default/matmul
Added differentiated op encoder/3/layer_norm_1
Added differentiated op encoder/3/add
Added differentiated op encoder/hidden_ff_layer_2/add
Added differentiated op encoder/hidden_ff_layer_2/xw_plus_b_1
Added differentiated op encoder/hidden_ff_layer_2/gelu
Added differentiated op encoder/hidden_ff_layer_2/xw_plus_b
Added differentiated op encoder/hidden_ff_layer_2/layer_norm
Added differentiated op encoder/2/add_1
Added differentiated op encoder/2/self_attention_default/matmul_5
Added differentiated op encoder/2/self_attention_default/reshape_3
Added differentiated op encoder/2/self_attention_default/permute_3
Added differentiated op encoder/2/self_attention_default/matmul_4
Added differentiated op encoder/2/self_attention_default/softmax
Added differentiated op encoder/2/self_attention_default/permute_2
Added differentiated op encoder/2/self_attention_default/add
Added differentiated op encoder/2/self_attention_default/reshape_2
Added differentiated op encoder/2/self_attention_default/mul_scalar_1
Added differentiated op encoder/2/self_attention_default/matmul_2
Added differentiated op encoder/2/self_attention_default/matmul_3
Added differentiated op encoder/2/layer_norm
Added differentiated op encoder/2/self_attention_default/permute_1
Added differentiated op encoder/2/self_attention_default/permute
Added differentiated op encoder/2/self_attention_default/reshape_1
Added differentiated op encoder/2/self_attention_default/reshape
Added differentiated op encoder/2/self_attention_default/matmul_1
Added differentiated op encoder/2/self_attention_default/matmul
Added differentiated op encoder/2/layer_norm_1
Added differentiated op encoder/2/add
Added differentiated op encoder/hidden_ff_layer_1/add
Added differentiated op encoder/hidden_ff_layer_1/xw_plus_b_1
Added differentiated op encoder/hidden_ff_layer_1/gelu
Added differentiated op encoder/hidden_ff_layer_1/xw_plus_b
Added differentiated op encoder/hidden_ff_layer_1/layer_norm
Added differentiated op encoder/1/add_1
Added differentiated op encoder/1/self_attention_default/matmul_5
Added differentiated op encoder/1/self_attention_default/reshape_3
Added differentiated op encoder/1/self_attention_default/permute_3
Added differentiated op encoder/1/self_attention_default/matmul_4
Added differentiated op encoder/1/self_attention_default/softmax
Added differentiated op encoder/1/self_attention_default/permute_2
Added differentiated op encoder/1/self_attention_default/add
Added differentiated op encoder/1/self_attention_default/reshape_2
Added differentiated op encoder/1/self_attention_default/mul_scalar_1
Added differentiated op encoder/1/self_attention_default/matmul_2
Added differentiated op encoder/1/self_attention_default/matmul_3
Added differentiated op encoder/1/layer_norm
Added differentiated op encoder/1/self_attention_default/permute_1
Added differentiated op encoder/1/self_attention_default/permute
Added differentiated op encoder/1/self_attention_default/reshape_1
Added differentiated op encoder/1/self_attention_default/reshape
Added differentiated op encoder/1/self_attention_default/matmul_1
Added differentiated op encoder/1/self_attention_default/matmul
Added differentiated op encoder/1/layer_norm_1
Added differentiated op encoder/1/add
Added differentiated op encoder/hidden_ff_layer_0/add
Added differentiated op encoder/hidden_ff_layer_0/xw_plus_b_1
Added differentiated op encoder/hidden_ff_layer_0/gelu
Added differentiated op encoder/hidden_ff_layer_0/xw_plus_b
Added differentiated op encoder/hidden_ff_layer_0/layer_norm
Added differentiated op encoder/0/add_1
Added differentiated op encoder/0/self_attention_default/matmul_5
Added differentiated op encoder/0/self_attention_default/reshape_3
Added differentiated op encoder/0/self_attention_default/permute_3
Added differentiated op encoder/0/self_attention_default/matmul_4
Added differentiated op encoder/0/self_attention_default/softmax
Added differentiated op encoder/0/self_attention_default/permute_2
Added differentiated op encoder/0/self_attention_default/add
Added differentiated op encoder/0/self_attention_default/reshape_2
Added differentiated op encoder/0/self_attention_default/mul_scalar_1
Added differentiated op encoder/0/self_attention_default/matmul_2
Added differentiated op encoder/0/self_attention_default/matmul_3
Added differentiated op encoder/0/layer_norm
Added differentiated op encoder/0/self_attention_default/permute_1
Added differentiated op encoder/0/self_attention_default/permute
Added differentiated op encoder/0/self_attention_default/reshape_1
Added differentiated op encoder/0/self_attention_default/reshape
Added differentiated op encoder/0/self_attention_default/matmul_1
Added differentiated op encoder/0/self_attention_default/matmul
Added differentiated op encoder/0/layer_norm_1
Added differentiated op encoder/0/add
Added differentiated op encoder/reshape_1
Added differentiated op gather

It seems to hang after outputting the above content. Is this the correct behaviour, or do you have an idea what might be wrong?

Thanks

Michael

partarstu commented 1 year ago

Hi Michael, glad that you've got it running in IntelliJ - it's the first step, and having the model running with that code will let you create a Docker image based on it and run it anywhere you need.

Regarding the output - it's normal. The warnings you see are normal for DL4J, as is the list of all loaded operations. But if you want to see the progress of the model, you need to set the logging-interval parameter - LOG_FREQ in TrainMlm.java is the one. The default value is 50 iterations (steps), but if you want to see the progress immediately - simply set this value to 1. There's also the MAX_MEMORY_LOG_FREQ_MINUTES constant - this one prints out the memory usage (it's in your logs: "Memory taken : 0.3 GB, free : 0.1 GB") and lets you see how much RAM the model is consuming.

Basically, TrainMlm.java is written in a way that if any error/exception is thrown, you'll see it in the logs.

If you want to see more detailed logs - you could change the LOG_LEVEL environment variable value to DEBUG (it's INFO by default). You can also turn on the JavaCPP debug output (if anything's wrong with the C++-related part) using the following code: System.setProperty("org.bytedeco.javacpp.logger.debug", "true"); And regarding logging - there's also SameDiff logging which allows you to see the details of each running operation. You can turn it on using this line: sd.enableDebugMode();
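
Collected in one place, the switches mentioned above might look like the sketch below (sd is assumed to be the SameDiff instance built by the model class; LOG_LEVEL itself is an environment variable set in the run config, not in code):

import org.nd4j.autodiff.samediff.SameDiff;

public class DebugSwitches {
    // Enables the verbose debug options mentioned above; call this before the training starts.
    public static void enable(SameDiff sd) {
        // JavaCPP debug logging for the C++-related part (set it as early as possible,
        // before the native libraries are loaded)
        System.setProperty("org.bytedeco.javacpp.logger.debug", "true");
        // Detailed SameDiff logging of each running operation
        sd.enableDebugMode();
    }
}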

There's also a very valuable listener class in the model class itself - CustomListener. It has different methods which allow you to see what happens during the execution. For example, the method public void opExecution(SameDiff sd, At at, MultiDataSet batch, SameDiffOp op, OpContext opContext, INDArray[] outputs) allows you to see the results and params of each operation after it has been executed. That's for rather hardcore debugging purposes, but I use that listener almost always, even if I just want to check that the intermediate state of my operations is OK.

partarstu commented 1 year ago

If you see that the execution is hanging and no logs are coming out - it's probably an infinite loop or something like that. I'd debug the code down to the line where it supposedly runs into that problem, but I suspect that, in case this is really the root cause of your issue, it could be hidden in fetching the data from the data provider - somewhere in org.tarik.core.network.models.transformer.mlm.MlmTransformerSdModel#getNewTokenBatch().

michaelwechner commented 1 year ago

Hi Taras, thanks again for your feedback!

I have turned on DEBUG, etc. but have not found the issue yet.

Will keep you posted :-)

michaelwechner commented 1 year ago

Btw, is it normal that getWikiArticlesContentProvider() gets called a lot (more than 100 times)?

partarstu commented 1 year ago

Btw, is it normal that getWikiArticlesContentProvider() gets called a lot (more than 100 times)?

This method should be called only once - because it's a trainer class method and it's not used in any loops. What is really called often is the implementation of the getPassages(Function<List<String>, Boolean> isLimitReachedFunction) method. How often it is called depends on how much data you provide to the model. The latter expects you to provide at least BATCH_SIZE (default=128) sequences (passages) of tokens for one iteration. There is also the MIN_SEQUENCE_UTILIZATION (default=50%) variable which tells how many tokens, as a percentage of the sequence length (hardcoded as 256, my bad), each sequence from the provider should contain at least, so that it can be accepted by the model and added to the batchedTokenSentences variable.

So if your provider gives back fewer than BATCH_SIZE sequences per call, the model will call it as many times as needed until it gets a whole batch. In a similar way, if the sequences which come from the provider contain less than 50% of tokens (the rest will always be masked), the model skips (ignores) them and calls getPassages(...) again as many times as needed in order to fill the batch. This could be the source of an infinite loop if the provider always gives back some data and never gets exhausted (e.g. is a simple mock). So the root cause of your problem could be either of those two factors or even both of them.
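
To illustrate the second point: a mock provider has to actually run out of data, otherwise the model keeps calling it forever. A hedged sketch (the getPassages signature is taken from this thread; the real IDataProvider interface may differ):

import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical mock - hands out its passages exactly once, then returns an empty list,
// which is how the provider can signal that there is no more data to fetch.
public class OneShotMockProvider /* implements IDataProvider */ {
    private List<String> passages;

    public OneShotMockProvider(List<String> passages) {
        this.passages = passages;
    }

    public List<String> getPassages(Function<List<String>, Boolean> isLimitReachedFunction) {
        if (passages == null) {
            return Collections.emptyList(); // already drained: no more data
        }
        List<String> result = passages;
        passages = null;
        return result;
    }
}

Note that each mock passage should also be long enough to pass the MIN_SEQUENCE_UTILIZATION check, otherwise all of them get skipped and the batch never fills up.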

michaelwechner commented 1 year ago

So if your provider gives back fewer than BATCH_SIZE sequences per call, the model will call it as many times as needed until it gets a whole batch.

I only return 2 articles with very little content, so that is probably the problem. Will change it and try again :-) thanks!

michaelwechner commented 1 year ago

I have now loaded some Wiki articles from a dump, i.e. downloaded

https://mirror.accum.se/mirror/wikimedia.org/dumps/dewiki/20231001/

and the loop does not occur anymore now :-)

00:00:36.573 [main] INFO e.s.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [0.3 sec].
00:00:58.001 [main] DEBUG org.tarik.train.TrainMlm - Fetch Wiki articles ...
00:00:58.015 [main] INFO org.tarik.train.TrainMlm - Parse 135 Wiki articles ...
00:00:58.016 [main] INFO org.tarik.train.TrainMlm - 135 Wiki articles fetched.
00:01:18.846 [main] DEBUG o.t.c.n.m.t.AbstractOpenDatasetTransformerModel - Processing 128 sequences with 29331 tokens with average sequence capacity utilization 89.5%

BUT the process still seems to "hang" at

Added differentiated op encoder/0/self_attention_default/reshape_1
Added differentiated op encoder/0/self_attention_default/reshape
Added differentiated op encoder/0/self_attention_default/matmul_1
Added differentiated op encoder/0/self_attention_default/matmul
Added differentiated op encoder/0/layer_norm_1
Added differentiated op encoder/0/add
Added differentiated op encoder/reshape_1
Added differentiated op gather

Is this after the network was built?

partarstu commented 1 year ago

Hi Michael,

"Processing 128 sequences with 29331 tokens with average sequence capacity utilization 89.5%" actually tells you that the model has got the training data and starts the training after converting this data into a MultiDataSet. Seems OK for now.

The logs with the operations are indeed the ones showing up after the model was built and the training has started.

What's the value of LOG_FREQ in your setup? I'd recommend setting it to 1 if you want to see immediately whether it works. Also, did you try to use CustomListener in order to see if there is any progress and if the operations are executed?

michaelwechner commented 1 year ago

Hi Taras, thanks again for your feedback!

Yes, I have set LOG_FREQ to 1 and also verified this by logging its value.

I looked at CustomListener and how it is set, for example at

core/src/main/java/org/tarik/core/network/models/transformer/mlm/MlmTransformerSdModel.java

but I do not really understand how I can actually use it when running train/src/main/java/org/tarik/train/TrainMlm.java

I will try to better understand asap, but any hints are very much appreciated :-)

Or do you mean that I should just add it to the various operations?

Thanks

Michael

partarstu commented 1 year ago

Hi Michael,

Regarding CustomListener - if you add the following line as the first one to the method public void preOpExecution(SameDiff sd, At at, SameDiffOp op, OpContext opContext), you'll see each operation which happens in the model:

LOG.info("Executing {} with input {} and output {}", op.getName(), op.getInputsToOp(), op.getOutputsOfOp());

It will allow you to understand if the model is training at all or if it's stuck somewhere. Because you can't debug SameDiff directly (you actually can, but it's too complex), this listener is a good utility to see what happens during training. It also has methods which allow you to see which weights are updated, as well as what happens before each operation is executed.
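
Put together, a minimal listener of that kind could look like the sketch below. The preOpExecution signature is the one quoted above and BaseListener is standard ND4J API, but note that the project's own CustomListener has more methods than this:

import org.nd4j.autodiff.listeners.At;
import org.nd4j.autodiff.listeners.BaseListener;
import org.nd4j.autodiff.listeners.Operation;
import org.nd4j.autodiff.samediff.SameDiff;
import org.nd4j.autodiff.samediff.internal.SameDiffOp;
import org.nd4j.linalg.api.ops.OpContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Logs every operation right before it is executed, so a hang can be
// localized to the last op that was printed.
public class OpTraceListener extends BaseListener {
    private static final Logger LOG = LoggerFactory.getLogger(OpTraceListener.class);

    @Override
    public boolean isActive(Operation operation) {
        return true; // trace all phases (training, inference, evaluation)
    }

    @Override
    public void preOpExecution(SameDiff sd, At at, SameDiffOp op, OpContext opContext) {
        LOG.info("Executing {} with input {} and output {}",
                op.getName(), op.getInputsToOp(), op.getOutputsOfOp());
    }
}

It can be attached to a SameDiff instance with sd.setListeners(new OpTraceListener()) if the model doesn't already register its own listener.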

partarstu commented 1 year ago

It also came to my mind that it's important to use the correct ND4J backend classifier so that there are no platform-related issues. The default platform in the project is Windows; it's configured in the profiles section of core/pom.xml. But you wrote that you use macOS, so you should use either macosx-x86_64-avx2 or macosx-x86_64-avx512, depending on the architecture of your CPU.
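
If you're not sure which classifier matches your machine, a quick way to check is to print what the JVM itself reports (a tiny standalone check, not part of the project):

public class PlatformCheck {
    public static void main(String[] args) {
        // Prints e.g. "Mac OS X / aarch64" on an Apple Silicon JDK,
        // or "Mac OS X / x86_64" on an Intel (or Rosetta) JDK
        System.out.println(System.getProperty("os.name") + " / " + System.getProperty("os.arch"));
    }
}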

michaelwechner commented 1 year ago

It also came to my mind that it's important to use the correct ND4J backend classifier so that there are no platform-related issues. The default platform in the project is Windows; it's configured in the profiles section of core/pom.xml. But you wrote that you use macOS, so you should use either macosx-x86_64-avx2 or macosx-x86_64-avx512, depending on the architecture of your CPU.

Cool, thanks! My chip is an "Apple M1 Pro", so I think I need macosx-arm64 as the classifier ... will give it a try asap :-)

michaelwechner commented 1 year ago

LOG.info("Executing {} with input {} and output {}", op.getName(), op.getInputsToOp(), op.getOutputsOfOp());

Understood now :-) thanks!

After adding and rebuilding I get the following output

Added differentiated op gather
21:30:15.536 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/reshape with input [batchTokenEmbeddings-grad/sd_var] and output [batchTokenEmbeddings-grad/reshape]
21:30:15.538 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/self_attention_default/multiply_1 with input [encoder/self_attention_default/attentionHeadsAmount, encoder/self_attention_default/attentionHeadEmbeddingSize] and output [encoder/self_attention_default/multiply_1]
21:30:15.539 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing neg_1 with input [one-var] and output [reduce_mean-grad]
21:30:15.540 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing predictionTokenEmbeddings-grad/reshape with input [predictionTokenEmbeddings-grad/sd_var] and output [predictionTokenEmbeddings-grad/reshape]
21:30:15.552 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/zeroslike with input [tokenEmbeddingsMatrix] and output [batchTokenEmbeddings-grad/zeroslike]
21:30:15.556 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/rank with input [tokenEmbeddingsMatrix] and output [batchTokenEmbeddings-grad/rank]
21:30:15.681 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing onehot with input [flatPredictionTokenVocabIndices] and output [onehot]
21:30:15.818 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/shape_of_1 with input [inputMasks] and output [encoder/shape_of_1]
21:30:15.820 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/shape_of_2 with input [inputMasks] and output [encoder/shape_of_2]
21:30:15.821 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/ones_as with input [inputMasks] and output [encoder/broadcastOnes_encoder]
21:30:15.848 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing gather with input [tokenEmbeddingsMatrix, inputTokenVocabIndices] and output [batchTokenEmbeddings]
21:30:15.853 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/range_1 with input [batchTokenEmbeddings-grad/sd_var_2, batchTokenEmbeddings-grad/rank, batchTokenEmbeddings-grad/sd_var_3] and output [batchTokenEmbeddings-grad/range_1]
21:30:15.863 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/strided_slice_1 with input [encoder/shape_of_1] and output [encoder/strided_slice_1]
21:30:15.867 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/strided_slice_2 with input [encoder/shape_of_2] and output [encoder/strided_slice_2]
21:30:15.870 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/expand_dims with input [encoder/broadcastOnes_encoder] and output [encoder/expand_dims]
21:30:15.870 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/shape_of with input [batchTokenEmbeddings] and output [encoder/shape_of]
21:30:15.870 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_17 with input [batchTokenEmbeddings] and output [shape_of_17]
21:30:15.872 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/listdiff with input [batchTokenEmbeddings-grad/range_1, batchTokenEmbeddings-grad/reshape] and output [batchTokenEmbeddings-grad/listdiff, batchTokenEmbeddings-grad/listdiff:1]
21:30:15.873 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/cast_1 with input [encoder/strided_slice_1] and output [encoder/cast_1]
21:30:15.873 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/cast_2 with input [encoder/strided_slice_2] and output [encoder/cast_2]
21:30:15.873 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/strided_slice with input [encoder/shape_of] and output [encoder/strided_slice]
21:30:15.873 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/concat with input [batchTokenEmbeddings-grad/reshape, batchTokenEmbeddings-grad/listdiff] and output [batchTokenEmbeddings-grad/concat]
21:30:15.876 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/stack_1 with input [encoder/cast_1, encoder/sd_var, encoder/cast_2] and output [encoder/attentionIntermediateMaskShape]
21:30:15.877 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/cast with input [encoder/strided_slice] and output [encoder/encBatchSize]
21:30:15.888 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/permute_1 with input [batchTokenEmbeddings-grad/zeroslike, batchTokenEmbeddings-grad/concat] and output [batchTokenEmbeddings-grad/permute_1]
21:30:15.900 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing batchTokenEmbeddings-grad/invert_permutation with input [batchTokenEmbeddings-grad/concat] and output [batchTokenEmbeddings-grad/invert_permutation]
21:30:15.902 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/reshape with input [inputMasks, encoder/attentionIntermediateMaskShape] and output [encoder/attentionIntermediateMask]
21:30:15.903 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/multiply with input [encoder/encBatchSize, encoder/encSequenceLength] and output [encoder/multiply]
21:30:15.903 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/self_attention_default/multiply with input [encoder/encBatchSize, encoder/encSequenceLength] and output [encoder/self_attention_default/multiply]
21:30:15.903 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/self_attention_default/stack_1 with input [encoder/encBatchSize, encoder/encSequenceLength, encoder/self_attention_default/attentionHeadsAmount, encoder/self_attention_default/attentionHeadEmbeddingSize] and output [encoder/self_attention_default/keysPerHeadAttentionShape]
21:30:15.903 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/self_attention_default/stack_2 with input [encoder/encBatchSize, encoder/encSequenceLength, encoder/self_attention_default/attentionHeadsAmount, encoder/self_attention_default/attentionHeadEmbeddingSize] and output [encoder/self_attention_default/queriesPerHeadAttentionShape]
21:30:15.904 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/stack_2 with input [encoder/encBatchSize, encoder/one] and output [encoder/stack_2]
21:30:15.909 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/multiply_1 with input [encoder/expand_dims, encoder/attentionIntermediateMask] and output [encoder/intermediateSelfAttentionMasks_encoder]
21:30:15.913 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/stack with input [encoder/multiply, encoder/encHiddenSize] and output [encoder/encHiddenLayerInputShape]
21:30:15.915 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/self_attention_default/stack with input [encoder/self_attention_default/multiply, encoder/self_attention_default/multiply_1] and output [encoder/self_attention_default/attentionDotProductShape]
21:30:15.946 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/tile with input [positionalEmbeddingsMatrix, encoder/stack_2] and output [encoder/positionalEmbeddingsForAttention]
21:30:16.300 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/expand_dims_1 with input [encoder/intermediateSelfAttentionMasks_encoder] and output [encoder/selfAttentionMasks]
21:30:16.321 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/reshape_1 with input [batchTokenEmbeddings, encoder/encHiddenLayerInputShape] and output [encoder/reshape_1]
21:30:16.339 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/sub_scalar with input [encoder/selfAttentionMasks] and output [encoder/0/self_attention_default/sub_scalar]
21:30:16.345 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/sub_scalar with input [encoder/selfAttentionMasks] and output [encoder/1/self_attention_default/sub_scalar]
21:30:16.352 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/2/self_attention_default/sub_scalar with input [encoder/selfAttentionMasks] and output [encoder/2/self_attention_default/sub_scalar]
21:30:16.358 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/3/self_attention_default/sub_scalar with input [encoder/selfAttentionMasks] and output [encoder/3/self_attention_default/sub_scalar]
21:30:16.386 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/layer_norm with input [encoder/reshape_1, encoder/0/normalizedAttentionInput_embedNormGain] and output [encoder/0/normalizedAttentionInput]
21:30:16.479 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/add with input [encoder/reshape_1, encoder/positionalEmbeddingsForAttention] and output [encoder/0/keyAndQueryInput_WithPositionalEmbed]
21:30:16.539 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/mul_scalar with input [encoder/0/self_attention_default/sub_scalar] and output [encoder/0/self_attention_default/attentionMaskDisqualifier_0]
21:30:16.544 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/mul_scalar with input [encoder/1/self_attention_default/sub_scalar] and output [encoder/1/self_attention_default/attentionMaskDisqualifier_1]
21:30:16.561 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/2/self_attention_default/mul_scalar with input [encoder/2/self_attention_default/sub_scalar] and output [encoder/2/self_attention_default/attentionMaskDisqualifier_2]
21:30:16.580 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/3/self_attention_default/mul_scalar with input [encoder/3/self_attention_default/sub_scalar] and output [encoder/3/self_attention_default/attentionMaskDisqualifier_3]
21:30:16.595 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/matmul_2 with input [encoder/0/normalizedAttentionInput, encoder/0/self_attention_default/AttentionValueWeights_0] and output [encoder/0/self_attention_default/valueProjections_0]
21:30:17.185 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/layer_norm_1 with input [encoder/0/keyAndQueryInput_WithPositionalEmbed, encoder/0/keyAndQueryInputNormalized_embedNormGain] and output [encoder/0/keyAndQueryInputNormalized]
21:30:17.324 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/reshape_2 with input [encoder/0/self_attention_default/valueProjections_0, encoder/self_attention_default/keysPerHeadAttentionShape] and output [encoder/0/self_attention_default/reshape_2]
21:30:17.334 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_14 with input [encoder/0/self_attention_default/valueProjections_0] and output [shape_of_14]
21:30:17.372 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/matmul with input [encoder/0/keyAndQueryInputNormalized, encoder/0/self_attention_default/AttentionKeyWeights_0] and output [encoder/0/self_attention_default/keyProjections_0]
21:30:18.061 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/matmul_1 with input [encoder/0/keyAndQueryInputNormalized, encoder/0/self_attention_default/AttentionQueryWeights_0] and output [encoder/0/self_attention_default/queryProjections_0]
21:30:18.716 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/permute_2 with input [encoder/0/self_attention_default/reshape_2] and output [encoder/0/self_attention_default/permute_2]
21:30:18.749 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/reshape with input [encoder/0/self_attention_default/keyProjections_0, encoder/self_attention_default/keysPerHeadAttentionShape] and output [encoder/0/self_attention_default/reshape]
21:30:18.764 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_16 with input [encoder/0/self_attention_default/keyProjections_0] and output [shape_of_16]
21:30:18.827 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/reshape_1 with input [encoder/0/self_attention_default/queryProjections_0, encoder/self_attention_default/queriesPerHeadAttentionShape] and output [encoder/0/self_attention_default/reshape_1]
21:30:18.837 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_15 with input [encoder/0/self_attention_default/queryProjections_0] and output [shape_of_15]
21:30:18.885 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/permute with input [encoder/0/self_attention_default/reshape] and output [encoder/0/self_attention_default/permute]
21:30:18.907 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/permute_1 with input [encoder/0/self_attention_default/reshape_1] and output [encoder/0/self_attention_default/permute_1]
21:30:18.989 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/matmul_3 with input [encoder/0/self_attention_default/permute_1, encoder/0/self_attention_default/permute] and output [encoder/0/self_attention_default/matmul_3]

where it seems to be busy for a while, and then it continues

21:33:36.214 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/permute_3 with input [encoder/0/self_attention_default/valuesBasedOnAttentionScores_0] and output [encoder/0/self_attention_default/permute_3]
21:33:36.248 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/reshape_3 with input [encoder/0/self_attention_default/permute_3, encoder/self_attention_default/attentionDotProductShape] and output [encoder/0/self_attention_default/attentionDotProductOutput_0]
21:33:36.258 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_13 with input [encoder/0/self_attention_default/permute_3] and output [shape_of_13]
21:33:36.343 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/self_attention_default/matmul_5 with input [encoder/0/self_attention_default/attentionDotProductOutput_0, encoder/0/self_attention_default/AttentionOutWeights_0] and output [encoder/0/self_attention_default/attentionOutput_0]
21:33:36.857 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/0/add_1 with input [encoder/reshape_1, encoder/0/self_attention_default/attentionOutput_0] and output [encoder/0/selfAttentionResidualProduct]
21:33:36.968 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/hidden_ff_layer_0/layer_norm with input [encoder/0/selfAttentionResidualProduct, encoder/hidden_ff_layer_0/hiddenLayerInputNormalized_0_embedNormGain] and output [encoder/hidden_ff_layer_0/hiddenLayerInputNormalized_0]
21:33:37.090 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/hidden_ff_layer_0/xw_plus_b with input [encoder/hidden_ff_layer_0/hiddenLayerInputNormalized_0, encoder/hidden_ff_layer_0/hiddenInnerLayerWeights_0, encoder/hidden_ff_layer_0/hiddenInnerLayerBias_0] and output [encoder/hidden_ff_layer_0/hiddenInnerLayerActivations_0]
21:33:37.793 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/hidden_ff_layer_0/gelu with input [encoder/hidden_ff_layer_0/hiddenInnerLayerActivations_0] and output [encoder/hidden_ff_layer_0/hiddenInnerLayerOutput_0]
21:33:37.850 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing _geluderivative_3 with input [encoder/hidden_ff_layer_0/hiddenInnerLayerActivations_0] and output [_geluderivative_3]
21:33:37.943 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/hidden_ff_layer_0/xw_plus_b_1 with input [encoder/hidden_ff_layer_0/hiddenInnerLayerOutput_0, encoder/hidden_ff_layer_0/hiddenOutLayerWeights_0, encoder/hidden_ff_layer_0/hiddenOutLayerBias_0] and output [encoder/hidden_ff_layer_0/hiddenLayerFinalOutputNormalized_0]
21:33:38.748 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/hidden_ff_layer_0/add with input [encoder/0/selfAttentionResidualProduct, encoder/hidden_ff_layer_0/hiddenLayerFinalOutputNormalized_0] and output [encoder/hidden_ff_layer_0/hiddenLayerResidualProductNormalized_0]
21:33:38.802 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/layer_norm with input [encoder/hidden_ff_layer_0/hiddenLayerResidualProductNormalized_0, encoder/1/normalizedAttentionInput_embedNormGain] and output [encoder/1/normalizedAttentionInput]
21:33:38.946 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/add with input [encoder/hidden_ff_layer_0/hiddenLayerResidualProductNormalized_0, encoder/positionalEmbeddingsForAttention] and output [encoder/1/keyAndQueryInput_WithPositionalEmbed]
21:33:39.051 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/matmul_2 with input [encoder/1/normalizedAttentionInput, encoder/1/self_attention_default/AttentionValueWeights_1] and output [encoder/1/self_attention_default/valueProjections_1]
21:33:39.718 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/layer_norm_1 with input [encoder/1/keyAndQueryInput_WithPositionalEmbed, encoder/1/keyAndQueryInputNormalized_embedNormGain] and output [encoder/1/keyAndQueryInputNormalized]
21:33:39.851 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/reshape_2 with input [encoder/1/self_attention_default/valueProjections_1, encoder/self_attention_default/keysPerHeadAttentionShape] and output [encoder/1/self_attention_default/reshape_2]
21:33:39.862 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_10 with input [encoder/1/self_attention_default/valueProjections_1] and output [shape_of_10]
21:33:39.898 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/matmul with input [encoder/1/keyAndQueryInputNormalized, encoder/1/self_attention_default/AttentionKeyWeights_1] and output [encoder/1/self_attention_default/keyProjections_1]
21:33:40.474 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/matmul_1 with input [encoder/1/keyAndQueryInputNormalized, encoder/1/self_attention_default/AttentionQueryWeights_1] and output [encoder/1/self_attention_default/queryProjections_1]
21:33:41.027 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/permute_2 with input [encoder/1/self_attention_default/reshape_2] and output [encoder/1/self_attention_default/permute_2]
21:33:41.065 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/reshape with input [encoder/1/self_attention_default/keyProjections_1, encoder/self_attention_default/keysPerHeadAttentionShape] and output [encoder/1/self_attention_default/reshape]
21:33:41.165 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_12 with input [encoder/1/self_attention_default/keyProjections_1] and output [shape_of_12]
21:33:41.184 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/reshape_1 with input [encoder/1/self_attention_default/queryProjections_1, encoder/self_attention_default/queriesPerHeadAttentionShape] and output [encoder/1/self_attention_default/reshape_1]
21:33:41.222 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing shape_of_11 with input [encoder/1/self_attention_default/queryProjections_1] and output [shape_of_11]
21:33:41.255 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/permute with input [encoder/1/self_attention_default/reshape] and output [encoder/1/self_attention_default/permute]
21:33:41.288 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/permute_1 with input [encoder/1/self_attention_default/reshape_1] and output [encoder/1/self_attention_default/permute_1]
21:33:41.372 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/matmul_3 with input [encoder/1/self_attention_default/permute_1, encoder/1/self_attention_default/permute] and output [encoder/1/self_attention_default/matmul_3]

where it also seems to be busy for some time, and then it continues:

21:35:24.278 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/mul_scalar_1 with input [encoder/1/self_attention_default/matmul_3] and output [encoder/1/self_attention_default/attentionScoresBeforeMasking_1]
21:35:24.388 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/add with input [encoder/1/self_attention_default/attentionScoresBeforeMasking_1, encoder/1/self_attention_default/attentionMaskDisqualifier_1] and output [encoder/1/self_attention_default/attentionWeightsMasked_1]
21:35:24.467 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/softmax with input [encoder/1/self_attention_default/attentionWeightsMasked_1] and output [encoder/1/self_attention_default/attentionSoftmaxScores_1]
21:35:24.529 [main] INFO o.t.c.n.m.t.m.MlmTransformerSdModel - Executing encoder/1/self_attention_default/matmul_4 with input [encoder/1/self_attention_default/attentionSoftmaxScores_1, encoder/1/self_attention_default/permute_2] and output [encoder/1/self_attention_default/valuesBasedOnAttentionScores_1]
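
(If I read those four steps correctly, this is just the standard scaled dot-product attention, softmax(Q·K^T / sqrt(d_k) + mask)·V: scaling the scores, adding the mask, applying softmax, and taking the weighted sum over the values.)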

So it really seems to do the training, it is just slow :-)

I guess that to make it faster I just need to change the "dimensions of the model" etc., right?

I understand that the model might not work well with smaller dimensions, but I just want to verify that the setup is correct and that one can get a result within a reasonable time.

partarstu commented 1 year ago

So it really seems to do the training, it is just slow :-)

It's actually shocking - 3 minutes for a single matrix multiply is extremely slow, so I think there's some issue with the platform implementation. Normally one full iteration should take no more than a minute; in your case a single matrix multiply alone takes three times that, even though an iteration involves far more operations than that one. I think you should try either macosx-x86_64-avx2 or macosx-x86_64-avx512 as alternative options and see if that helps. If it doesn't, there's a forum where you could get an answer from the DL4J developers for your specific case: https://community.konduit.ai/c/nd4j/13
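
For reference, a minimal sketch of what that change usually looks like in the pom.xml - the nd4j-native dependency gets the platform classifier. The ${nd4j.version} property here is an assumption; use whatever version the project actually declares:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <!-- assumed version property - use the project's actual ND4J version -->
    <version>${nd4j.version}</version>
    <!-- or macosx-x86_64-avx512 -->
    <classifier>macosx-x86_64-avx2</classifier>
</dependency>

After changing the classifier, rerun mvn clean install so the build picks up the new native backend.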

partarstu commented 1 year ago

I guess to make it faster I just need to change the "dimensions of the model", etc. right?

Nope - we need to find out what's wrong with your setup. I mean, you could decrease the dimensions, but that wouldn't resolve the real problem you have. And this problem is platform-bound, so it should be resolved anyway.
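
If you want to isolate the problem from the model code, one quick sanity check is to time a raw ND4J matrix multiply on its own - just a rough sketch, the 1024x1024 size is arbitrary:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class MatmulSanityCheck {
    public static void main(String[] args) {
        // Warm-up run so that one-time native initialization isn't measured
        INDArray a = Nd4j.rand(1024, 1024);
        INDArray b = Nd4j.rand(1024, 1024);
        a.mmul(b);

        // Timed run - on a healthy backend this takes milliseconds, not minutes
        long start = System.nanoTime();
        a.mmul(b);
        System.out.println("1024x1024 matmul took " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}

If even this is slow, the problem is in the native backend rather than in the code of this project.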

michaelwechner commented 1 year ago

Hi Taras, thanks for your analysis! I will try the other options as you suggest and let you know :-)

partarstu commented 10 months ago

@michaelwechner , are there any other comments from your side? Could this issue be closed?

partarstu commented 9 months ago

Closing the issue due to a long period of inactivity.