MageMasher opened this issue 6 years ago.
Thank you for reporting. I'll look into this this weekend.
What's interesting is that this may be related to the DynamoDB throughput with the solo topology. If you set up a solo topology, look at the CloudWatch alarms; you may find something there.
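In case it helps, here is a rough sketch of listing the alarms attached to that metric programmatically instead of clicking through the console. This is Python/boto3 (not part of this repo); the region and table name are taken from the alarm details later in this thread, so adjust them to your own system.

```python
# Sketch: list CloudWatch alarms watching ConsumedWriteCapacityUnits on the
# DynamoDB table used by the Datomic system. Assumes boto3 is installed and
# AWS credentials/region are already configured.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")

resp = cloudwatch.describe_alarms_for_metric(
    MetricName="ConsumedWriteCapacityUnits",
    Namespace="AWS/DynamoDB",
    Dimensions=[{"Name": "TableName", "Value": "datomic-signifier-dev"}],
)
for alarm in resp["MetricAlarms"]:
    print(alarm["AlarmName"], alarm["StateValue"], alarm["StateReason"])
```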
I created https://github.com/signifier-jp/onyx-datomic-cloud-ci to run the tests triggered by commits. It runs stand-alone or as a Docker container. As a first step, I ran it from my local machine via a SOCKS proxy and reproduced the error on the 2nd round. I also confirmed the following CloudWatch alarm.
State Details: State changed to ALARM at 2018/04/12. Reason: Threshold Crossed: 7 datapoints were less than the threshold (150.0). The most recent datapoints which crossed the threshold: [23.0 (12/04/18 04:21:00), 16.0 (12/04/18 04:19:00), 1.0 (12/04/18 04:18:00), 18.0 (12/04/18 04:17:00), 36.0 (12/04/18 04:16:00)].
Description: DO NOT EDIT OR DELETE. For TargetTrackingScaling policy arn:aws:autoscaling:us-west-2:NNNNNNNNNNN:scalingPolicy:af7b460a-4967-45ee-a01a-d746352bdbc4:resource/dynamodb/table/datomic-signifier-dev:policyName/datomic-signifier-dev-write-scaling-policy.
Threshold: ConsumedWriteCapacityUnits < 150 for 15 datapoints within 15 minutes
Actions (In ALARM): arn:aws:autoscaling:us-west-2:NNNNNNNNNNN:scalingPolicy:af7b460a-4967-45ee-a01a-d746352bdbc4:resource/dynamodb/table/datomic-signifier-dev:policyName/datomic-signifier-dev-write-scaling-policy
Namespace: AWS/DynamoDB
Metric Name: ConsumedWriteCapacityUnits
Dimensions: TableName = datomic-signifier-dev
Statistic: Sum
Period: 1 minute
Treat missing data as: missing
Percentiles with low samples: evaluate
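For reference, the datapoints the alarm evaluates can also be pulled programmatically to cross-check what the console shows. A minimal Python/boto3 sketch, using the same namespace, statistic, and 1-minute period as the alarm above; the 30-minute window is just an example and should be adjusted to cover a test run:

```python
# Sketch: fetch raw ConsumedWriteCapacityUnits datapoints (Sum over 1-minute
# periods) for the table, over an arbitrary recent window.
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")

end = datetime.now(timezone.utc)
start = end - timedelta(minutes=30)

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedWriteCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "datomic-signifier-dev"}],
    StartTime=start,
    EndTime=end,
    Period=60,
    Statistics=["Sum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```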
I tried bumping the read capacity up to 250 and the write capacity to 125, but the test still failed. Interestingly, the write-capacity graph in the Metrics tab doesn't show any high usage, even though I received the alarm via CloudWatch.
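For anyone trying the same capacity bump, a minimal boto3 sketch follows. Note this is just one way to do it (the console works too), and the target-tracking autoscaling policy referenced in the alarm description may adjust the provisioned capacity again on its own.

```python
# Sketch: set provisioned throughput on the table to the values tried above.
# The table name and region come from the alarm details; adjust for your system.
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-west-2")

dynamodb.update_table(
    TableName="datomic-signifier-dev",
    ProvisionedThroughput={
        "ReadCapacityUnits": 250,
        "WriteCapacityUnits": 125,
    },
)
```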
I will try different configurations, such as running the tests in the peered VPC so that I can connect without the SOCKS proxy.
I was able to reproduce the failure with the command below, though running it once does not always cause a failure. Also, running only
lein test :only onyx.plugin.tx-async-output-test/datomic-tx-output-test
does NOT reproduce the failure. Here is an output containing the error: