googlegenomics / dataflow-java

Google Cloud Dataflow pipelines such as Identity-By-State as well as useful utility classes.
Apache License 2.0

Bug fix for range representation of Variants in AnnotateVariants.java. #191

Closed cmclean closed 8 years ago

cmclean commented 8 years ago

Per the variants.proto Variant message, start is zero-based inclusive and end is zero-based exclusive.
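
To make the convention concrete, here is a minimal sketch of a zero-based, half-open range. It is not the code from this change: `HalfOpenRange` and its methods are hypothetical names used only for illustration, and the only fact taken from the thread is that `start` is inclusive and `end` is exclusive.

```java
/** Hypothetical helper (not part of dataflow-java): a zero-based, half-open [start, end) range. */
final class HalfOpenRange {
    final long start; // zero-based, inclusive
    final long end;   // zero-based, exclusive

    HalfOpenRange(long start, long end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("need 0 <= start <= end");
        }
        this.start = start;
        this.end = end;
    }

    /** Converts a one-based, closed interval [first, last] to the half-open convention. */
    static HalfOpenRange fromOneBasedClosed(long first, long last) {
        return new HalfOpenRange(first - 1, last);
    }

    /** Number of positions covered; no "+ 1" is needed because end is exclusive. */
    long length() {
        return end - start;
    }

    /** True if the zero-based position lies inside the range. */
    boolean contains(long position) {
        return start <= position && position < end;
    }

    /** True if the two ranges share at least one position (empty ranges are not special-cased). */
    boolean overlaps(HalfOpenRange other) {
        return start < other.end && other.start < end;
    }
}
```

Under this convention a single-base variant at one-based position 100 has `start` 99 and `end` 100, and two windows that merely share a boundary, such as [40700000, 40800000) and [40800000, 40900000), do not overlap.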

deflaux commented 8 years ago

Code change looks good. Does the integration test pass? Travis CI currently only runs unit tests.

    mvn -Dit.test=AnnotateVariantsITCase#testLocal verify

cmclean commented 8 years ago

Hmm, I believe so. I ran the command above, which initially failed because the required environment variables were not set. After setting them for my personal project, the test passes with the logs below. However, there are no files in the expected location in my cloud bucket. Is that expected?

...

T E S T S

Running com.google.cloud.genomics.dataflow.pipelines.AnnotateVariantsITCase
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[Ljava.lang.String;@305b7c14
May 12, 2016 11:17:36 AM com.google.cloud.genomics.dataflow.pipelines.AnnotateVariants processElement
INFO: processing contig variant_set_id: "3049512673186936334" call_set_ids: "3049512673186936334-0" reference_name: "chr17" start: 40700000 end: 40800000

May 12, 2016 11:17:36 AM io.grpc.internal.ManagedChannelImpl
INFO: [ManagedChannelImpl@475835b1] Created with target genomics.googleapis.com:443
May 12, 2016 11:17:38 AM com.google.cloud.genomics.dataflow.pipelines.AnnotateVariants retrieveTranscripts
INFO: read 7 transcripts in 269.8 ms (Infinity / s)
May 12, 2016 11:17:38 AM com.google.cloud.genomics.dataflow.pipelines.AnnotateVariants retrieveVariantAnnotations
INFO: read 8 variant annotations in 394.6 ms (Infinity / s)
May 12, 2016 11:17:39 AM com.google.cloud.genomics.dataflow.pipelines.AnnotateVariants processElement
INFO: finished reading 212 variants in 940.9 ms
May 12, 2016 11:17:39 AM io.grpc.internal.ManagedChannelImpl maybeTerminateChannel
INFO: [ManagedChannelImpl@475835b1] Terminated
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.207 sec - in com.google.cloud.genomics.dataflow.pipelines.AnnotateVariantsITCase

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent! The file encoding for reports output files should be provided by the POM property ${project.reporting.outputEncoding}.
[INFO]
[INFO] --- maven-failsafe-plugin:2.18.1:verify (default) @ google-genomics-dataflow ---
[INFO] Failsafe report directory: /usr/local/google/home/cym/code/dataflow-java/target/failsafe-reports
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent! The file encoding for reports output files should be provided by the POM property ${project.reporting.outputEncoding}.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 30.183 s
[INFO] Finished at: 2016-05-12T11:17:40-07:00
[INFO] Final Memory: 29M/972M
[INFO] ------------------------------------------------------------------------

deflaux commented 8 years ago

That's right. The test cleans up after itself: https://github.com/googlegenomics/dataflow-java/blob/master/src/test/java/com/google/cloud/genomics/dataflow/pipelines/AnnotateVariantsITCase.java#L73
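
For readers unfamiliar with the pattern, the general idea of a test that removes its own output can be sketched as follows. This is only an illustration under stated assumptions: it uses a local temporary directory rather than a cloud bucket, `SelfCleaningOutput` is a hypothetical class, and it does not reproduce what the linked `AnnotateVariantsITCase` actually does.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: an output directory that is deleted when the test is done with it. */
final class SelfCleaningOutput implements AutoCloseable {
    private final Path dir;

    SelfCleaningOutput(String prefix) throws IOException {
        // Assumption: a local temp directory stands in for the real output location.
        this.dir = Files.createTempDirectory(prefix);
    }

    /** Writes one output file into the directory and returns its path. */
    Path write(String name, String contents) throws IOException {
        return Files.write(dir.resolve(name), contents.getBytes(StandardCharsets.UTF_8));
    }

    /** True while the output directory is still present. */
    boolean exists() {
        return Files.exists(dir);
    }

    /** Deletes the directory and everything in it, children before parents. */
    @Override
    public void close() throws IOException {
        if (!Files.exists(dir)) {
            return; // already cleaned up
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }
}
```

Used in a try-with-resources block, the output exists while assertions run and is gone once the block exits, which is why a passing run can leave nothing behind to inspect.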