treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data
https://docs.lakefs.io
Apache License 2.0

GC: Handle no GC rules case #3772

Closed talSofer closed 2 years ago

talSofer commented 2 years ago

When no GC rules are defined for a repository, running GC against it throws an unhandled exception instead of exiting gracefully.

Exception in thread "main" io.lakefs.clients.api.ApiException: Not Found
    at io.lakefs.clients.api.ApiClient.handleResponse(ApiClient.java:1029)
    at io.lakefs.clients.api.ApiClient.execute(ApiClient.java:942)
    at io.lakefs.clients.api.RetentionApi.getGarbageCollectionRulesWithHttpInfo(RetentionApi.java:158)
    at io.lakefs.clients.api.RetentionApi.getGarbageCollectionRules(RetentionApi.java:136)
    at io.treeverse.clients.ApiClient.getGarbageCollectionRules(ApiClient.scala:106)
    at io.treeverse.clients.GarbageCollector$.main(GarbageCollector.scala:331)
    at io.treeverse.clients.GarbageCollector.main(GarbageCollector.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Steps to reproduce

  1. Create a lakeFS repository without GC rules.
  2. Run a GC job for that repo.

arielshaqed commented 2 years ago

I would be (very) happy if the behaviour remained a failure. (For that matter, the stacktrace is pretty informative, but it could obviously be more informative!)

I think that this is not a dupe of #3169, but would like to make sure.

talSofer commented 2 years ago

Yes, agreed that GC should fail, but as a planned failure, e.g. by making main exit. WDYT? And you are right, this is not a dup.
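
A minimal sketch of what such a planned failure could look like, written in Java for illustration only; it is not the actual GarbageCollector code. The `ApiException` class here is a hypothetical stand-in for `io.lakefs.clients.api.ApiException` (assumed to expose the HTTP status through `getCode()`), and `RulesFetcher`, `run`, and the exit-code values are invented for this example.

```java
public class GcRulesCheck {
    /** Hypothetical stand-in for the client's ApiException; assumed to carry the HTTP status. */
    public static class ApiException extends Exception {
        private final int code;

        public ApiException(int code, String message) {
            super(message);
            this.code = code;
        }

        public int getCode() {
            return code;
        }
    }

    /** Hypothetical abstraction over the call that fetches a repository's GC rules. */
    public interface RulesFetcher {
        String fetch(String repo) throws ApiException;
    }

    // Arbitrary exit codes chosen for this sketch.
    public static final int EXIT_OK = 0;
    public static final int EXIT_API_ERROR = 1;
    public static final int EXIT_NO_RULES = 2;

    /** Fetches the rules and maps failures to an exit status instead of letting the exception escape. */
    public static int run(String repo, RulesFetcher fetcher) {
        try {
            String rules = fetcher.fetch(repo);
            System.out.println("Loaded GC rules for " + repo + ": " + rules);
            return EXIT_OK;
        } catch (ApiException e) {
            if (e.getCode() == 404) {
                // Still a failure, but with a message that names the actual problem.
                System.err.println("No GC rules are defined for repository '" + repo
                        + "'; define rules before running GC.");
                return EXIT_NO_RULES;
            }
            System.err.println("Failed to fetch GC rules for '" + repo + "': " + e.getMessage());
            return EXIT_API_ERROR;
        }
    }

    public static void main(String[] args) {
        // Simulate the "Not Found" response from the stack trace above.
        int status = run("example-repo", repo -> {
            throw new ApiException(404, "Not Found");
        });
        // A real job would call System.exit(status) here so the failure is visible to the caller.
        System.out.println("Would exit with status " + status);
    }
}
```

The job still fails in the missing-rules case, as requested above; the only change is that the failure is reported with a targeted message and a non-zero status rather than an uncaught stack trace.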