google / yggdrasil-decision-forests

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
https://ydf.readthedocs.io/
Apache License 2.0
498 stars 53 forks source link

Go module buried in this repo prevents the module importing properly #53

Closed jimidle closed 1 year ago

jimidle commented 1 year ago

I had the same issues when I kept the Go runtime for ANTLR in a buried subdirectory in the repo. This prevents the go get command from dealing nicely with the package as:

The go get will download the whole repo including all the non-go stuff in to the package cache

If you tag the repo with a release tag, the go get will not resolve it and will make up a tag name and not use the release tag. So tagging at say 1.0.0 will not show 1.0.0 in the go.mod file for a project using the module. It will show something like:

require (
    github.com/google/yggdrasil-decision-forests/yggdrasil_decision_forests/port/go v0.0.0-20230710100126-8f25eced9d1b
)

Which means it is impossible to know at a glance which version of the code is being used. Also, the manufactured tag will change if some other part of the repo is updated.

The go language really expects that the code is in its own repo, with the code at the top level, with the go.mod there too. That was the only way I could make everything work for the go tooling.

rstz commented 1 year ago

Thank you for reporting - this is unfortunate. I'm very hesitant to open a new repo for the Go API (this would create a lot of overhead on our side), so if anyone knows a solution to this, please let us know.

janpfeifer commented 1 year ago

Indeed the fact that the whole repository is retrieved for only the Go API, which is probably ~2% of the whole thing, slows importing down -- even though it's done only the first time, in this world of dockers and other containers that always start from a clean build, this can be an issue.

About the go get versions, could this be related to the fact that YDF doesn't use the vX.X.X tags (I think more common), and instead uses X.X.X tags, without the preceding v ? I always wondered if this made any difference in resolution.

The pkg.go.dev is definitely not parsing our version tags without the preceding v.

rstz commented 1 year ago

Oh, wow, I did not expect this to have any effect, but seems like you're right. That's pretty annoying - will change it for the next release!

janpfeifer commented 1 year ago

Yes, I noticed this also with:

$ go list -m -versions github.com/google/yggdrasil-decision-forests/yggdrasil_decision_forests/port/go

Which doesn't return any recognized versions the Go API. Sadly this is not clearly documented in the Go documentation :(

One potential solution would be including both forms of tags. Go accepts non-release tags as well (meaning it doesn't require creating a release with the associate tag), I just tested. Wdyt ?

rstz commented 1 year ago

Sounds good, just created Tag v1.5.0 https://github.com/google/yggdrasil-decision-forests/releases/tag/v1.5.0 Let me know if this improves things

janpfeifer commented 1 year ago

Thanks @rstz . Unfortunately it didn't seem to solve:

$ GOPROXY=direct go list -m -versions github.com/google/yggdrasil-decision-forests/yggdrasil_decision_forests/port/go

Still returns empty 😞 ... I assume then this is related to the go.mod being in a sub-directory -- since on another project that I tested simply creating the tag worked.

More googling around and I found this: https://github.com/golang/go/issues/34055

And then the documentation about it in here: https://go.dev/ref/mod#vcs-version

Which suggests that the tag for a sub-directory module should be prefixed with the subdirectory. So one should create a tag yggdrasil_decision_forests/port/go/v1.5.0

Now something else we could do is a redirect from github.com/google/yggdrasil-decision-forests to https://github.com/google/yggdrasil-decision-forests/yggdrasil-decision-forests/port/go for the go-import, which would simplify, and possibly fix the issue of the versioning as well.

I'll test a bit here and report back.

janpfeifer commented 1 year ago

So 3 options:

  1. Create a new repository: impractical I would think.
  2. Create a form of subdirectory redirect, described and accepted in golang/go#34055. Not yet implemented though (I understand, I pinged the thread), and since it would also require changes in Github to allow configurable changes to the <meta> tag, I don't see this happening soon.
  3. Move the go.mod to the root directory -- and adjust its module root name accordingly. It's just one file, and seems to have been the option chosen by some other "polyglot" projects like ours -- e.g.: github.com/apache/thrift.

Wdyt ?

rstz commented 1 year ago

I believe creating a special tag for the subdirectory solved it? Now I get

$ GOPROXY=direct go list -m -versions github.com/google/yggdrasil-decision-forests/yggdrasil_decision_forests/port/go
github.com/google/yggdrasil-decision-forests/yggdrasil_decision_forests/port/go v1.5.0

Is that all we need?

janpfeifer commented 1 year ago

Oh yes, that solves it as well. I should have listed it -- with the downside of the long tag ... but maybe not an issue.

Thanks Richard!!

rstz commented 1 year ago

Cool, then I think we'll use these tags from now on, since this solution is the least invasive for our repo. Thank you for digging into this :)

jimidle commented 1 year ago

Thank you for reporting - this is unfortunate. I'm very hesitant to open a new repo for the Go API (this would create a lot of overhead on our side), so if anyone knows a solution to this, please let us know.

Indeed I searched for a solution for months and tried everything including tag formats etc. It is just that go expects to be the only code in the repo I think.

For ANTLR I just created a repo that mirrors the release code. But that is still overhead, when I am the only maintainer. I can live with it as it is. Others may have problems with CI and verification and so on I guess.

Ironically, the only group that did not have issues with this was Google, who don't use Go modules despite the rest of us doping so ;)

jimidle commented 1 year ago

creating a special tag for the subdirectory

I tried this with ANTLR as well. You still get the entire source code in the download. It is an improvement, and you can get the right tag in go.mod but if you move to a /v2 with a v2 tag, then it falls apart again.

I suppose it is fine so long as you keep the code at v1 ;)

janpfeifer commented 1 year ago

Indeed, it doesn't solve the downloading problem.

Actually that's is something that could be brought up with the Go community: it should have good support for "polyglot" projects, like YDF.

And the /v2, we'll get there when the v2 comes. Hopefully by then there will be better options.

jimidle commented 1 year ago

Good enough Jan. I did bring this up before with the Go team. The go get sequence is very involved.

Thanks for your efforts.

On Wed, Jul 26, 2023 at 16:14 Jan @.***> wrote:

Indeed, it doesn't solve the downloading problem.

Actually that's is something that could be brought up with the Go community: it should have good support for "polyglot" projects, like YDF.

And the /v2, we'll get there when the v2 comes. Hopefully by then there will be better options.

— Reply to this email directly, view it on GitHub https://github.com/google/yggdrasil-decision-forests/issues/53#issuecomment-1651199412, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJ7TMGG2GXAJFBBGSHP2S3XSDGV7ANCNFSM6AAAAAA2SNXOJE . You are receiving this because you authored the thread.Message ID: @.***>