marcboeker / go-duckdb

go-duckdb provides a database/sql driver for the DuckDB database engine.
MIT License
646 stars 97 forks source link

Upgrade duckdb to v0.9.0 #121

Closed k-anshul closed 11 months ago

ajzo90 commented 11 months ago

Where is the extended amalgamation defined? I'm trying to build locally but without success so far.

k-anshul commented 11 months ago

Where is the extended amalgamation defined? I'm trying to build locally but without success so far.

The extended amalgamation wasn't required previously. I have asked this question on duckdb discord channel as well. For now I created sources locally from duckdb repo.

ajzo90 commented 11 months ago

I added an empty additional case in statement.go to avoid panics (see https://github.com/ajzo90/go-duckdb/commit/d73949d93b575ac8aa5d6eaacf15d9e4f6aa2db0)

k-anshul commented 11 months ago

I added an empty additional case in statement.go to avoid panics (see ajzo90@d73949d)

There is also a new API that we can use : https://github.com/k-anshul/go-duckdb/blob/76b2b36bfdc7bcd1d8a2a4c1813ec1e52b984d9e/statement.go#L220

k-anshul commented 11 months ago

The amalgamated duckdb source no longer works and gives compilation error so used locally generated source with python3 scripts/amalgamation.py --extended

marcboeker commented 11 months ago

@k-anshul Thanks for the PR. I'm currently looking into it, but it seems that Github Actions times out when building the linux (amd64 and arm64) libs. We had this in the past that build on linux machines were killed because of reaching a timeout. I need to investigate this further.

I've also changed the Makefile build the amalgamation file from the current git release branch.

git clone -b v${DUCKDB_VERSION} --depth 1 https://github.com/duckdb/duckdb.git
cd duckdb/ && python3 scripts/amalgamation.py --extended && cd ..
echo '#ifdef GODUCKDB_FROM_SOURCE' > duckdb.hpp.tmp; cat duckdb/src/amalgamation/duckdb.hpp >> duckdb.hpp.tmp; echo '\n#endif' >> duckdb.hpp.tmp; mv duckdb.hpp.tmp duckdb.hpp
echo '#ifdef GODUCKDB_FROM_SOURCE' > duckdb.cpp.tmp; cat duckdb/src/amalgamation/duckdb.cpp >> duckdb.cpp.tmp; echo '\n#endif' >> duckdb.cpp.tmp; mv duckdb.cpp.tmp duckdb.cpp
cp duckdb/src/include/duckdb.h duckdb.h
rm -rfv duckdb
k-anshul commented 11 months ago

Thanks @marcboeker I actually discussed the amalgamated sources based approach with duckdb team and they suggested to move away from this approach due to memory requirements as well : https://discord.com/channels/909674491309850675/921100573732909107/1156522175495413821 I was planning to raise an issue for this but missed it. I think we should move away from our current approach and use the approach that other drivers are using. I had also built the linux images on a self hosted VM with large memory.

marcboeker commented 11 months ago

When splitting the amalgamation into multiple files using --split=10, it partly compiles on the Github runners. Due to a missing or overwritten definition, one of the 10 chunks has compilation errors. Now I'm trying to use the provided libduckdb_static.a that is part of the normal DuckDB build. But unfortunately this is missing the third party libs. I'm currently trying to figure out how to include them. See the discussion in discord https://discord.com/channels/909674491309850675/921100573732909107/1160353844404949072 Until then we either have to wait, precompile v0.9.0 locally and push the static libs or upgrade to a bigger runner.

marcboeker commented 11 months ago

@k-anshul OK, I think I have managed to get a 0.9.0 build that works without the amalgamation. Could you please checkout this branch and test if it works for you? Once we have a stable build we can continue with implementing the missing states for the pending API.

k-anshul commented 11 months ago

Once we have a stable build we can continue with implementing the missing states for the pending API.

Hey @marcboeker
We see intermittent panics without the state API changes. Could you please also review and add the pending APIs changes from this PR.

marcboeker commented 11 months ago

Once we have a stable build we can continue with implementing the missing states for the pending API.

Hey @marcboeker We see intermittent panics without the state API changes. Could you please also review and add the pending APIs changes from this PR.

@k-anshul Your changes are already merged into my branch. The panics you're encountering could be from #124. The DUCKDB_PENDING_NO_TASKS_AVAILABLE state seems to be missing. I'm adding this right now. Do you have a query on hand to reproduce the panic so that I can test it.

k-anshul commented 11 months ago

Ideally the API added by duckdb duckdb_pending_execution_is_finished should handle all the states. So this if block here should handle all the cases. Unfortunately I don't have the queries ready, I saw it occasionally while testing with our application the last time without these changes. Haven't seen any panics so far.

k-anshul commented 11 months ago

@k-anshul OK, I think I have managed to get a 0.9.0 build that works without the amalgamation. Could you please checkout this branch and test if it works for you? Once we have a stable build we can continue with implementing the missing states for the pending API.

Hi @marcboeker I did a high level experience pass on your changes and it works fine with our application.

k-anshul commented 11 months ago

@k-anshul OK, I think I have managed to get a 0.9.0 build that works without the amalgamation. Could you please checkout this branch and test if it works for you? Once we have a stable build we can continue with implementing the missing states for the pending API.

Also can we merge your branch to main. Since v0.9.1 is out we can use these changes to upgrade driver.

marcboeker commented 11 months ago

I'm have just added the -O3 flags to optimize the builds. If they compile successfully, we should do a small code review to check, if the new build process does not introduce problems on all the different platforms. When everything looks fine, we can merge this branch and directly upgrade to 0.9.1.

marcboeker commented 11 months ago

Okay, my changes and cleanups are now pushed. Do you have the possibility to verify that it works on linux (amd64 and arm64)? And it would be great to hear @begelundmuller feedback on this as he was the original author of the amalgamation approach. One thing we should also make sure is to check if all the extensions still work. JSON and Parquet (I've added a Parquet test case) seem OK. Once we are confident, we can merge this.

k-anshul commented 11 months ago

Okay, my changes and cleanups are now pushed. Do you have the possibility to verify that it works on linux (amd64 and arm64)? And it would be great to hear @begelundmuller feedback on this as he was the original author of the amalgamation approach. One thing we should also make sure is to check if all the extensions still work. JSON and Parquet (I've added a Parquet test case) seem OK. Once we are confident, we can merge this.

Sure. I took your changes and upgraded it to 0.9.1 in my fork as well. Will test both changes and update you. Thanks a lot.

begelundmuller commented 11 months ago

Hey @marcboeker, I took a look at the changes and overall they seem good (with the caveat that I don't know that much about C++ builds).

One thing – now that the .cpp file has been removed from the repository, we should remove the cgo_source.go file and no longer document duckdb_from_source in the README.

marcboeker commented 11 months ago

One thing – now that the .cpp file has been removed from the repository, we should remove the cgo_source.go file and no longer document duckdb_from_source in the README.

Thanks @begelundmuller, I've removed the duckdb_from_source case.

marcboeker commented 11 months ago

@k-anshul If your tests did not produce any errors, I would suggest to merge this branch where all workflows have been run and all static libs were compiled and committed. Or you can start your the workflows on your branch again and we merge this one. Thanks for your support!

k-anshul commented 11 months ago

Yeah haven't found any issues so far, I think we can merge. We can merge your branch and close this PR.