Closed domnikl closed 2 months ago
Thanks @domnikl for the PR. I think it in general looks great and complete!
Also I want to look into fetching results more efficiently and not everything at once.
Yes, I think this might be an issue in terms of performance. I don't have experience of using prusto before, but it seems like the get
and get_next
APIs are working for gradually fetching the results. Not sure whether it is easy to switch from get_all
to these two.
time columns not being mapped correctly
I'm not sure what exactly the bug is. Currently we are converting the NaiveTime
type in rust to String
type in pandas in general (and also I see it in your pr), it is done by the function here. If you want to customize the conversion or mapping it to another type in pandas, it can be done by modifying the mapping in the macro as well as this function.
It would be great to also submit the corresponding seed database script here so others can run the python test_trino locally.
@wangxiaoying thanks for looking into it!
I'm not sure what exactly the bug is
The problem is with reading timestamps with time zone types, prusto does not support it (yet) by the looks of it. Instead it returns EmptyData
and it appears the result is just empty. But its not that important anyway I think to get starting. I'll create an issue in their project.
It would be great to also submit the corresponding seed database script
Will do in the next days and also rework fetching from the database with the get
/get_next
calls if possible.
@wangxiaoying I implemented the missing partitioning as well as switched to get/get_next for prusto and fixed the tests and added test data and squashed some bugs along the way. Could you have a look at it again please?
Hi @domnikl , thanks so much for completing the PR!
I've tested the test cases locally and it seems works well. I have two minor comments above but in general I think the code looks good and complete. We probably also need to add a documentation page for trino here.
I think after fixing the above minor issues, we can merge it and make it into our next release!
@wangxiaoying thanks for having a look! Added the missing pieces and a bit of documentation for it.
I'd really like to see Trino being implemented as a source in ConnectorX and thus have begun working on it. It uses prusto as a client and currently supports all basic types. Support for tuple, array, row and uuid isn't implemented yet and there's a bug with time columns not being mapped correctly. Also I want to look into fetching results more efficiently and not everything at once.
What do you think, is it worth continuing work on it? Is there anything else I need to consider eg. regarding mapping the results?