Open tusharchou opened 2 months ago
Hi iceberg community,
The goal of this issue is to create a sandbox platform where open source enthusiasts can learn how to contribute to Apache projects such as python-iceberg. We get to learn new libraries and share that learning with the community.
Data lakehouse table formats have a huge impact on cloud cost, and understanding how to optimize them is very important for scaling in production.
I believe that if we use a real-world use case to break down the problem, it will become easier to solve.
The Python developer facing this problem is probably working for a data product company in a production environment.
Interacting with Iceberg tables programmatically using Python
Accessing the Iceberg table while a Spark Job is updating the underlying Table.
To replicate the cloud locally, we can use the tabular Spark Docker container.
Managing Iceberg tables from Python makes this very approachable.
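As a rough sketch of what connecting to such a local stack from Python could look like: the endpoints and credentials below are assumptions based on the defaults commonly used with the Iceberg Spark quickstart containers (REST catalog on port 8181, MinIO on port 9000), and the environment variable names are made up for this example. Check them against your own docker-compose file.

```python
import os

# Assumed defaults for a local REST catalog + MinIO stack.
# These values are NOT taken from this issue; adjust them to your setup.
DEFAULTS = {
    "type": "rest",
    "uri": "http://localhost:8181",
    "s3.endpoint": "http://localhost:9000",
    "s3.access-key-id": "admin",
    "s3.secret-access-key": "password",
}

# Hypothetical environment variable names, used only in this sketch.
ENV_OVERRIDES = {
    "uri": "ICEBERG_REST_URI",
    "s3.endpoint": "ICEBERG_S3_ENDPOINT",
    "s3.access-key-id": "ICEBERG_S3_ACCESS_KEY",
    "s3.secret-access-key": "ICEBERG_S3_SECRET_KEY",
}


def local_catalog_properties(env=None):
    """Return catalog properties, letting environment variables override defaults."""
    env = os.environ if env is None else env
    props = dict(DEFAULTS)
    for key, var in ENV_OVERRIDES.items():
        if var in env:
            props[key] = env[var]
    return props


def connect(name="local"):
    """Open the catalog; requires pyiceberg and a running local stack."""
    # Imported lazily so the helper above works without pyiceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **local_catalog_properties())
```

With the containers running, `connect().list_namespaces()` is a quick way to confirm the catalog is reachable.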
@rakhioza07
Writing pytest for pyiceberg=0.8.1
In the next release of the iceberg-python library, the following issues will be resolved:
The python-iceberg repository released version 0.7.1 on Aug 20, 2024.
0.7.1 Latest Feature Case Study
Fix delete to trace existing manifests when a data file is partially rewritten
So even when we rewrite the data only partially, we still need to add the new manifest entries as "existing" entries in order to track the data files that are rewritten. These files are unaffected by the delete and should be kept in the manifest as existing entries.
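To make the bookkeeping easier to picture, here is a toy model of one manifest being carried into a new snapshot. It is not pyiceberg's implementation. The only thing taken from the Iceberg table spec is the entry status codes (0 = existing, 1 = added, 2 = deleted); the `Entry` class and `apply_partial_delete` function are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    """Manifest entry status codes as defined in the Iceberg table spec."""
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class Entry:
    """Simplified manifest entry: a status plus the data file it tracks."""
    status: Status
    file_path: str


def apply_partial_delete(manifest, rewritten):
    """Model a delete that rewrites some data files of one manifest.

    manifest:  live entries of the manifest before the delete
    rewritten: maps each replaced file path to the path of its rewritten copy

    Returns (carried, added): the old manifest's entries as they appear in
    the new snapshot, and the entries for the newly written files.
    """
    carried = []
    for entry in manifest:
        if entry.file_path in rewritten:
            # The old file held matching rows, so it is replaced.
            carried.append(Entry(Status.DELETED, entry.file_path))
        else:
            # Untouched by the delete: it must stay tracked as EXISTING.
            carried.append(Entry(Status.EXISTING, entry.file_path))
    added = [Entry(Status.ADDED, new_path) for new_path in rewritten.values()]
    return carried, added
```

If the untouched entries were dropped instead of being carried over as `EXISTING`, their rows would silently disappear from the table, which is the kind of regression a test around this fix should catch.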
pytest: tests/integration/test_writes/test_writes.py
[ ] test_delete_threshold()
[ ] load minio catalog
[ ] create schema
[ ] partition specification
[ ] clean environment for testing
[ ] exception handling
[ ] create table
[ ] generate test data
[ ] design test
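The checklist above could be sketched roughly as follows. This is not the upstream `test_delete_threshold()`; it is a smaller, hypothetical check that deletes one row from a partitioned table so that one data file is partially rewritten while another is left untouched. The namespace, table name, columns, endpoints, and credentials are all assumptions for a local REST catalog + MinIO stack.

```python
# Hypothetical identifiers, used only in this sketch.
NAMESPACE = "sandbox"
TABLE_ID = f"{NAMESPACE}.delete_demo"


def generate_rows(n=6):
    """Deterministic test data: ids 0..n-1 split into 'even' and 'odd'."""
    return [{"id": i, "category": "even" if i % 2 == 0 else "odd"} for i in range(n)]


def load_local_catalog():
    """Load the catalog; the endpoints and credentials are assumed defaults."""
    from pyiceberg.catalog import load_catalog

    return load_catalog(
        "local",
        **{
            "type": "rest",
            "uri": "http://localhost:8181",
            "s3.endpoint": "http://localhost:9000",
            "s3.access-key-id": "admin",
            "s3.secret-access-key": "password",
        },
    )


def build_schema_and_spec():
    """Create the schema and an identity partition spec on 'category'."""
    from pyiceberg.partitioning import PartitionField, PartitionSpec
    from pyiceberg.schema import Schema
    from pyiceberg.transforms import IdentityTransform
    from pyiceberg.types import LongType, NestedField, StringType

    schema = Schema(
        NestedField(field_id=1, name="id", field_type=LongType(), required=False),
        NestedField(field_id=2, name="category", field_type=StringType(), required=False),
    )
    spec = PartitionSpec(
        PartitionField(source_id=2, field_id=1000, transform=IdentityTransform(), name="category")
    )
    return schema, spec


def check_partial_delete(catalog):
    """Delete one row and verify that every other row survives."""
    import pyarrow as pa
    from pyiceberg.exceptions import NamespaceAlreadyExistsError, NoSuchTableError
    from pyiceberg.expressions import EqualTo

    # Clean environment, tolerating leftovers from earlier runs.
    try:
        catalog.create_namespace(NAMESPACE)
    except NamespaceAlreadyExistsError:
        pass
    try:
        catalog.drop_table(TABLE_ID)
    except NoSuchTableError:
        pass

    schema, spec = build_schema_and_spec()
    table = catalog.create_table(TABLE_ID, schema=schema, partition_spec=spec)

    arrow_schema = pa.schema([("id", pa.int64()), ("category", pa.string())])
    table.append(pa.Table.from_pylist(generate_rows(), schema=arrow_schema))

    # id 2 shares the 'even' partition with ids 0 and 4, so that file has
    # to be rewritten; the 'odd' partition is not affected by the delete.
    table.delete(delete_filter=EqualTo("id", 2))

    remaining = sorted(table.scan().to_arrow()["id"].to_pylist())
    assert remaining == [0, 1, 3, 4, 5]
```

In a pytest module, a `test_...` function would call `check_partial_delete(load_local_catalog())`, ideally receiving the catalog from a fixture so that the connection is shared across tests.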
Source Issue
Let's try it out and understand the root cause of this issue.