tusharchou commented 2 months ago

Writing pytest for pyiceberg=0.8.1

In the next release of the iceberg-python library, the following issues will be resolved:

[ ] https://github.com/apache/iceberg-python/issues/1223

The iceberg-python repository released version 0.7.1 on Aug 20, 2024.

0.7.1 Latest Feature Case Study

Fix delete to trace existing manifests when a data file is partially rewritten

So even when we rewrite the data only partially, we still need to add the new manifest entries as "existing" entries in order to track the data files that are rewritten. These files are unaffected by the delete and should be kept in the manifest as existing entries.
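To make the "existing entry" idea concrete, here is a minimal toy model in plain Python. It is not pyiceberg's implementation: `ManifestEntry` and `rewrite_manifest` are hypothetical names used only for illustration. The one thing taken from the Iceberg table spec is the set of entry statuses (0 = EXISTING, 1 = ADDED, 2 = DELETED). The sketch assumes a manifest is rewritten because a delete touched some of its files, and shows the untouched files being carried over as EXISTING rather than dropped.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Status codes as defined for manifest entries in the Iceberg table spec.
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class ManifestEntry:
    # Hypothetical, simplified stand-in for a real manifest entry.
    status: Status
    file_path: str


def rewrite_manifest(entries, deleted_paths):
    """Rewrite one manifest after a delete touched some of its files.

    Files listed in deleted_paths are marked DELETED. Every other live
    file is carried over as EXISTING so the new manifest still tracks it.
    """
    rewritten = []
    for entry in entries:
        if entry.status is Status.DELETED:
            continue  # already removed by an earlier snapshot
        if entry.file_path in deleted_paths:
            rewritten.append(ManifestEntry(Status.DELETED, entry.file_path))
        else:
            rewritten.append(ManifestEntry(Status.EXISTING, entry.file_path))
    return rewritten


manifest = [
    ManifestEntry(Status.ADDED, "a.parquet"),
    ManifestEntry(Status.ADDED, "b.parquet"),
]
new_manifest = rewrite_manifest(manifest, {"a.parquet"})
```

In this toy model, forgetting the `else` branch would silently lose `b.parquet` from the table, which is the kind of bug the quoted fix description is about.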

pytest: tests/integration/test_writes/test_writes.py

[ ] test_delete_threshold()
[ ] load minio catalog
[ ] create schema
[ ] partition specification
[ ] clean environment for testing
[ ] exception handling
[ ] create table
[ ] generate test data
[ ] design test
Source Issue

Let's try it out and understand the root cause of this issue.

tusharchou commented 1 month ago

How to contribute

Hi iceberg community,

The goal of this issue is to create a sandbox platform for open source enthusiasts to learn how to contribute to Apache projects like iceberg-python. We get to learn new libraries and share that learning with the community.

Data lakehouse formats have a huge impact on cloud cost, and understanding how to optimize them is very important for scaling in production.

I believe that if we use a real-world use case to break down the problem, it will become easier to solve.

Explain the problem better

Who is facing the problem?

The Python developer facing this problem is probably working for a data product company in a production environment.

What is the problem?

Interacting with Iceberg tables programmatically using Python

When does the problem occur?

Accessing the Iceberg table while a Spark job is updating the underlying table.

Where does the user encounter the problem?

To replicate the cloud setup locally, we can use the Tabular Spark Docker container.

Why is the problem existing?

Being able to manage Iceberg tables from Python makes them very approachable.

tusharchou commented 2 weeks ago

Write a pytest for this feature request

iceberg-python Count rows as a metadata-only operation
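One way to think about a metadata-only count, and a test for it, is sketched below in plain Python. It assumes each data file's metadata carries a record count (the Iceberg spec has a `record_count` field on data files) and that a file with delete files attached cannot be counted from metadata alone. `DataFileMeta` and `count_rows_from_metadata` are hypothetical names for illustration, not pyiceberg API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFileMeta:
    # Hypothetical, simplified view of a data file's metadata.
    path: str
    record_count: int
    has_deletes: bool = False


def count_rows_from_metadata(files):
    """Sum record counts from file metadata without opening any data file."""
    total = 0
    for f in files:
        if f.has_deletes:
            # Row-level deletes make the stored count an overestimate.
            raise ValueError(f"{f.path} has delete files; metadata count is not exact")
        total += f.record_count
    return total


def test_count_rows_from_metadata():
    files = [DataFileMeta("a.parquet", 3), DataFileMeta("b.parquet", 5)]
    assert count_rows_from_metadata(files) == 8
    assert count_rows_from_metadata([]) == 0
```

A test against a real table could follow the same shape: append a known number of rows, then compare the metadata-derived count with the number of rows returned by a full scan.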

tusharchou commented 1 week ago

@rakhioza07

tusharchou / local-data-platform

0.1.2 Testing pyiceberg 0.8.1 feature requests #1