Description
The library attempts to use the Unix-specific 'fcntl' module which is not available on Windows systems.
To Reproduce
Steps to reproduce:
Install data-prep-toolkit-transforms on a Windows system
Try to import and use the Pdf2ParquetTransform:
from data_processing_ray.runtime.ray import RayTransformLauncher
from pdf2parquet_transform import (
    pdf2parquet_contents_type_cli_param,
    pdf2parquet_contents_types,
)
The import fails with the error message: ModuleNotFoundError: No module named 'fcntl'
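To confirm that the failure comes from the platform and not from a broken install, a quick check like the one below may help. It is only a diagnostic sketch: `fcntl_available` is a hypothetical helper name, not something provided by data-prep-toolkit.

```python
import importlib.util
import sys


def fcntl_available() -> bool:
    """Return True if the standard-library fcntl module can be imported here."""
    # find_spec returns None when the module does not exist on this platform.
    return importlib.util.find_spec("fcntl") is not None


if __name__ == "__main__":
    # On Windows this is expected to report that fcntl is unavailable.
    print(f"platform={sys.platform} fcntl_available={fcntl_available()}")
```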
Expected behavior
The library should do at least one of the following:
Use Windows-compatible alternatives (like msvcrt) for file locking on Windows systems
Gracefully handle the absence of fcntl on Windows
Clearly document Windows compatibility limitations
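As a rough illustration of the first two options, here is a minimal sketch of a lock that picks its backend by platform. It is not the library's MultiLock implementation; `PortableFileLock`, `_lock`, and `_unlock` are hypothetical names, and the sketch assumes a single exclusive lock on one lock file is enough.

```python
import os
import sys

if sys.platform == "win32":
    import msvcrt

    def _lock(fd: int) -> None:
        # Locks 1 byte at the current file position (offset 0 here).
        # LK_LOCK retries for roughly 10 seconds, then raises OSError.
        msvcrt.locking(fd, msvcrt.LK_LOCK, 1)

    def _unlock(fd: int) -> None:
        msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)

else:
    import fcntl

    def _lock(fd: int) -> None:
        # Blocks until an exclusive advisory lock is acquired.
        fcntl.flock(fd, fcntl.LOCK_EX)

    def _unlock(fd: int) -> None:
        fcntl.flock(fd, fcntl.LOCK_UN)


class PortableFileLock:
    """Hypothetical cross-platform exclusive lock backed by a lock file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._fd = None

    def __enter__(self) -> "PortableFileLock":
        # Create the lock file if needed; the position stays at offset 0.
        self._fd = os.open(self.path, os.O_RDWR | os.O_CREAT)
        _lock(self._fd)
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        try:
            _unlock(self._fd)
        finally:
            os.close(self._fd)
            self._fd = None
        return False  # do not swallow exceptions from the with-body
```

Usage would look like `with PortableFileLock("some.lock"): ...` around the critical section. Note that the two backends do not behave identically: `fcntl.flock` is advisory and waits indefinitely, while `msvcrt.locking` is enforced by the OS and gives up after its retry window.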
Desktop
OS: Windows 11
Python Version: 3.11.9
data-prep-toolkit-transforms Version: 0.2.2.dev2
Additional context
The failure originates in the Pdf2ParquetTransform class, which uses MultiLock for file synchronization. MultiLock relies on the Unix-specific fcntl module.
I'm currently implementing a workaround using WSL, but it would be beneficial to have native Windows support.