sfu-db / dataprep

Open-source low-code data preparation library in Python. Collect, clean and visualize your data in Python with a few lines of code.
http://dataprep.ai
MIT License
2.07k stars, 206 forks

Windows compatibility issue in data-prep-toolkit-transforms #995

Closed. Boris-Chernetsov closed this issue 1 week ago

Boris-Chernetsov commented 1 week ago

Description The library attempts to import the Unix-specific 'fcntl' module, which is not available on Windows systems.

To Reproduce Steps to reproduce:

  1. Install data-prep-toolkit-transforms on a Windows system.
  2. Try to import and use the Pdf2ParquetTransform:
    from data_processing_ray.runtime.ray import RayTransformLauncher
    from pdf2parquet_transform import (
        pdf2parquet_contents_type_cli_param,
        pdf2parquet_contents_types,
    )
  3. The import fails with the error message: ModuleNotFoundError: No module named 'fcntl'

Expected behavior The library should either:

Additional context The failure originates in the Pdf2ParquetTransform class, which uses MultiLock to synchronize file operations. This functionality relies on the Unix-specific fcntl module. I'm currently working around the problem with WSL, but native Windows support would be beneficial.
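For illustration, here is a minimal sketch of how a file lock could pick its backend per platform instead of importing fcntl unconditionally. This is not the library's MultiLock implementation: the PortableFileLock class and the _lock/_unlock helpers are hypothetical names made up for this example, and the sketch assumes that an exclusive whole-file lock (fcntl.flock on Unix, a one-byte msvcrt.locking region on Windows) is enough for the synchronization needed.

```python
import os
import sys
import tempfile

if sys.platform == "win32":
    import msvcrt

    def _lock(fh):
        # Lock one byte at offset 0. LK_LOCK retries for about 10 seconds
        # and then raises OSError if the region is still locked.
        fh.seek(0)
        msvcrt.locking(fh.fileno(), msvcrt.LK_LOCK, 1)

    def _unlock(fh):
        fh.seek(0)
        msvcrt.locking(fh.fileno(), msvcrt.LK_UNLCK, 1)
else:
    import fcntl

    def _lock(fh):
        # Blocks until an exclusive advisory lock is obtained.
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)

    def _unlock(fh):
        fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


class PortableFileLock:
    """Hypothetical helper (not part of data-prep-toolkit): exclusive lock on a file."""

    def __init__(self, path):
        self.path = path
        self._fh = None

    def __enter__(self):
        # "a+" creates the lock file if it does not exist yet.
        self._fh = open(self.path, "a+")
        _lock(self._fh)
        return self

    def __exit__(self, exc_type, exc, tb):
        _unlock(self._fh)
        self._fh.close()
        self._fh = None
        return False  # do not swallow exceptions


# Usage: guard a critical section with a lock file in the temp directory.
lock_path = os.path.join(tempfile.gettempdir(), "portable_lock_example.lock")
with PortableFileLock(lock_path):
    pass  # work that must not run concurrently goes here
```

Because the platform check happens at import time, the module would import on Windows without touching fcntl. Note that the two backends are not identical: flock is advisory, while the msvcrt lock is enforced by the OS and times out rather than blocking indefinitely.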

Boris-Chernetsov commented 1 week ago

wrong project, sorry