viralogic / py-enumerable

A Python module used for interacting with collections of objects using LINQ syntax
MIT License
188 stars, 24 forks

Using py-linq on files fails in version 1.2.0 #36

Closed: bartvanesWB closed this issue 4 years ago

bartvanesWB commented 4 years ago

The following code did work on version 1.1.0:

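from py_linq import Enumerable

def add_to_list(acc, line):  # hypothetical stand-in; the report does not show this function
    acc.append(line)
    return acc

lst = []
with open(file_path) as file:
    lst = (Enumerable(file)
           .skip(1)  # skip the header line
           .where(lambda line: not line.startswith('#'))  # drop comment lines
           .aggregate(add_to_list, lst))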

I tested with some print statements and verified that the aggregate function add_to_list is never entered.

Downgrading to version 1.1.0 solved the problem: the function is entered as expected.
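For comparison, a rough standard-library equivalent of the query in the report can serve as a cross-check of the expected result. This is a sketch, not py-linq's implementation: it assumes add_to_list appends each line to the accumulator and returns it (the real function is not shown), and it maps skip(1) to itertools.islice, the where filter to a generator expression, and aggregate with a seed to functools.reduce with an initial value. The helper query_lines is hypothetical.

```python
import io
from functools import reduce
from itertools import islice


def add_to_list(acc, line):
    # Hypothetical stand-in: the report never shows the real add_to_list.
    acc.append(line)
    return acc


def query_lines(file):
    # skip(1)          -> drop the first line
    # where(...)       -> keep lines that are not comments
    # aggregate(f, s)  -> fold each remaining line into the seed list
    remaining = islice(file, 1, None)
    kept = (line for line in remaining if not line.startswith('#'))
    return reduce(add_to_list, kept, [])


with io.StringIO("header\n# comment\nalpha\nbeta\n") as file:
    lst = query_lines(file)

print(lst)  # ['alpha\n', 'beta\n']
```

Because reduce is given the seed list, add_to_list is called once per kept line, so an empty result here would point at the filtering steps rather than the aggregation.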

viralogic commented 4 years ago

Thanks for this. I will look into it when I get a chance.

viralogic commented 4 years ago

This has now been fixed, and all tests are passing.

pip install --upgrade py-linq

Please note that lines in files can now be iterated over and queried using the py-linq API. However, the implementation loads all the lines into memory before querying them. This is probably fine for small files, but large files could consume a lot of memory. In that case, it is much better to stick with the streaming capability afforded by the io.TextIOBase API.
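To illustrate the streaming alternative, here is a minimal sketch that processes a text stream lazily with generators instead of loading every line first. The names non_comment_lines and CountingReader are hypothetical helpers, not part of py-linq; the counter only exists to show that taking the first two matching lines pulls four lines from the stream rather than the whole file.

```python
import io
from itertools import islice
from typing import Iterable, Iterator


def non_comment_lines(lines: Iterable[str], skip: int = 1) -> Iterator[str]:
    """Lazily skip the first `skip` lines, then yield lines not starting with '#'.

    Nothing is buffered: each line is read from the underlying iterator only
    when the consumer asks for the next result.
    """
    for line in islice(lines, skip, None):
        if not line.startswith('#'):
            yield line


class CountingReader:
    """Hypothetical helper that wraps a text stream and counts lines pulled from it."""

    def __init__(self, stream: io.TextIOBase) -> None:
        self.stream = stream
        self.lines_read = 0

    def __iter__(self) -> Iterator[str]:
        for line in self.stream:
            self.lines_read += 1
            yield line


reader = CountingReader(io.StringIO("header\na\n# note\nb\nc\nd\n"))
first_two = list(islice(non_comment_lines(reader), 2))
print(first_two)          # ['a\n', 'b\n']
print(reader.lines_read)  # 4 (the last two lines are never read)
```

The same generator works directly on an object returned by open(), since file objects are themselves line iterators.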

For Python 2.7, please note that the io.open function must be used in place of the built-in open function.
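A minimal sketch of the io.open pattern, written so the same code runs on Python 2.7 and 3. The helper name open_text is hypothetical, and the idea that py-linq recognizes file input via io.TextIOBase is an assumption inferred from the note about the io.TextIOBase API, not something confirmed here.

```python
import io
import os
import tempfile


def open_text(path, encoding="utf-8"):
    """Open `path` for reading as an io.TextIOBase stream (hypothetical helper).

    On Python 2.7 the built-in open() returns a legacy `file` object, whereas
    io.open() returns an io.TextIOWrapper (a subclass of io.TextIOBase).
    On Python 3, io.open is the same function as the built-in open.
    """
    return io.open(path, "r", encoding=encoding)


fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # io.open in text mode expects unicode strings on Python 2.7, hence u"".
    with io.open(path, "w", encoding="utf-8") as out:
        out.write(u"header\n# comment\ndata\n")
    with open_text(path) as f:
        is_text_stream = isinstance(f, io.TextIOBase)
        lines = [line for line in f if not line.startswith(u"#")]
finally:
    os.remove(path)

print(is_text_stream)  # True
```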

The long-term solution would be to create a py-linq library specifically for files, or to use another library better suited to reading and querying large files.