iterative / datachain

AI-data warehouse to enrich, transform and analyze unstructured data
https://docs.datachain.ai
Apache License 2.0

Column attributes missing `.glob` #315

Closed: tibor-mach closed this issue 2 months ago

tibor-mach commented 2 months ago

When I try to run

from datachain.lib.dc import DataChain, C

dc = (
    DataChain.from_storage("gs://datachain-demo/neurips")
    .filter(C.file.name.glob("*.pdf"))
)

I get the following error:

This is with DataChain version 0.3.3.

It seems that `C.file.name` now simply refers to the column's own name, which is why it is a plain string and has no `.glob` method.
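A minimal sketch of how this kind of shadowing can happen, assuming nested fields are resolved through `__getattr__` while the column object also stores its own name in a real `name` attribute. `ToyColumn` is a hypothetical stand-in written for this illustration, not DataChain's actual class:

```python
class ToyColumn:
    """Toy stand-in for a column object (hypothetical, not DataChain's class)."""

    def __init__(self, name: str):
        # A real instance attribute holding the column's own name.
        self.name = name

    def __getattr__(self, attr: str) -> "ToyColumn":
        # Python calls __getattr__ only when normal lookup fails,
        # so it is never reached for "name", which always exists.
        return ToyColumn(f"{self.name}__{attr}")

    def glob(self, pattern: str) -> str:
        # Render a GLOB expression as text, just to show which column is used.
        return f"{self.name} GLOB '{pattern}'"


file_col = ToyColumn("file")
nested = file_col.path    # falls through to __getattr__: a ToyColumn named "file__path"
shadowed = file_col.name  # normal lookup wins: the plain string "file"
```

In this toy model `nested.glob("*.pdf")` works, while `shadowed.glob("*.pdf")` raises `AttributeError` because `shadowed` is a `str`.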

Replacing that with `C("file.name")` does not work either; it results in:

OperationalError: no such column: file__name

tibor-mach commented 2 months ago

Solved: `file.name` is no longer used; it has been replaced by `file.path`.
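The behaviour can be reproduced with the standard-library `sqlite3` module. This is only a sketch under assumptions: the error message suggests nested fields are flattened into columns joined by a double underscore (`file__name`, `file__path`). The `files` table, its rows, and the `glob_column` helper are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical stand-in for the index: one flattened column per nested field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (file__path TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [("papers/a.pdf",), ("papers/notes.txt",), ("b.pdf",)],
)


def glob_column(column: str, pattern: str) -> list[str]:
    """Return the values of `column` that match an SQLite GLOB pattern."""
    # The column name is interpolated directly because this is a toy example;
    # do not do this with untrusted input.
    rows = conn.execute(
        f"SELECT {column} FROM files WHERE {column} GLOB ? ORDER BY {column}",
        (pattern,),
    )
    return [row[0] for row in rows]


# The existing column matches as expected; GLOB's "*" also spans "/".
pdfs = glob_column("file__path", "*.pdf")

# The removed column fails the same way as in the report above.
try:
    glob_column("file__name", "*.pdf")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
```

Because `*` in an SQLite GLOB pattern also matches path separators, a pattern such as `*.pdf` applied to the full path still selects files in subdirectories, so the same pattern can be reused when filtering on `file.path`.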