iterative / datachain

AI-dataframe to enrich, transform and analyze data from cloud storages for ML training and LLM apps
https://docs.datachain.ai
Apache License 2.0
712 stars, 39 forks

added embeddings/gen example #362

Open tibor-mach opened 2 weeks ago

cloudflare-pages[bot] commented 2 weeks ago

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 1739ced
Status: ✅  Deploy successful!
Preview URL: https://cfd82bf7.datachain-documentation.pages.dev
Branch Preview URL: https://consolidation.datachain-documentation.pages.dev


tibor-mach commented 2 weeks ago

Related to #353

codecov[bot] commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 87.31%. Comparing base (424b05b) to head (1739ced).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #362   +/-   ##
=======================================
  Coverage   87.31%   87.31%
=======================================
  Files          92       92
  Lines        9982     9982
  Branches     2041     2041
=======================================
  Hits         8716     8716
  Misses        911      911
  Partials      355      355
```

| [Flag](https://app.codecov.io/gh/iterative/datachain/pull/362/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iterative) | Coverage Δ | |
|---|---|---|
| [datachain](https://app.codecov.io/gh/iterative/datachain/pull/362/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iterative) | `87.26% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=iterative#carryforward-flags-in-the-pull-request-comment) to find out more.


mattseddon commented 2 weeks ago

Looks like the example won't run on Windows due to `ImportError: DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed.` You'll need to poke around in `unstructured` to see which version of `onnx` it uses and whether that version can be downgraded or pinned to fix the issue. This issue is likely the cause.
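For anyone trying to reproduce this locally, here is a minimal sketch of a check that reports which versions of `onnx` and `unstructured` are installed and whether `onnx` imports cleanly. It uses only the standard library; the helper names `installed_version` and `can_import` are made up for illustration and are not part of datachain or unstructured.

```python
import importlib
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def can_import(module: str) -> Tuple[bool, str]:
    """Try to import `module`; return (ok, error message)."""
    try:
        importlib.import_module(module)
    except ImportError as exc:
        # A failed DLL load on Windows surfaces as an ImportError.
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    for package in ("onnx", "unstructured"):
        print(f"{package}: {installed_version(package) or 'not installed'}")
    ok, error = can_import("onnx")
    print("onnx import:", "ok" if ok else f"failed ({error})")
```

Running this before and after pinning `onnx` should show whether the pin actually took effect in the environment and whether the import error is gone.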

shcheklein commented 2 weeks ago

@tibor-mach is this one done? any progress on it? what are the blockers?

mattseddon commented 2 weeks ago

> @tibor-mach is this one done? any progress on it? what are the blockers?

The new example needs to be parameterized, as it takes more than 1 hour to run on Windows.

tibor-mach commented 2 weeks ago

@mattseddon How would you go about that?

> The new example needs parameterized as it takes > 1hr to run on Windows.

I'm not quite sure why this is the case. I downgraded the `onnx` version as per your suggestion here. I have now limited the size of the dataset even further, so let's see. It wasn't overly huge before, though, and the example should not take nearly that long if it is to be of any real use on Windows.
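One possible way to parameterize the example, sketched under assumptions: read an optional item limit from an environment variable so that CI can run a small subset while a local run uses the full dataset. The variable name `EXAMPLE_LIMIT` and the helpers `get_limit` and `limited` are hypothetical and not part of datachain; the sketch works on any plain Python iterable.

```python
import os
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def get_limit(env_var: str = "EXAMPLE_LIMIT", default: Optional[int] = None) -> Optional[int]:
    """Read an optional non-negative item limit from the environment."""
    raw = os.environ.get(env_var)
    if raw is None or raw.strip() == "":
        return default
    limit = int(raw)
    if limit < 0:
        raise ValueError(f"{env_var} must be non-negative, got {limit}")
    return limit


def limited(items: Iterable[T], limit: Optional[int]) -> Iterator[T]:
    """Yield at most `limit` items, or all of them when `limit` is None."""
    return iter(items) if limit is None else islice(items, limit)


if __name__ == "__main__":
    # Stand-in for the example's input files; the real example would pass its own iterable.
    files = (f"file_{i}.pdf" for i in range(1000))
    for name in limited(files, get_limit()):
        print(name)
```

A CI job could then set `EXAMPLE_LIMIT` to a small number, while leaving it unset keeps the example's default behaviour.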