marsupialtail / quokka

Making data lake work for time series
https://marsupialtail.github.io/quokka/
Apache License 2.0
1.1k stars, 60 forks

Absence of CI #16

Open melezhik opened 1 year ago

melezhik commented 1 year ago

Hi! I did not find any CI setup. It’d be good to have some unit tests and installation instructions for developers.

marsupialtail commented 1 year ago

@melezhik Sorry for the late response. This is a great point. I will be adding some CI later this year.

Contributions welcome :1st_place_medal:

melezhik commented 1 year ago

You can use the CI service I am building: https://ci.sparrowhub.io/report/1844. Right now the build fails.

marsupialtail commented 1 year ago

That's a bit odd. Seems like a problem with Polars here: https://github.com/pola-rs/polars. I don't know why a "build" would fail, since Quokka is pure Python and can be installed with pip. Does your CI service try to build all the dependencies instead of pip installing them?

melezhik commented 1 year ago

@marsupialtail you can see pipeline logic here - https://github.com/melezhik/quokka/blob/master/sparrow.yaml

marsupialtail commented 1 year ago

Yeah as I was saying I don't think it's a Quokka problem:

20:09:39 :: Installed /usr/lib/python3.10/site-packages/sqlglot-10.0.7-py3.10.egg
20:09:39 :: Searching for polars==0.14.*
20:09:39 :: Reading https://pypi.org/simple/polars/
20:09:40 :: Downloading https://files.pythonhosted.org/packages/86/f7/7c305976fe2a99c6d8f7a6295866c2227a04d22bcc9baebf3103629c2648/polars-0.14.29.tar.gz#sha256=5e1cde8d6f12b43619a7ac7b04bef03a00cfcd8d116fc2827572298a8fb754c4
20:09:41 :: Best match: polars 0.14.29
20:09:41 :: Processing polars-0.14.29.tar.gz
[task stderr]
20:09:41 :: /usr/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
20:09:41 :: warnings.warn(
20:09:41 :: /usr/lib/python3.10/site-packages/setuptools/command/easy_install.py:156: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
20:09:41 :: warnings.warn(
20:09:41 :: zip_safe flag not set; analyzing archive contents...
20:09:41 :: pyquokka.__pycache__.utils.cpython-310: module references __file__
20:09:41 :: /usr/lib/python3.10/site-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning: is an invalid version and will not be supported in a future release
20:09:41 :: warnings.warn(
20:09:41 :: error: Couldn't find a setup script in /tmp/easy_install-riatnk74/polars-0.14.29.tar.gz
20:09:41 :: task exit status: 1
20:09:41 :: task tasks/install FAILED
The spawned command 'docker exec -i sparrow-worker sh -l /var/.sparrowdo/env/install/.sparrowdo/sparrowrun.sh' exited unsuccessfully (exit code: 1, signal: 0)
in block <unit> at /home/sph/.raku/resources/57C38AFDF922EB0C43584FF5F701A03850B5346F line 13
in sub MAIN at /home/sph/.raku/bin/sparrowdo line 3
in block <unit> at /home/sph/.raku/bin/sparrowdo line 1

I wonder if you can reproduce this just by trying to pip install polars in your CI environment?
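The log above shows the install going through the deprecated setup.py install / easy_install path, which downloaded the polars source tarball and then could not find a setup script in it. A rough way to check whether plain pip behaves differently on the same machine is sketched below. This is only an illustration: pip_install is a hypothetical helper, not part of Quokka or SparrowCI, and the --only-binary=:all: flag is used on the assumption that a prebuilt polars wheel exists for the CI platform.

```python
import subprocess
import sys


def pip_install(requirement, wheels_only=True, dry_run=False):
    """Install a requirement with pip for the current interpreter.

    Hypothetical helper for illustration; returns the command it built.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if wheels_only:
        # Ask pip to refuse source distributions, so a missing wheel
        # shows up as a clear error instead of a failed source build.
        cmd.append("--only-binary=:all:")
    cmd.append(requirement)
    if not dry_run:
        # check=True raises CalledProcessError if pip exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling pip_install("polars==0.14.*") uses the same version pin as the log. If this succeeds where the pipeline fails, the difference is probably in how the pipeline installs dependencies rather than in Quokka itself.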

melezhik commented 1 year ago

Looks like the problem is with polars: https://ci.sparrowhub.io/report/1950. However, I don't get this error when I run pip3 install -e .

melezhik commented 1 year ago

Finally, after switching to Debian, the installation went well. However, the example itself still fails: https://ci.sparrowhub.io/report/1989

marsupialtail commented 1 year ago

The report seems to suggest you are missing the input file it is trying to read from?

melezhik commented 1 year ago

Yes. But I think I just need to create it in my pipeline… If you could give me an example of this file, that would be cool.

marsupialtail commented 1 year ago

This describes the data you would need. I'm a bit confused because nowhere in the code is lineitem.tbl.named used explicitly. I think you are just running the snippet in the README. Can you try one of the examples at this link: https://marsupialtail.github.io/quokka/simple/. That page also describes where you can get the data.
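Since the failing report points at a missing input file, a pipeline step could check for the file up front and fail with a clearer message. The sketch below assumes the file name lineitem.tbl.named mentioned above; require_input is a hypothetical helper and not part of Quokka.

```python
from pathlib import Path


def require_input(path):
    """Return the path if the file exists, else raise with a clear message."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(
            f"Input file {p} is missing; generate or download it "
            "in an earlier pipeline step."
        )
    return p
```

Calling require_input("lineitem.tbl.named") before the example runs would turn a failure deep inside the example into an obvious setup error.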

marsupialtail commented 1 year ago

By the way your Sparrow CI thing looks dope. Does it work with GitHub Actions? I am trying to fulfill my promise of adding CI by the end of the year. Two features that would be important to Quokka are:

melezhik commented 1 year ago

> By the way your Sparrow CI thing looks dope.

Thank you

> Does it work with GitHub Actions?

No, there is no need for that. Users integrate with SparrowCI directly: sign in with your GitHub login and add your repo with sparrow.yaml in the root, and that is it.

> This can be done by running some local script that sets up an AWS cluster, though how you manage the AWS credentials is going to be hard. You don't want that public.

I get you. SparrowCI allows users to upload their secrets and use them in a pipeline: https://github.com/melezhik/SparrowCI#secrets-management
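On the script side, one common way to keep credentials out of the repository is to read them from environment variables at run time. The sketch below assumes the CI service exposes the uploaded secrets as the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables, which is not confirmed here; load_aws_credentials is a hypothetical helper.

```python
import os

# Standard AWS variable names; whether the CI service sets them is an assumption.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_aws_credentials(env=None):
    """Read AWS credentials from environment variables, never from the repo."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

A cluster setup script could call load_aws_credentials() first and stop early if the secrets were not provided.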

melezhik commented 1 year ago

> This describes the data you would need. I'm a bit confused because nowhere in the code is lineitem.tbl.named used explicitly. I think you are just running the snippet in the README.
>
> Can you try one of the examples at this link: https://marsupialtail.github.io/quokka/simple/. That page also describes where you can get the data.

I will do that and let you know how it goes...

marsupialtail commented 1 year ago

Well, the README doesn't look too reassuring lol:

> WARNING! This feature is still being tested, although security is address seriously (*) in SparrowCI service, don't use SparrowCI secrets to store your credit card information and other valuable data. You've been warned )))