khuyentran1401 / data-science-template

Template for a data science project
692 stars 198 forks

Angreal as an alternative to cookiecutter + Makefile #8

Closed dylanbstorey closed 9 months ago

dylanbstorey commented 11 months ago

First, awesome template. I've been doing some research on what's changed in templating data science projects over the last few years and came across yours.

I wasn't sure what the best route to reach out was, so I thought I'd just present it to you in an issue. I built a templating engine very similar to cookiecutter that also lets you include Python functions as plugins for a command line interface. Thought you might find it interesting or useful if you find maintaining or extending the Makefile annoying.

https://angreal.github.io/angreal/

Again - awesome work !

tapyu commented 10 months ago

Hi! Nice Python pkg. Agreed that Makefile is annoying, but what does your pkg do that is either better than (or cannot be done with) cookiecutter + Makefile?

dylanbstorey commented 10 months ago

It uses python files as a plugin system for a command line interface - with a look/feel very similar to click.

tapyu commented 9 months ago

It uses python files as a plugin system for a command line interface

I am creating another branch where I am setting up a template to work with AWS SageMaker. I confess that CookieCutter has been doing a great job. However, I do need to write messages on the terminal to the user as he/she sets the template up, and I am not sure whether CookieCutter can do this. Can your pkg provide this level of interaction with the end user during the creation of the template?

Makefile is quite defective in many aspects, but it is a standard, and CookieCutter is powerful yet simple. I would vote to use angreal only if it can handle things that the current tools cannot. Otherwise, I would stick with CookieCutter + Makefile.

khuyentran1401 commented 9 months ago

@tapyu Could you specify what you mean by "writing messages on the terminal to the user as he/she sets the template up"?

tapyu commented 9 months ago

@khuyentran1401 AWS SageMaker can be configured to work on your local machine or on the cloud. Depending on this choice, the directory structure may vary. So, I was thinking of prompting a dialog like:

Do you want to work on your local machine or on the cloud?

with a yes/no answer. This would be used to define whether some directories should be created. AFAIK, cookiecutter can only prompt something like this:

image

That is not enough for what I am thinking of...
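
The local-or-cloud question described above can be sketched in plain Python. This is only a minimal illustration using the standard library: the function names (`ask_target`, `create_layout`) and the directory names in `LAYOUTS` are hypothetical and are not part of cookiecutter, angreal, or SageMaker. How such a script would be wired into a templating tool is tool-specific and not shown here.

```python
from pathlib import Path

# Hypothetical layouts: these directory names are illustrative only.
LAYOUTS = {
    "local": ["data", "models", "notebooks"],
    "cloud": ["notebooks", "sagemaker_jobs"],
}


def ask_target(ask=input):
    """Keep asking where the project will run until the answer is valid."""
    question = (
        "Do you want to work on your local machine or on the cloud? "
        "[local/cloud]: "
    )
    while True:
        answer = ask(question).strip().lower()
        if answer in LAYOUTS:
            return answer
        print(f"Please answer one of: {', '.join(LAYOUTS)}")


def create_layout(root, target):
    """Create only the directories that belong to the chosen target."""
    created = []
    for name in LAYOUTS[target]:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_layout(Path.cwd(), ask_target())` would prompt once and then build the matching directories. Passing `ask` as a parameter keeps the prompt testable without a real terminal.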

dylanbstorey commented 9 months ago

The dialog renders based on the variables you want. You can also use the template specific init script for additional interaction.

I'd recommend going through https://angreal.github.io/angreal/tutorials/your-first-angreal/ as a starting point.

For a concrete example of an init script that gets run at render time, see https://github.com/angreal/airflow-provider/blob/main/%7B%7B%20provider_name%20%7D%7D/.angreal/init.py. You could make a simple prompting interface within this file.

Alternatively, if you want to open an issue on the angreal repo, I can support you there without polluting this project's issues page.

tapyu commented 9 months ago

The dialog renders based on the variables you want.

The dialog should not be based on variables. Rather, the dialog prompt should show up before the variables are set, to help the end user. A fictional example would be:

  [1/6] What is your project name (Project Name): 
  [2/6] What is your directory name (project_name): 
  [3/6] What is the author name (Your Name): 
  [4/6] Which Python package manager do you want to use?
    1 - pip
    2 - poetry
    Choose from [1/2] (1): 2
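
The fictional dialog above could be approximated with a few lines of standard-library Python. This is a rough sketch under stated assumptions, not how either tool implements its prompts: `prompt_text` and `prompt_choice` are hypothetical helper names, and the first option is assumed to be the default choice.

```python
def prompt_text(step, total, question, default, ask=input):
    """Show a numbered, descriptive question; empty input keeps the default."""
    answer = ask(f"  [{step}/{total}] {question} ({default}): ").strip()
    return answer or default


def prompt_choice(step, total, question, options, ask=input):
    """Show a numbered menu and return the selected option.

    Assumption: pressing Enter selects the first option.
    """
    lines = [f"  [{step}/{total}] {question}"]
    lines += [f"    {i} - {opt}" for i, opt in enumerate(options, start=1)]
    numbers = "/".join(str(i) for i in range(1, len(options) + 1))
    lines.append(f"    Choose from [{numbers}] (1): ")
    while True:
        answer = ask("\n".join(lines)).strip() or "1"
        if answer.isdigit() and 1 <= int(answer) <= len(options):
            return options[int(answer) - 1]
        print(f"Invalid choice: {answer!r}")
```

For example, `prompt_choice(4, 6, "Which Python package manager do you want to use?", ["pip", "poetry"])` prints the menu and returns the chosen manager's name. The question text is separate from the variable name, which is the property the thread is asking for.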

Neither of your links has anything to do with what I was looking for. I've installed angreal and used this example. The prompt dialog seemed as poor as cookiecutter's:

❯ angreal init https://github.com/angreal/airflow-provider.git
name? ["This Provider"]: 
provider_name? ["airflow-provider-this-provider"]: 
provider_slug? ["airflow_provider_this_provider"]: 
airflow_version? ["2.5.0"]: 
author_name? ["Osborne Reynolds"]: 
author_email? ["osborne.reynolds@airflow.com"]: 

In some scenarios, the variable name, such as airflow_version, is enough to help the user understand what they should enter. However, in the case of this repo, it isn't. We need a more verbose dialog prompt to help the user input the variables. It seems that neither cookiecutter nor angreal fulfills this task :(

tapyu commented 9 months ago

You can also use the template specific init script for additional interaction.

init.py runs after the variables are set, not before them...

❯ angreal init meeting_notes
name? ["another_meeting"]: 
cadence? ["weekly"]: 
standing_agenda? ["Complaints"]: 
Hi from init.py another_meeting !
Angreal template (meeting_notes) successfully rendered !

dylanbstorey commented 9 months ago

  1. You can use the default values to ask the question, but I'd suggest that defaults plus documentation are a better pattern.

If you'd like to change that interface, pull requests are welcome.

  2. Yes, init runs after rendering with the first set of variables. But you can gather additional inputs during the init script and conditionally render during that phase.

khuyentran1401 commented 9 months ago

Hi @dylanbstorey, thanks for the issue. For now, the cookiecutter + Makefile pattern works, so I will keep it. Thanks for the suggestions.