Open bryzgaloff opened 4 years ago
Hi!
However, bitshift composition is recommended and is more common.
Very good point! I was thinking about the same method when I needed functional pipelines.
But, python is not really suited for this method in my opinion, so I have decided to go with pipe()
function: https://returns.readthedocs.io/en/latest/pages/pipeline.html
I don't want to disable this check completely and would rather like to ignore bitshift operators only since they are very uncommon in Airflow-based codebases for purposes other than setting up dependencies.
There's not much we can do here. I cannot also just ignore this operator completely in our source code, since
And I cannot think of any simple configuration format for this task.
The first idea is that we can include and exclude nodes in the config section. But, this is not going to work as-is.
Because, you still don't want 1 >> 2
to be valid even if >>
is ignored from this check.
So, I would recommend to either:
noqa
, I hope that this is not a really common task in your code baseAlso huge thanks for you work! It has dramatically simplified teaching my team to use best coding practices without much participation from my side!
Happy to hear that! 😊
I am going to leave this open for some time, maybe you (or others) would have some ideas about what can be done in this case.
so I have decided to go with
pipe()
function
Thanks for an example, but this does not look suitable for Airflow. Airflow is a scheduler and the data is not actually processed within Airflow-based code: it only defines the dependencies while actual code execution is usually run on an external environment (e.g. Airflow schedules a task to be executed by either Spark or Celery worker).
The first idea is that we can include and exclude nodes in the config section
I strongly vote for this. See my arguments below.
Because, you still don't want
1 >> 2
to be valid even if>>
is ignored from this check.
Yup, 1 >> 2
will still violate the rule but the trick is that bitshift operation are truly very uncommon for the domain where Airflow is used (data engineering). Usually Airflow DAG file defines dependencies only by instantiating some DAG operators (instance of an operator is called task in Airflow terms) and bitshifting them to each other. No actual code and data operating is usually present within DAG files. So as for my case, I would be definitely satisfied with having an ability to completely disable check for concrete operator (bitshift in that case).
we can include and exclude nodes in the config section
Could you please give me a little hint on which parts of WPS can I patch to achieve the desired functionality?
Use methods, methods are great!
As I have said, bitshift is really a recommended and them most common approach.
Use
noqa
, I hope that this is not a really common task in your code base.
Not really :) Using bitshifts is exactly very common :) Here is an example from our real project:
unload_task = _unload_to_file(export_config)
if not export_config.is_incremental:
_latest_only(export_config.athena_table_name) >> unload_task
unload_task \
>> _upload_to_s3(export_config.athena_table_name) \
>> [
_clear_export_dir_and_logs(export_config.athena_table_name),
*_repair_tables(export_config),
]
So generally I mean using this configuration is safe enough for a specific area of software engineering (data engineering). I may try to contribute if you help me with navigating through WPS code.
Could you please give me a little hint on which parts of WPS can I patch to achieve the desired functionality?
Here you go: https://github.com/wemake-services/wemake-python-styleguide/search?q=StatementHasNoEffectViolation
@sobolevn will you have a chance to review? https://github.com/wemake-services/wemake-python-styleguide/pull/1760
Rule change request
Make WPS428 (Found statement that has no effect) configurable.
Thesis
Let WPS428 treat some operators as having side effects and not violating the original rule.
Reasoning
WPS introduces rule
WPS428: Found statement that has no effect
. The majority of Airflow-based codebases violate this rule because of using>>
operator to define dependencies between tasks.Please see the last line of the following example from the official documentation:
t1 >> [t2, t3]
. It defines that both taskst2
andt3
are run only whent1
is completed. And this line violates WPS428 since its result is not assigned to anything.There is also a method-style way to define dependencies:
t2.set_upstream(t1)
,t3.set_upstream(t1)
. However, bitshift composition is recommended and is more common.I don't want to disable this check completely and would rather like to ignore bitshift operators only since they are very uncommon in Airflow-based codebases for purposes other than setting up dependencies.
--
Also huge thanks for you work! It has dramatically simplified teaching my team to use best coding practices without much participation from my side!