twitter / scalding

A Scala API for Cascading
http://twitter.com/scalding
Apache License 2.0
3.5k stars 706 forks source link

add quotation to external APIs #1763

Open fwbrasil opened 6 years ago

fwbrasil commented 6 years ago

This PR is the second part of https://github.com/twitter/scalding/pull/1754. It adds the Quoted implicit param to the user-facing APIs. After this change, the last PR will use the projection information to do automatic pushdown.

fwbrasil commented 6 years ago

@ttim @benpence @dieu @johnynek this is ready for review

fwbrasil commented 6 years ago

This is both exciting, but a bit scary. This touches virtually every public API and make everything binary incompatible.

Yeah, it's an intrusive change. I thought it was clear that it'd be necessary when we discussed it, though.

In exchange for that, it is not yet clear what we are getting. I'd like to see maybe a quoted branch where we can use the final product with say Parquet to do some filter pushdown or projection pushdown.

If we don't have that as a killer app, I'm nervous about merging and getting to it later.

What do you think about making a quoted branch to make this PR into and when we have filter/projection pushdown working, we merge into develop?

That's https://github.com/twitter/scalding/pull/1754. These pull requests are just a break down of that initial one. Note that only projection pushdown is in the scope of this change.

fwbrasil commented 6 years ago

I think right now the logic with passing quoted explicitly / implicitly (and Quoted.internal) is a bit fuzzy. Can we do instead something similar to cause in exceptions?

Let's make Quoted nested and basically putting quoted on method declaration means you want to pass information about this call site. And then if you have quoted parameter in the scope & apply quoted macro - put quoted from the context as .cause call site information.

@ttim I can't see the benefit of this proposal. The quotation propagation is just a way to make the initial information available in the final TypedPipe composition. Could you give more details?

I don't think we really need Quoted.internal in this case and overall logic seems more clear to me (quoted parameter - want to save information about call site).

Quoted.internal is just for safety since people might forget to propagate the quotation when making chanegs to Scalding. See the "Quoted propagation" section on https://github.com/twitter/scalding/pull/1754

fwbrasil commented 6 years ago

To me, that could be done by only accepting Quoted where we take a lambda and only where we expect we could in principle do some filter pushdown or projection pushdown. Does that make sense?

Why else do we want the quoted?

@johnynek take a look at the description on https://github.com/twitter/scalding/pull/1754. We'd also like to use Quoted to provide a better description of the job to users, which requires adding Quoted to all user-facing APIs.

fwbrasil commented 6 years ago

@johnynek I'm still working on updating source to develop without this change. I'll test these changes after it

CLAassistant commented 4 years ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Flavio Brasil seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.