Open luckyasser opened 7 years ago
-1 for bringing the Scala library into rheem-java. I still don't understand why we couldn't keep Tuple2 (or Tuple) in rheem-core and remove the other from rheem-basic. They're used interchangeably in several packages: for example, according to the distinction you made above, Tuple should have been used in the ZipWithIdMapping class, but that's not the case. The code is the same in both classes (99%), and it is confusing to have the two IMO (especially if both show up in the same class). We can have a "data" package in rheem-core and move it there (instead of keeping it in util). Also, semantically, the intended usage for Tuple can still be correctly seen as a very standard data type, so standard (almost primitive) that we put it in rheem-core :)
I see your point that both classes are virtually identical. Still, let me list some arguments for having this distinction:

- `Tuple` is not meant to be used by third-party applications. It is a Rheem-internal utility. That's also why `org.qcri.rheem.java.mapping.ZipWithIdMapping` is using `Tuple2`: because the operator `ZipWithId` produces data quanta with the user-facing datatype `Tuple2`.
- `Tuple2` might in the future receive special semantics (e.g., mapping to some platform-native notion of tuples, such as `PairRDD`, edges in graph engines, or SQL records).
- Removing `Tuple2` will break our Rheem applications, while simply keeping the two classes is no big deal. 😜

My proposal would be to keep this issue open and revisit it once we have to rewrite the datatypes anyway (potentially in a new major release), e.g., due to mappings to platform-native datatypes or due to consolidations with other datatypes, such as `Record`s.
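To make the `ZipWithId` point concrete, here is a minimal sketch of an operator whose output is a user-facing pair type. Everything in it is an assumption for illustration: the nested `Tuple2` is a simplified stand-in rather than the actual class from rheem-basic, and `zipWithId` is a hypothetical helper, not Rheem's operator implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZipWithIdSketch {

    /** Simplified stand-in for a user-facing pair type (assumption, not Rheem's class). */
    public static class Tuple2<T0, T1> {
        public T0 field0;
        public T1 field1;

        public Tuple2(T0 field0, T1 field1) {
            this.field0 = field0;
            this.field1 = field1;
        }

        @Override
        public String toString() {
            return "(" + field0 + ", " + field1 + ")";
        }
    }

    /** Hypothetical helper: pairs every input element with a running ID, starting at 0. */
    public static <T> List<Tuple2<Long, T>> zipWithId(List<T> input) {
        List<Tuple2<Long, T>> output = new ArrayList<>(input.size());
        long nextId = 0L;
        for (T element : input) {
            output.add(new Tuple2<>(nextId++, element));
        }
        return output;
    }

    public static void main(String[] args) {
        // The application code sees the pair type directly in the result.
        System.out.println(zipWithId(Arrays.asList("a", "b", "c")));
    }
}
```

Because the pair type appears in the signature that application code consumes, replacing or removing it changes user code, which is the compatibility concern raised above.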
These two classes are slightly different. The `Tuple` class is used within `rheem-core`, where it is mostly employed to return more than one result value from a function. `Tuple2` comes from the `rheem-basic` module and is thought to provide a standard data type for Rheem. So, the two classes are intentionally different, and `Tuple2` is also not visible in `rheem-core`.

Another consideration could be to use Scala's `Tuple2` rather than Rheem's `Tuple2`. This comes at the expense of bringing the Scala library into `rheem-java`, and `scala.Tuple2` is also immutable. On the other hand, it is nicer to work with in the Scala API.